Building strings in C# with good performance

A little bit on strings

Strings in C# are immutable. This means that you cannot modify an existing string; you must create a new one instead.

Strings are also objects (sequential, read-only collections of char values), so they are allocated on the managed heap and managed by the garbage collector (GC). To "modify" an existing string, you therefore use it as input and create a new string. For example, assume you have the string "Hello" and want to append your friend's name to it.

var str = "Hello";
...
str = string.Concat(str, " Bob");

What happens under the hood is that String.Concat creates a new string, "Hello Bob", and the str variable is updated to reference it. The old string, "Hello", is no longer referenced by str, so it can be garbage collected once nothing else refers to it.
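To make the "new string, same old string" behavior concrete, here is a minimal sketch that keeps a second reference to the original string and checks that concatenation leaves it untouched. The class and method names (ImmutabilityDemo, CreatesNewObject) are made up for this illustration.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Hypothetical helper: returns true when concatenation produced a different object.
    public static bool CreatesNewObject(string original, string suffix)
    {
        var updated = string.Concat(original, suffix);
        return !object.ReferenceEquals(original, updated);
    }

    public static void Main()
    {
        var str = "Hello";
        var alias = str;                   // a second reference to the same string object

        str = string.Concat(str, " Bob");  // creates a new string; "Hello" itself is untouched

        Console.WriteLine(str);                                 // Hello Bob
        Console.WriteLine(alias);                               // Hello
        Console.WriteLine(CreatesNewObject("Hello", " Bob"));   // True
    }
}
```

The alias variable still sees "Hello" because only the str reference was redirected; the original object never changed.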

Building strings in a loop

Now, assume that you wish to build up a long string by concatenating new words in a loop. We can simulate this behavior by creating a loop like the one below.

public static string BuildString()
{
  var str = string.Empty;
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    str = string.Concat(str, append);
  }
  return str;
}

The code looks decent. However, since strings are immutable, a new string object is created in every iteration of the loop. That is 100 000 objects that need to be created and garbage collected.
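The object count is only part of the cost. Each new string also has to hold every character accumulated so far, so the amount of character data written grows roughly quadratically with the number of iterations. The sketch below adds this up under the simplifying assumption that iteration i produces a fresh string of i characters; CopyCost and TotalCharsWritten are names invented for this example.

```csharp
using System;

public static class CopyCost
{
    // Total characters written when a string grows by one character per
    // iteration and every iteration builds a completely new string.
    public static long TotalCharsWritten(int iterations)
    {
        long total = 0;
        for (var i = 1; i <= iterations; i++)
        {
            total += i; // the i-th new string holds i characters
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalCharsWritten(100_000)); // 5000050000
    }
}
```

For 100 000 iterations that is about five billion characters written to produce a final string of only 100 000 characters.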

Fortunately, there is a better way to do this. If we use a mutable object in the loop instead, we can keep adding to it and avoid all this object creation. The .NET Framework provides a type for exactly this purpose: System.Text.StringBuilder. With a few small modifications, the code can use it instead.

private static string BuildString()
{
  var sb = new System.Text.StringBuilder(100_000);
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    sb.Append(append);
  }
  return sb.ToString();
}
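The argument passed to the StringBuilder constructor above is its initial capacity. Sizing it to the expected final length lets the builder reserve its buffer up front instead of growing it along the way. As a rough sketch, the following compares a pre-sized builder with a default one; CapacityDemo and CapacityAfterAppending are hypothetical names, and the exact capacities printed for the default builder depend on the runtime.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Appends 'count' characters and reports the builder's capacity afterwards.
    public static int CapacityAfterAppending(StringBuilder sb, int count)
    {
        for (var i = 0; i < count; i++)
        {
            sb.Append('x');
        }
        return sb.Capacity;
    }

    public static void Main()
    {
        const int count = 100_000;
        var presized = new StringBuilder(count);   // room for the whole result from the start
        var defaultSized = new StringBuilder();    // starts small and grows as needed

        Console.WriteLine($"Initial capacity, pre-sized: {presized.Capacity}");
        Console.WriteLine($"Initial capacity, default:   {defaultSized.Capacity}");

        Console.WriteLine($"Final capacity, pre-sized:   {CapacityAfterAppending(presized, count)}");
        Console.WriteLine($"Final capacity, default:     {CapacityAfterAppending(defaultSized, count)}");
    }
}
```

When the final length is not known in advance, the default constructor still works; the builder simply has to grow its buffer while appending.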

The numbers

I measured the two approaches above. Figure 1 below shows the memory allocations.


Figure 1. Memory allocations

The "Exclusive Allocations" column shows how many managed memory allocations were made in the methods themselves. As expected, the method using String.Concat made 100 000 allocations. Interestingly, String.Concat added another 199 999 allocations.

In the method using StringBuilder there is only a single allocation, and only two more are added by calls to other methods (one in the StringBuilder constructor and one in the ToString method).

I also measured the execution time of the two implementations. The String.Concat version took 1.3 seconds to run, while the StringBuilder version took 1.0 milliseconds. That makes the StringBuilder version about 1 300 times faster.
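The measurement setup itself is not shown here. As a rough sketch, timings like these could be taken with System.Diagnostics.Stopwatch, as below. The names BuildStringTiming, WithConcat, WithStringBuilder and Measure are made up for this example, and the numbers it prints will differ between machines, runtimes and build configurations.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class BuildStringTiming
{
    // Same loop as the String.Concat version above, with a configurable length.
    public static string WithConcat(int count)
    {
        var str = string.Empty;
        for (var i = 0; i < count; i++)
        {
            var append = (char)('a' + i % ('z' - 'a' + 1));
            str = string.Concat(str, append);
        }
        return str;
    }

    // Same loop as the StringBuilder version above, with a configurable length.
    public static string WithStringBuilder(int count)
    {
        var sb = new StringBuilder(count);
        for (var i = 0; i < count; i++)
        {
            var append = (char)('a' + i % ('z' - 'a' + 1));
            sb.Append(append);
        }
        return sb.ToString();
    }

    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int count = 100_000;

        // Warm-up calls so that JIT compilation is not part of the timing.
        WithConcat(1_000);
        WithStringBuilder(1_000);

        var concatTime = Measure(() => WithConcat(count));
        var builderTime = Measure(() => WithStringBuilder(count));

        Console.WriteLine($"String.Concat: {concatTime.TotalMilliseconds} ms");
        Console.WriteLine($"StringBuilder: {builderTime.TotalMilliseconds} ms");
    }
}
```

A single run like this is only indicative; for numbers you intend to rely on, repeat the measurement several times or use a dedicated benchmarking tool.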

Conclusion

Memory allocation and garbage collection are expensive operations. As the case above shows, it can be difficult to tell whether a piece of code will perform well or badly without knowing the underlying implementation (in this case, how strings are implemented in the .NET Framework). Many performance insights come with experience.

Also, remember that in almost all cases code readability is much more important than blazing-fast performance. Do not prioritize optimization over readability unless you know you need to. However, there are cases where code can be both clear and fast. Aim for that.