C# Pitfalls of returning IEnumerable

Generic and specific return types

It is often preferable to work with types that are as generic as possible. Doing so gives you more flexibility and lets you modify parts of the code without affecting other parts. One example of this is IEnumerable<T>, which is implemented by several generic collection types, such as HashSet<T>, List<T>, and LinkedList<T>.

If you write a method that works with a List<T> internally and returns it as an IEnumerable<T>, the caller does not need to know that it is actually a List<T> you are using. And if you later decide to change your implementation to use a different type internally, the caller does not need to be updated as long as the new type also implements IEnumerable<T>.

This might lead you to think that you should always return IEnumerable<T> when possible. But that can come back and bite you.

The multiple enumeration issue

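Whenever you iterate over a collection through the IEnumerable<T> interface, it gets enumerated. This means that GetEnumerator is called and the resulting enumerator object walks through the sequence one element at a time. If the sequence is lazily evaluated, such as a LINQ query or an iterator method that uses yield return, the work that produces the elements is repeated on every enumeration. So if you enumerate it more than once, you are causing extra work to be performed.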
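
For illustration, the following sketch writes out by hand roughly what a foreach loop over an IEnumerable<string> does. The names array is a hypothetical stand-in. Every foreach starts this process over: it calls GetEnumerator again and steps through the sequence from the beginning with MoveNext and Current.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sequence; any IEnumerable<string> is consumed the same way.
IEnumerable<string> names = new[] { "Alice", "Bob" };

// Roughly what `foreach (var name in names) { ... }` does:
// get a fresh enumerator, then step through it with MoveNext/Current.
var visited = new List<string>();
using (IEnumerator<string> e = names.GetEnumerator())
{
    while (e.MoveNext())
    {
        string name = e.Current;
        visited.Add(name);
        Console.WriteLine(name);
    }
}
```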

IEnumerable<string> names = GetNames();
// First enumeration
foreach (var name in names)
{
  Console.WriteLine(name);
}
var sb = new StringBuilder();
// Second enumeration of the same sequence
foreach (var name in names)
{
  sb.Append(name).Append(" ");
}

The code above enumerates names twice. In the best case this just adds some extra work. But it can get really nasty if GetNames returns a database query: the query will then be executed twice, possibly even returning different results each time.
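
To make the repeated work visible, here is a minimal sketch that assumes a hypothetical GetNames written as a lazy iterator (an in-memory stand-in for a database query). A counter records how many times the iterator body runs while the same sequence is consumed twice.

```csharp
using System;
using System.Collections.Generic;

// Counts how many times the body of the hypothetical GetNames iterator runs.
int timesEnumerated = 0;

// Hypothetical stand-in for GetNames: a lazy iterator instead of a database query.
IEnumerable<string> GetNames()
{
    // Runs on the first MoveNext of each enumeration, not when GetNames() is called.
    timesEnumerated++;
    yield return "Alice";
    yield return "Bob";
}

IEnumerable<string> names = GetNames();

// First enumeration.
foreach (var name in names)
{
    Console.WriteLine(name);
}

// Second enumeration: the iterator body runs again from the start.
var joined = string.Join(" ", names);

Console.WriteLine($"Enumerations: {timesEnumerated}"); // Enumerations: 2
```

With a real database query in place of the iterator, each of those enumerations would be a separate round trip to the database.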

Avoiding multiple enumerations

Fortunately, it is quite easy to avoid multiple enumerations. The calling code can force the enumeration to happen exactly once, when the variable is initialized, by materializing the results into a list.

IList<string> names = GetNames().ToList();
...

With this change the enumeration happens only once, and the rest of the code can stay the same. The ToList method (from System.Linq) allocates a new List<string> and populates it with the strings returned by the enumerator.
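
A minimal sketch of the fix, assuming a hypothetical GetNames iterator that counts how many times its body runs (an in-memory stand-in for a database query). The resulting list is used twice, but the underlying sequence is enumerated only once, by ToList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times the body of the hypothetical GetNames iterator runs.
int timesEnumerated = 0;

// Hypothetical stand-in for GetNames: a lazy iterator.
IEnumerable<string> GetNames()
{
    timesEnumerated++;
    yield return "Alice";
    yield return "Bob";
}

// ToList enumerates the source exactly once and stores the results.
IList<string> names = GetNames().ToList();

// Both uses below read from the list, not from the iterator.
foreach (var name in names)
{
    Console.WriteLine(name);
}
var joined = string.Join(" ", names);

Console.WriteLine($"Enumerations: {timesEnumerated}"); // Enumerations: 1
```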

The price of being generic

Now, assume the GetNames method actually does use a List<string> internally and returns it as an IEnumerable<string>, but the caller then calls ToList on it. Isn’t that just adding extra complexity? Yes! That is exactly right. And not only that: the call to ToList creates a new List<string> and populates it with the exact same elements that already exist in the original list. So there is an efficiency penalty as well.
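
The copy is easy to observe. In this sketch, the hypothetical GetNames hands out its internal List<string> as an IEnumerable<string>, and the caller calls ToList on it anyway. The result is a second list object holding the same elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical GetNames that uses a List<string> internally
// but exposes it as IEnumerable<string>.
var backing = new List<string> { "Alice", "Bob" };
IEnumerable<string> GetNames() => backing;

// The caller does not know the sequence already is a list, so it calls ToList.
List<string> copy = GetNames().ToList();

// ToList allocated a second list that holds the same elements.
Console.WriteLine(object.ReferenceEquals(copy, backing)); // False
Console.WriteLine(copy.SequenceEqual(backing));           // True
```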

So what should you do? Well, that depends. If you are writing a library aimed at users outside your team or organization, in other words where you don’t know how the returned value will be used and it is important that your code stays flexible while the interface stays stable, then you should probably return an IEnumerable<T>.

But if you control all the source code involved, you can afford to be more specific about the types you return.
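
As a sketch of what "more specific" could look like, the hypothetical GetNames below returns IReadOnlyList<string>. That is only one possible choice (returning List<string> would also work). The caller gets Count and indexing directly and can iterate as often as it likes without a defensive ToList.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical GetNames returning a more specific type than IEnumerable<string>.
// IReadOnlyList<string> is one possible choice; List<string> would also work.
IReadOnlyList<string> GetNames() => new List<string> { "Alice", "Bob" };

var names = GetNames();

// Count and indexing are available directly, and the list can be
// enumerated as many times as needed without calling ToList first.
Console.WriteLine(names.Count); // 2
Console.WriteLine(names[0]);    // Alice
foreach (var name in names)
{
    Console.WriteLine(name);
}
```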

Building strings in C# with good performance

A little bit on strings

Strings in C# are immutable. This means that you cannot modify an existing string; you must create a new one instead.

Strings are also objects (sequential, read-only collections of char values), so they are allocated on the managed heap and are therefore managed by the Garbage Collector (GC). If you wish to "modify" an existing string, you instead use it as input and create a new string. For example, let’s assume you have a string "Hello" that you wish to append your friend’s name to.

var str = "Hello";
...
str = string.Concat(str, " Bob");

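What happens under the hood here is that String.Concat creates a new string, "Hello Bob", and the str variable is updated to reference the newly created string. The original string is left unchanged. Once nothing references an old string anymore, it becomes eligible for garbage collection. (The literal "Hello" is actually an exception, since string literals are interned and kept alive by the runtime, but strings created at runtime, like the ones built in the loop below, are collected this way.)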
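
A small sketch that makes the immutability visible: after the concatenation, the variable points to a different string object, and the original string still has its old value.

```csharp
using System;

var original = "Hello";
var str = original;

// Concat returns a new string; it does not change the one str pointed to.
str = string.Concat(str, " Bob");

Console.WriteLine(original);                              // Hello
Console.WriteLine(str);                                   // Hello Bob
Console.WriteLine(object.ReferenceEquals(original, str)); // False
```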

Building strings in a loop

Now, assume that you wish to build up a long string by concatenating new pieces in a loop. We can simulate this with a loop like the one below, which appends one character at a time.

public static string BuildString()
{
  var str = string.Empty;
  for (var i = 0; i < 100_000; i++)
  {
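    // Cycle through the letters 'a' to 'z'
    var append = (char)('a' + i % ('z' - 'a' + 1));
    // append is a char, so this binds to String.Concat(object, object),
    // and a new string is created on every iteration
    str = string.Concat(str, append);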
  }
  return str;
}

The code looks decent. But since strings are immutable, a new string object is created in every iteration of the loop. That is 100 000 objects that need to be created and garbage collected.

Fortunately, there is a better way to do this. If we use a mutable object in the loop instead, we can just keep appending to it and avoid all this object creation. The .NET Framework provides a type for exactly this purpose, System.Text.StringBuilder. With some small modifications to the code we can use it instead.

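private static string BuildString()
{
  // Pre-size the internal buffer to the final length so it never has to grow
  var sb = new System.Text.StringBuilder(100_000);
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    // Append writes into the existing buffer; no new string per iteration
    sb.Append(append);
  }
  // A single string is created at the end
  return sb.ToString();
}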

The numbers

I did some measurements of the two approaches above. Figure 1 below shows the memory allocations.


Figure 1. Memory allocations

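The "Exclusive Allocations" column shows how many allocations of managed memory were made directly in each method. The method using String.Concat made 100 000 allocations, and String.Concat added another 199 999. The most likely explanation is this: append is a char, so the call binds to String.Concat(object, object) and the char is boxed on every iteration, which accounts for the 100 000 allocations in the method itself. Inside Concat, each char is then converted to a one-character string (100 000 allocations) and a new result string is created (99 999 allocations, since concatenating onto the initial empty string simply returns the other string).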

In the method using StringBuilder there is only a single allocation (the StringBuilder object itself), and only two more are added by calls to other methods: one in the StringBuilder constructor and one in the ToString method.

I also measured the time it took to execute the two different implementations. The String.Concat version took 1.3 seconds to run while the StringBuilder version took 1.0 milliseconds. That is 1 300 times faster.
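
If you want to run a comparison like this yourself, here is a rough sketch using System.Diagnostics.Stopwatch. It is not the measurement setup used above, which isn't described. BuildWithConcat and BuildWithStringBuilder are hypothetical names for the two BuildString versions, parameterized on the length. Absolute timings depend on hardware, runtime version, and JIT warm-up, so treat whatever it prints as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical name for the String.Concat version of BuildString.
string BuildWithConcat(int count)
{
    var str = string.Empty;
    for (var i = 0; i < count; i++)
    {
        var append = (char)('a' + i % ('z' - 'a' + 1));
        str = string.Concat(str, append);
    }
    return str;
}

// Hypothetical name for the StringBuilder version of BuildString.
string BuildWithStringBuilder(int count)
{
    var sb = new StringBuilder(count);
    for (var i = 0; i < count; i++)
    {
        var append = (char)('a' + i % ('z' - 'a' + 1));
        sb.Append(append);
    }
    return sb.ToString();
}

const int Count = 100_000;

// One unwarmed run per version: a rough indication, not a proper benchmark.
var sw = Stopwatch.StartNew();
string viaConcat = BuildWithConcat(Count);
sw.Stop();
Console.WriteLine($"String.Concat: {sw.Elapsed.TotalMilliseconds:F1} ms");

sw.Restart();
string viaBuilder = BuildWithStringBuilder(Count);
sw.Stop();
Console.WriteLine($"StringBuilder: {sw.Elapsed.TotalMilliseconds:F1} ms");

// Both versions must produce the same string for the comparison to be fair.
Console.WriteLine(viaConcat == viaBuilder);
```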

Conclusion

Memory allocation and garbage collection are expensive operations. As the case above shows, it can be difficult to tell whether a piece of code will perform well or badly without knowing the underlying implementation (in this case, how strings are implemented in the .NET Framework). Many performance insights come with experience.

Also, remember that in almost all cases code readability is much more important than blazing fast performance. Do not prioritize optimization over readability unless you know you need to. However, there are cases where the code can be both clear and fast. Try to aim for that.

C# Collections – Non-generic or Generic?

Non-generic vs generic collections

When coding in C# you will pretty soon encounter Collections (Lists, Queues, Stacks, and so on). As soon as you wish to use a Collection of some sort, you can choose between one from the System.Collections namespace and one from the System.Collections.Generic namespace. The most commonly used Collection is the List, and the non-generic version of List<T> is ArrayList.

Using the non-generic version forces you to cast the objects into the correct type.

ArrayList l = new ArrayList { new MyType() };
...
// MyType t = l[0]; This won't compile
MyType t = (MyType)l[0]; // But this will
MyOtherType t2 = (MyOtherType)l[0]; // Unfortunately this also compiles; the mistake only shows up at runtime as an InvalidCastException

As the code above shows, the non-generic ArrayList is error prone: there is no compile-time check that you cast to the correct type. The generic counterpart of ArrayList is List<T>. Using it forces you to declare the type of objects the list will contain when you create the list, and the compiler and static code analysis tools will report errors if you misuse the type.

var l = new List<MyType> { new MyType() };
...
MyType t = l[0]; // Now this compiles fine
MyType t2 = (MyType)l[0]; // This also works fine, but the cast is not needed
// MyOtherType t3 = (MyOtherType)l[0]; This won't compile
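
To show what the missing compile-time check means for ArrayList in practice, here is a self-contained sketch. Since MyType and MyOtherType aren't defined in the post, it assumes string and Uri as stand-ins: the wrong cast compiles, and the mistake only surfaces at runtime as an InvalidCastException.

```csharp
using System;
using System.Collections;

// ArrayList stores everything as object, so the compiler cannot check casts.
var list = new ArrayList { "hello" };

string s = (string)list[0]; // correct cast: works
Console.WriteLine(s);

bool failed = false;
try
{
    // Compiles fine, but the element is a string, not a Uri.
    var wrong = (Uri)list[0];
}
catch (InvalidCastException)
{
    failed = true;
    Console.WriteLine("InvalidCastException at runtime");
}
```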

So: no boxing and unboxing when the list holds value types such as int, no need to help the compiler understand what types of objects the list holds, and no runtime errors due to incorrect casts.
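
A small sketch of the boxing difference, using int as an example value type. An ArrayList stores each int as a boxed object that has to be unboxed with a cast when read, while List<int> stores the values directly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Non-generic: each int is boxed into an object when it is added...
var boxed = new ArrayList { 1, 2, 3 };
int sumBoxed = 0;
foreach (object item in boxed)
{
    sumBoxed += (int)item; // ...and must be unboxed with a cast when read
}

// Generic: List<int> stores the ints directly, so no boxing and no casts.
var typed = new List<int> { 1, 2, 3 };
int sumTyped = 0;
foreach (int item in typed)
{
    sumTyped += item;
}

Console.WriteLine(sumBoxed); // 6
Console.WriteLine(sumTyped); // 6
```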

Now, List<T> is only one of many Collection types in the System.Collections.Generic namespace, which has generic counterparts for the commonly used non-generic Collections, for example Dictionary<TKey, TValue> instead of Hashtable, and Queue<T> and Stack<T> instead of Queue and Stack. My advice is to simply pretend that the non-generic Collection types don’t even exist, and to always use the generic ones.
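
As a sketch of another non-generic/generic pair, here is the non-generic Hashtable next to its generic counterpart, Dictionary<TKey, TValue>. The keys and values are made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Non-generic Hashtable: keys and values are object, so reads need a cast.
var table = new Hashtable { { "Alice", 30 } };
int aliceAgeFromTable = (int)table["Alice"];

// Generic Dictionary<TKey, TValue>: the types are known at compile time.
var ages = new Dictionary<string, int> { { "Alice", 30 } };
int aliceAge = ages["Alice"];

Console.WriteLine(aliceAgeFromTable); // 30
Console.WriteLine(aliceAge);          // 30
```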