C# struct and class – when to use each?

Value types and Reference types

In C# you have both Value types and Reference types. There are some fundamental differences between them that are important to understand.

The most common value types are the simple ones, like int, bool, char, and long. Enums and structs are value types as well. They are called value types because variables based on any of those types directly contain the values.

The second group are Reference types. Variables based on these do not contain the values directly. Instead they contain a reference to where the objects can be found. Common Reference types are classes, delegates, interfaces, and strings.

Some key differences

  • With Value types, when you assign an existing variable to a new variable the actual value gets copied. With Reference types, a new reference to the existing object is created.
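  • Value types are stored inline, where the variable lives. A local variable of a value type typically lives on the stack and ceases to exist as soon as it goes out of scope, while a value type field in a class lives on the heap as part of the containing object. Instances of Reference types live on the heap. They cease to exist when they are collected by the Garbage Collector.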
  • Value types are not polymorphic, what you see is what you have. Reference types can be polymorphic.
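The copy behavior in the first bullet can be sketched as follows. This is a minimal illustration, and the types PointValue and PointReference are hypothetical names made up for the example.

```csharp
using System;

public struct PointValue { public int X; }      // hypothetical value type
public class PointReference { public int X; }   // hypothetical reference type

public static class CopySemanticsDemo
{
    // Modifies a copy of a struct and returns X of the original.
    public static int ModifyStructCopy()
    {
        var original = new PointValue { X = 1 };
        var copy = original;   // the value itself is copied
        copy.X = 2;
        return original.X;     // the original is unaffected
    }

    // Modifies an object through a second reference and returns X of the original.
    public static int ModifyClassCopy()
    {
        var original = new PointReference { X = 1 };
        var alias = original;  // only the reference is copied
        alias.X = 2;
        return original.X;     // both variables refer to the same object
    }

    public static void Main()
    {
        Console.WriteLine(ModifyStructCopy()); // prints 1
        Console.WriteLine(ModifyClassCopy());  // prints 2
    }
}
```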

Differences between a class and a struct

It is not uncommon that you can switch between defining a class and a struct just by changing a single word in the code:

// A class definition
public class MyObject
{
  ...
}

// A struct definition
public struct MyObject
{
  ...
}

By doing so you change the type from a Reference type to a Value type, which can break a lot of existing code. For example:

public MyObject CreateGreen()
{
  var obj = new MyObject();
  SetGreenProperties(obj);
  return obj;
}

private static void SetGreenProperties(MyObject obj)
{
  obj.Color = "Green";
  ...
}

If MyObject is changed from a class to a struct, the object returned from CreateGreen will no longer have the green properties set. Why? Because SetGreenProperties no longer receives a reference to the object. Instead it receives a copy that is local to the called method. Note, however, that the code will still compile and run, but the application now contains a bug that might be hard to track down.

Structs also do not support inheritance. In other words, you cannot define a base struct that other structs inherit from.
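A runnable sketch of this bug, assuming MyObject is a struct with only the Color property shown in the text. The ref variant is one possible way to keep the struct and still update the caller's variable. It is shown for comparison, not as the recommended fix.

```csharp
using System;

public struct MyObject   // same name as in the text; only Color is known from it
{
    public string Color { get; set; }
}

public static class GreenDemo
{
    // Receives a copy: the caller's struct is not changed.
    private static void SetGreenByValue(MyObject obj) { obj.Color = "Green"; }

    // Receives a reference to the caller's variable.
    private static void SetGreenByRef(ref MyObject obj) { obj.Color = "Green"; }

    public static string ColorAfterByValue()
    {
        var obj = new MyObject();
        SetGreenByValue(obj);
        return obj.Color;   // still null, the change was made to a copy
    }

    public static string ColorAfterByRef()
    {
        var obj = new MyObject();
        SetGreenByRef(ref obj);
        return obj.Color;   // "Green"
    }

    public static void Main()
    {
        Console.WriteLine(ColorAfterByValue() ?? "(null)"); // prints (null)
        Console.WriteLine(ColorAfterByRef());               // prints Green
    }
}
```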

public class Car
{
  ...
}

// This works just fine
public class Mustang : Car
{
 ...
}

public struct Animal
{
  ...
}

// But this does not compile
public struct Lion : Animal
{
 ...
}

However, both classes and structs can implement interfaces.
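As a small sketch of that last point, here a struct and a class implement the same interface. IShape, Square, and Circle are hypothetical names for this example. Note that storing a struct in a variable of an interface type boxes it.

```csharp
using System;

public interface IShape { double Area(); }   // hypothetical interface

public struct Square : IShape
{
    public double Side { get; }
    public Square(double side) { Side = side; }
    public double Area() => Side * Side;
}

public class Circle : IShape
{
    public double Radius { get; }
    public Circle(double radius) { Radius = radius; }
    public double Area() => Math.PI * Radius * Radius;
}

public static class InterfaceDemo
{
    // Accepts structs and classes alike; each struct is boxed when stored as IShape.
    public static double TotalArea(params IShape[] shapes)
    {
        double total = 0;
        foreach (var shape in shapes) total += shape.Area();
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalArea(new Square(2), new Square(3))); // prints 13
    }
}
```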

When to use a struct over a class?

It might seem that a struct is just a class with limitations. So why should you bother with structs? Why not just use class only?

The answer, maybe not so convincing, is that you should use classes to define the behavior of your application and structs for storing the data that your application manipulates. Structs are Value types, so they are suited for storing values. I will elaborate on this idea a bit more to build my case.

Structs will come to their best use if you make them immutable. This can be done by setting all the values in the constructor and only supplying read only properties.

public struct Address
{
  public string Line1 { get; }
  public string Line2 { get; }
  public string City { get; }
  public string State { get; }
  public int Zip { get; }

  public Address(
    string line1, 
    string line2,
    string city,
    string state,
    int zip)
  {
    Line1 = line1;
    Line2 = line2;
    City = city;
    State = state;
    Zip = zip;
  }
}

Once all fields have been assigned you can be sure the struct as a whole is valid. There is no way to alter individual fields, leaving the struct in an invalid state. If you want to update the address you simply create a new Address with the updated information and replace the old one.

public void UpdateAddress(Address newAddress)
{
  this.address = newAddress;
}
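Since C# 7.2 you can also mark the struct itself as readonly, which makes the compiler enforce that no member modifies its state. The sketch below uses a shortened, hypothetical ShortAddress type, and the WithZip helper is an assumption made for illustration, not part of any framework API.

```csharp
using System;

public readonly struct ShortAddress   // hypothetical, reduced version of Address
{
    public string City { get; }
    public int Zip { get; }

    public ShortAddress(string city, int zip)
    {
        City = city;
        Zip = zip;
    }

    // Hypothetical helper: returns a new value with Zip replaced, leaving this one intact.
    public ShortAddress WithZip(int zip) => new ShortAddress(City, zip);
}

public static class ReadonlyStructDemo
{
    public static void Main()
    {
        var oldAddress = new ShortAddress("Springfield", 12345);
        var newAddress = oldAddress.WithZip(54321);

        Console.WriteLine(oldAddress.Zip); // prints 12345
        Console.WriteLine(newAddress.Zip); // prints 54321
    }
}
```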

If you know a bit about how value types are handled you might be concerned that there is a lot of unnecessary copying of values going on when you pass around structs instead of just references to objects. It is true that value types get copied when you pass them to other methods, and that is why you should keep your structs quite small. Copying a small value type is very cheap, whereas creating a reference type instead would involve overhead for bookkeeping and garbage collection.

Here is a checklist for when a struct can be suitable to use:

  • It will be used for storing data
  • You can make it immutable
  • You can limit its public interface to get only properties
  • There is no need for it to have subclasses
  • It can be small

Pitfalls

Structs in multi threaded systems

It is important to understand that when you assign a struct variable to another, each field gets copied, and this may not be an atomic operation. In a multithreaded system where you have a class or a static variable holding a struct and it is accessed and updated by different threads, you will need to add locks around the read and assignment operations, even though the struct itself is immutable.

public class Person
{
  // public Address Address { get; set; } // Not thread safe

  // Somewhat thread safe, depending on usage.
  private readonly object lockObj = new object();
  private Address _address;
  public Address Address
  {
    get
    {
      lock (lockObj)
      {
        return _address;
      }
    }
    set
    {
      lock (lockObj)
      {
        _address = value;
      }
    }
  }
  // Bad usage:
  //   Console.WriteLine(person.Address.Line1);
  //   Console.WriteLine(person.Address.Line2); // Address might have been updated between these accesses
  // Good usage:
  //   var address = person.Address; // Make a local copy of the address struct
  //   Console.WriteLine(address.Line1);
  //   Console.WriteLine(address.Line2);
  ...
}

An even better alternative might be to avoid adding locks in the Person class and instead add locks in the code using it.

var p = new Person(address);
...
var newAddress = new Address(...);
lock (p)
{
  p.Address = newAddress;
}

Struct members being mutable types

Part of being immutable is also ensuring that your struct does not hold or expose references to mutable types.

public struct Unsecure
{
  private readonly int[] _values;
  public IEnumerable<int> Values => _values;

  public Unsecure(int[] values)
  {
    _values = values;
  }
}

var values = new [] { 1, 2, 3 };
var unsecure = new Unsecure(values);
...
values[0] = 4; // Modifies the internals of the 'unsecure' object.

You can remedy this by using ImmutableArray<T> or ImmutableList<T> from the System.Collections.Immutable namespace that I have written about in previous posts.

public struct Secure
{
  private readonly ImmutableArray<int> _values;
  public IEnumerable<int> Values => _values;

  public Secure(int[] values)
  {
    _values = values.ToImmutableArray();
  }
}

var values = new [] { 1, 2, 3 };
var secure = new Secure(values);
...
values[0] = 4; // Now this does not modify the internals of 'secure'.

Conclusion

Now you hopefully have a pretty good understanding of when structs can be a good option, and how to avoid the pitfalls involved. Most of the time you will use classes, but there are some occasions where structs are a better fit. You should learn to identify those situations and be able to reason about why you choose one over the other.

Finally, if in doubt, use a class.

C# Protecting private collections from modification – Part 2

The common object issue

In the last part I wrote about how you can protect a class’s private collections from modification by wrapping them in a ReadOnlyCollection before exposing them. But how about collections that are handed to your class? Consider the following code:

public class MyClass
{
  private readonly List<int> _myInts;

  public MyClass(List<int> ints)
  {
    _myInts = ints;
  }
  ...
}

What happens here is that MyClass takes a variable of reference type in its constructor, and creates a variable of reference type of its own, _myInts, that is set to reference the same List object as the constructor variable. In other words, the List of ints on the heap now has at least two references to it.

Now assume the code that instantiates MyClass looks something like this:

var ints = new List<int> { 1, 2, 3 };
var myClass = new MyClass(ints); // Use ints in the MyClass constructor
...
ints.Clear(); // Re-use ints for other purposes
ints.Add(7);
...

This code will clear and then add a 7 to the List that _myInts in MyClass references, which may lead to unexpected behavior.

Protecting the data

Fortunately there is an easy way to protect the data from any modifications: use a type from the System.Collections.Immutable namespace.
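To make the effect observable, here is a small runnable sketch. SharedListHolder is a hypothetical stand-in for MyClass, with Count and First added only so that the stored list can be inspected from the outside.

```csharp
using System;
using System.Collections.Generic;

public class SharedListHolder   // hypothetical stand-in for MyClass
{
    private readonly List<int> _myInts;

    public SharedListHolder(List<int> ints)
    {
        _myInts = ints;   // stores a reference to the caller's list, not a copy
    }

    // Added for the demo so the private list can be observed.
    public int Count => _myInts.Count;
    public int First => _myInts[0];
}

public static class SharedListDemo
{
    public static void Main()
    {
        var ints = new List<int> { 1, 2, 3 };
        var holder = new SharedListHolder(ints);

        ints.Clear();   // the caller re-uses its list...
        ints.Add(7);

        Console.WriteLine(holder.Count); // prints 1
        Console.WriteLine(holder.First); // prints 7
    }
}
```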

The Immutable namespace contains immutable classes and interfaces, as well as extension methods to create an immutable copy of a mutable collection. The MyClass code can be changed to take advantage of these types:

public class MyClass
{
  private readonly ImmutableList<int> _ints;

  public MyClass(List<int> ints)
  {
    _ints = ints.ToImmutableList();
  }
  ...
}

Note that using AsReadOnly, like in the previous post, would add no protection here. The read only wrapper still refers to the caller’s list, so any external modification of that list shows through. The big difference is that ToImmutableList creates a new ImmutableList object which is populated by enumerating the ints List and copying the values.
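The difference between a wrapper and a copy can be sketched like this. The example assumes System.Collections.Immutable is available, which is the case out of the box on modern .NET and via a NuGet package on .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class SnapshotDemo
{
    // Returns the element counts seen through a read-only wrapper and an immutable copy
    // after the original list has been modified.
    public static (int wrapperCount, int snapshotCount) Compare()
    {
        var ints = new List<int> { 1, 2, 3 };

        var wrapper = ints.AsReadOnly();         // read-only view over the same list
        var snapshot = ints.ToImmutableList();   // independent copy of the elements

        ints.Clear();
        ints.Add(7);

        return (wrapper.Count, snapshot.Count);
    }

    public static void Main()
    {
        var (wrapperCount, snapshotCount) = Compare();
        Console.WriteLine(wrapperCount);  // prints 1
        Console.WriteLine(snapshotCount); // prints 3
    }
}
```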

Conclusion

When dealing with references, it is easy to forget to properly protect data that should be private to a class. There are also no compile time warnings or errors for altering an object that has many references to it. However, it is important to be able to trust that data that you don’t expect to change really stays the same. Use the support that is available in the framework in order to achieve this.

C# Protecting private collections from modification

Why do we want to hide data?

One of the big advantages of object oriented programming is claimed to be the possibility to encapsulate data and hide internal implementation details. This enables us to make changes to our implementation without impacting our users and ensures our internal data does not get modified in ways that we do not control. In other words, it helps us stay sane.

Protecting data in a class

In C# fields in a class are protected from outside access by using the private keyword.

public class MyClass
{
  private List<int> _myInts;
  ...
}

Now _myInts is only accessible from within MyClass.

Breaking encapsulation

Now, you might want users of MyClass to be able to read the data that _myInts contains. The easiest way to do this is to make _myInts public.

public class MyClass
{
  public List<int> _myInts;
  ...
}

But now you no longer have control over what happens to _myInts. Any external user might alter the list or even change the reference to point at a completely different list object (or null).

Re-adding the protection

In an attempt to prevent a user from modifying the private List you might add a getter method, let’s say by creating a read only property.

public class MyClass
{
  private List<int> _myInts;
  public List<int> MyInts => _myInts;
  ...
}

This code can be simplified by converting to an auto property.

public class MyClass
{
  public List<int> MyInts { get; } = new List<int>();
  ...
}

Under the hood a private field is still created, this is called a backing field in C# lingo. By only supplying a get and no set you prevent an external user from setting the backing field to reference another list (or null). However, an external user may still modify the contents of the original list. For example var m = new MyClass(); m.MyInts.Add(1); is still possible to do.

To prevent this you might attempt to give an external user access to the list by supplying it via an IEnumerable or IReadOnlyCollection interface. But if the auto property itself has that type you can no longer modify the list from inside the class either, so you will have to re-add the _myInts field.

public class MyClass
{
  private readonly List<int> _myInts = new List<int>();
  public IEnumerable<int> MyInts => _myInts;
  // Alternatively (only one of the two MyInts definitions can be active at a time):
  // public IReadOnlyCollection<int> MyInts => _myInts;
  ...
}

Ok, so now we must be safe, right? Well, what if the user of your class does something like this.

var m = new MyClass();
var i = m.MyInts as List<int>;
i.Add(1);

Now the caller is still able to modify your private readonly List. But don’t give up! There is still one last thing to try. Here comes the AsReadOnly method to the rescue!

public class MyClass
{
  private readonly List<int> _myInts = new List<int>();
  public IEnumerable<int> MyInts => _myInts.AsReadOnly();
  // Alternatively (only one of the two MyInts definitions can be active at a time):
  // public IReadOnlyCollection<int> MyInts => _myInts.AsReadOnly();
}

The AsReadOnly method adds a read only wrapper around the list. Any attempt to cast it to a writable List will fail. Note that the casting code will still compile, but it fails at run time.

var m = new MyClass();
var i = m.MyInts as List<int>; // Would set i to null
i.Add(1); // Would throw a null reference exception
...
var j = (List<int>) m.MyInts; // Would throw an InvalidCastException
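The two variants can be compared side by side in a runnable sketch. Inventory is a hypothetical class that exposes the same private list both directly and through AsReadOnly.

```csharp
using System;
using System.Collections.Generic;

public class Inventory   // hypothetical class for this example
{
    private readonly List<int> _items = new List<int> { 1, 2, 3 };

    public IEnumerable<int> Raw => _items;                   // the list itself, typed as an interface
    public IEnumerable<int> Wrapped => _items.AsReadOnly();  // a ReadOnlyCollection<int> wrapper
}

public static class CastDemo
{
    // True when the exposed object really is the underlying List<int>.
    public static bool CanCastRaw() => new Inventory().Raw is List<int>;
    public static bool CanCastWrapped() => new Inventory().Wrapped is List<int>;

    public static void Main()
    {
        Console.WriteLine(CanCastRaw());     // prints True
        Console.WriteLine(CanCastWrapped()); // prints False
    }
}
```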

Conclusion

To prevent modifications to a collection, expose it only through a ReadOnlyCollection wrapper. The easiest way to do this is to use the AsReadOnly method. However it is also possible to wrap the collection by creating a new ReadOnlyCollection object, new ReadOnlyCollection<int>(_myInts); for example.

Also note that this is an O(1) operation. The elements in the collection are not copied, so there is only a minor overhead cost to wrapping it.

C# Pitfalls of returning IEnumerable

Generic and specific return types

It is often preferred to work with types that are as general as possible. Doing so gives more flexibility and makes it possible to modify parts of the code without affecting other parts. One example of this is IEnumerable<T>, which is implemented by several generic collection types, such as HashSet<T>, List<T>, and LinkedList<T>.

If you write a method that works with a List<T> internally and returns it as an IEnumerable<T>, the caller does not need to know that it is actually a List<T> you are using. And if you later decide to change your implementation to use a different type internally, the caller does not need to be updated as long as the new type also implements IEnumerable<T>.

This might have you think that you should always return IEnumerable<T> when possible. But this might come back and bite you.

The multiple enumeration issue

Whenever you iterate over a collection via the IEnumerable interface it gets enumerated. This means that an enumerator object is created that steps through the members one at a time. If this is done more than once, the work of producing the members is performed again each time.

IEnumerable<string> names = GetNames();
foreach (var name in names)
{
  Console.WriteLine(name);
}
var sb = new StringBuilder();
foreach (var name in names)
{
  sb.Append(name).Append(" ");
}

The above code enumerates the names enumerable twice. In the best case this just introduces some extra work. But it can get really nasty if GetNames returns a database query. This will make the code evaluate the query twice, possibly even returning different results each time.

Avoiding multiple enumerations

Fortunately it is quite easy to avoid multiple enumerations. The calling code can force the enumeration to happen once during variable initialization.

IList<string> names = GetNames().ToList();
...

With this change the enumeration will only happen once and the rest of the code can stay the same. The ToList method will allocate a new List and populate it with the strings returned from the Enumerator.
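A minimal sketch that makes the difference measurable. The GetNames iterator here is a hypothetical stand-in for a lazily evaluated query, and the Executions counter exists only so the demo can count how many times the iterator body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    public static int Executions;   // how many times the iterator body has been started

    // Hypothetical stand-in for a query: the body runs every time the result is enumerated.
    public static IEnumerable<string> GetNames()
    {
        Executions++;
        yield return "Ann";
        yield return "Bob";
    }

    public static int EnumerateTwice()
    {
        Executions = 0;
        IEnumerable<string> names = GetNames();
        var total = 0;
        foreach (var name in names) total += name.Length;
        foreach (var name in names) total += name.Length;
        return Executions;   // the iterator body ran once per foreach
    }

    public static int MaterializeOnce()
    {
        Executions = 0;
        IList<string> names = GetNames().ToList();   // enumerates here, exactly once
        var total = 0;
        foreach (var name in names) total += name.Length;
        foreach (var name in names) total += name.Length;
        return Executions;   // the loops only read the already populated list
    }

    public static void Main()
    {
        Console.WriteLine(EnumerateTwice());  // prints 2
        Console.WriteLine(MaterializeOnce()); // prints 1
    }
}
```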

The price of being generic

Now, assume the GetNames method actually does use a List<string> internally and returns it as an IEnumerable<string>, but then the caller calls the ToList method. Isn’t that just adding extra complexity? Yes! That is exactly right. And not only that, the call to ToList will create a new List<string> which will be populated with the exact same elements that already exist in the original List. So there is an efficiency penalty involved as well.

So what to do? Well, that depends. If you are writing a library that is aimed for users outside of your team or organization, in other words where you don’t know how the returned type will be used and it is important that your code is flexible while the interface is stable, then you probably should return an IEnumerable<T>.

But if you are in control of all the source code involved, then you can afford to be more specific in what types you return.

Building strings in C# with good performance

A little bit on strings

Strings in C# are immutable. What that means is that you cannot modify an existing string, you must create a new one.

Also strings are objects – sequential read only collections of char objects – so they are allocated on the managed heap, and therefore managed by the Garbage Collector (GC). Hence, if you wish to modify an existing string you can use it as input and create a new string. For example, let’s assume you have a string "Hello" that you wish to append your friend’s name to.

var str = "Hello";
...
str = string.Concat(str, " Bob");

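Now what happens under the hood here is that String.Concat creates a new string, "Hello Bob", and the str variable is updated to reference the newly created string. The old string, "Hello", is no longer referenced by str. A string created at run time would at this point become eligible for garbage collection. (A literal such as "Hello" is interned and therefore lives on, but the principle is the same.)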
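A small sketch that shows the original string is left untouched and that the result is a separate object. The second variable, original, is added only so that the old string can still be inspected afterwards.

```csharp
using System;

public static class StringImmutabilityDemo
{
    public static (string original, string updated, bool sameObject) Append()
    {
        var original = "Hello";
        var str = original;                 // both variables reference the same string
        str = string.Concat(str, " Bob");   // creates a new string; "Hello" is not modified
        return (original, str, object.ReferenceEquals(original, str));
    }

    public static void Main()
    {
        var (original, updated, sameObject) = Append();
        Console.WriteLine(original);   // prints Hello
        Console.WriteLine(updated);    // prints Hello Bob
        Console.WriteLine(sameObject); // prints False
    }
}
```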

Building strings in a loop

Now, assume that you wish to build up a long string by concatenating new words in a loop. We can simulate this behavior by creating a loop like the one below.

public static string BuildString()
{
  var str = string.Empty;
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    str = string.Concat(str, append);
  }
  return str;
}

The code looks decent. But, since strings are immutable, a new string object is created for each iteration in the loop. That is 100 000 objects that need to be created and garbage collected. On top of that, each new string requires copying all the characters accumulated so far, so the total amount of copying grows quadratically with the number of iterations.

Fortunately there is a better way to do this. If we could use a mutable object in the loop instead, we could just keep adding to it and avoid all this object creation. The .NET Framework provides a type just for this purpose, System.Text.StringBuilder. With some small modifications to the code we can use this instead.

private static string BuildString()
{
  var sb = new System.Text.StringBuilder(100_000);
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    sb.Append(append);
  }
  return sb.ToString();
}        

The numbers

I did some measurements using the two different approaches above. In figure 1 below you can see the memory allocations.


Figure 1. Memory allocations

The "Exclusive Allocations" column shows how many allocations of managed memory were made in the methods. As expected, in the method using String.Concat there were 100 000 allocations. Interestingly, an additional 199 999 allocations are added by String.Concat. A likely explanation is that passing a char selects the Concat(object, object) overload, which boxes the char and converts it to a temporary string on each iteration.

In the method using StringBuilder there is only a single allocation, and only two additional ones are added by calls to other methods (one allocation in the StringBuilder constructor and a second one in the ToString method).

I also measured the time it took to execute the two different implementations. The String.Concat version took 1.3 seconds to run while the StringBuilder version took 1.0 milliseconds. That is 1 300 times faster.
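If you want to try a similar comparison yourself, a rough sketch using System.Diagnostics.Stopwatch could look like the one below. This is not the profiler setup used for the numbers above. The method names are made up for the example, and the timings will vary with machine, runtime version, and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class StringBuildTiming
{
    // Same loop as in the text, with the iteration count as a parameter.
    public static string WithConcat(int n)
    {
        var str = string.Empty;
        for (var i = 0; i < n; i++)
        {
            var append = (char)('a' + i % ('z' - 'a' + 1));
            str = string.Concat(str, append);   // allocates a new string every iteration
        }
        return str;
    }

    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder(n);          // capacity reserved up front
        for (var i = 0; i < n; i++)
        {
            var append = (char)('a' + i % ('z' - 'a' + 1));
            sb.Append(append);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        const int n = 100_000;

        var sw = Stopwatch.StartNew();
        WithConcat(n);
        Console.WriteLine($"String.Concat: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        WithBuilder(n);
        Console.WriteLine($"StringBuilder: {sw.ElapsedMilliseconds} ms");
    }
}
```

A single Stopwatch run like this includes JIT compilation and is sensitive to noise, so treat the output as an indication rather than a benchmark.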

Conclusion

Memory allocation and Garbage Collection are expensive operations. As can be seen in the case above, it can be difficult to determine whether a piece of code will perform well or badly without knowledge of the underlying implementation (in this case how strings are implemented in the .NET Framework). Many insights when it comes to performance come with experience.

Also, remember that in almost all cases code readability is much more important than blazing fast performance. Do not prioritize optimization over readability unless you know you need to. However, there are cases when the code can be both clear and perform well. Try to aim for that.

C# Collections – Non-generic or Generic?

Non-generic vs generic collections

When coding in C# you will pretty soon encounter Collections (Lists, Queues, Stacks, and so on). As soon as you wish to use a Collection of some sort you will have the choice of using one from the System.Collections namespace or one from the System.Collections.Generic namespace. The most commonly used Collection is the List and the non-generic version of List is the ArrayList.

Using the non-generic version forces you to cast the objects into the correct type.

ArrayList l = new ArrayList { new MyType() };
...
// MyType t = l[0]; This won't compile
MyType t = (MyType)l[0]; // But this will
MyOtherType t2 = (MyOtherType)l[0]; // Unfortunately this will also compile

As can be seen in the code above, the non-generic ArrayList is error prone. There is no compile time check of whether you cast to the correct type or not, so a mistake only shows up at run time. The generic version of ArrayList is List<T>. Using this forces you to define the type of objects the list will contain when you create the list, and the static code analysis tools and compiler will give you errors if you misuse the type.

var l = new List<MyType> { new MyType() };
...
MyType t = l[0]; // Now this compiles fine
MyType t2 = (MyType)l[0]; // This also works fine, but the cast is not needed
// MyOtherType t3 = (MyOtherType)l[0]; This won't compile

So, no boxing and unboxing of value types, no need to help the compiler understand what types of objects the list holds, and no runtime errors due to incorrect casts.

Now, List is only one of many Collection types in the System.Collections.Generic namespace. You will find generic versions of all the non-generic Collections. My advice is to simply pretend that the non-generic Collection types don’t even exist, and use the generic ones, always.
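The runtime failure mentioned above can be reproduced with a short sketch. It uses int and string instead of the MyType and MyOtherType placeholders so that the example is self-contained.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CastFailureDemo
{
    // Returns true when the wrong cast on an ArrayList element fails at run time.
    public static bool WrongCastThrows()
    {
        var list = new ArrayList { 42 };     // the int is boxed and stored as object
        try
        {
            var text = (string)list[0];      // compiles, but the element is an int
            return text == null;             // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // With List<int> no cast is needed and no boxing takes place.
    public static int TypedRead()
    {
        var list = new List<int> { 42 };
        return list[0];
    }

    public static void Main()
    {
        Console.WriteLine(WrongCastThrows()); // prints True
        Console.WriteLine(TypedRead());       // prints 42
    }
}
```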