C# alternatives to returning null

What is problematic with returning null?

A common pattern, both in the code I am used to working with and in parts of the .NET Framework, is to return null from methods when, for some reason, a valid return value is not available.

One example of this is the Find method of List<T>:

var persons = new List<Person>();
var bandit = persons.Find(p => p.Name == "Billy the Kid"); // Returns default(Person), which is null (assuming Person is a reference type)
if (bandit == null)
{
  ...
}
else
{
  ...
}

So, why would you consider handling this, and similar cases, differently?

My first argument is that returning null makes code hard to use. Let me show you by example.

Assume that you write code that will call the following public method:

public Person GetPersonByName(string name)
{
  ...
}

Is there any way for the caller to tell, just by looking at the method signature, whether they need to guard against the return value being null? No, there is not. They will have to check the documentation, or the code (if available). Would it not be better if they could tell directly? You could achieve that by naming the method GetPersonByNameOrNullIfNotFound, but that is not very desirable.
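As a side note: in C# 8.0 and later, nullable reference types let the signature itself communicate this. A minimal sketch (the Person type and the lookup are placeholders for illustration):

```csharp
#nullable enable
public class Person { }

public class PersonRepository
{
    // The '?' tells both the compiler and the reader that the
    // return value may be null, and callers get a compiler warning
    // if they dereference it without a null check.
    public Person? GetPersonByName(string name)
    {
        // ... lookup logic, returning null when no match is found
        return null;
    }
}
```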

My second argument is that returning null forces the caller to pollute their code with repeated checks and if/else forks:

var dude = persons.GetPersonByName("Jesse");
if (dude == null)
{
  log.Error("Could not find Jesse");
}
else
{
  var car = cars.FindByOwner(dude);
  if (car == null)
  {
    log.Error("Dude, Where's My Car?");
  }
  else
  {
    ...
  }
}

This makes the code much harder to read.

So what alternatives are there?

Alternative 1: The null object pattern

The Null Object Pattern (wikipedia link) says that instead of returning null you should return a valid object, but with empty methods and fields. That is, an instance of the class that just doesn’t do anything. For example:

public Person GetPersonByName(string name)
{
  var id = _db.Find(name);
  if (id == 0)
  {
    return Person.Nobody;
  }
  return new Person(id);
}

Here the Person class implements a static property, Nobody, that returns the null object version of Person.
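A minimal sketch of what such a null object could look like (the exact members of Person are assumptions made for illustration):

```csharp
public class Person
{
    // The null object: one shared, valid instance that does nothing.
    public static readonly Person Nobody = new Person(0, string.Empty);

    public int Id { get; }
    public string Name { get; }

    public Person(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public void SendGreeting()
    {
        if (this == Nobody)
        {
            return; // Nobody silently ignores the request
        }
        System.Console.WriteLine($"Hello, {Name}!");
    }
}
```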

Advantages

There are a couple of advantages of using this pattern over returning null.

  • Users do not need to add null checks in the calling code, making it simpler.
  • The risk of NullReferenceException being thrown is eliminated.

Disadvantages

All of the alternatives have some disadvantages. Using the null object pattern may:

  • Hide errors/bugs, since the program might appear to be running as expected
  • Force the introduction of just a different type of error checking

The last point here is interesting. If you, when implementing this pattern, realize that you need to check the returned value anyway, then this pattern is not suitable for your situation and you should consider a different solution.

Alternative 2: Fail fast

If you analyze your code and come to the conclusion that the case where null is returned is an exceptional case and really indicates an error condition, you can choose to throw an exception instead of returning null. One example of this is the File class in the .NET framework. Calling File.Open with an invalid path throws an exception (different exceptions depending on the type of error, for example FileNotFoundException if the file does not exist). A system that fails directly when an error condition is detected is called a Fail-fast system (wikipedia link).
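Applied to the earlier example, a fail-fast version of GetPersonByName could look something like this (PersonNotFoundException is a made-up exception type, used here for illustration):

```csharp
using System;

public class PersonNotFoundException : Exception
{
    public PersonNotFoundException(string message) : base(message) { }
}

public Person GetPersonByName(string name)
{
    var id = _db.Find(name);
    if (id == 0)
    {
        // Fail fast: signal the error immediately instead of returning null.
        throw new PersonNotFoundException($"No person named '{name}' was found.");
    }
    return new Person(id);
}
```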

I have worked in a large project where this philosophy was applied. Actually we didn’t throw exceptions, we directly halted the entire system, dumped all memory, stack and logs and reported the error. The result was that once the system went live it was really robust (having multiple levels of testing, some that ran for days or weeks simulating real load, also helped a lot).

Advantages

  • Makes errors visible
  • Forces you to fix any errors early, leading to a more robust system once in production
  • Reduces cost of fixing failures and bugs since it is cheaper to fix them early in the development process

Disadvantages

Failing fast might not be suitable in all situations. Assume for example that you are dependent on data from an external system, or user. If that system provides invalid data you do not want your system to fail. However, in situations where you are in control I recommend failing fast.

Alternative 3: Tester-Doer pattern

If you depend on external systems and need to handle cases like corrupt input data, shaky networks, missing files, overloaded database servers, etc., throwing exceptions and halting the system won't work for you. You could still throw exceptions and let the caller add a try-catch clause to handle them, but if some scenarios are really error prone, throwing exceptions frequently, it might impact performance to an extent that is unacceptable (microsoft link). One way to approach this situation is to split the operation in two parts: one that checks whether the resource is available, and a second that gets the data. For example, if you want to read a file but don't know in advance whether it is available, you can do this:

if (File.Exists(path)) // Test if the file exists
{
  var content = File.ReadAllText(path); // And if it does, read it
}

This idea can be expanded to test a lot of different preconditions and if they are fulfilled, do the operations.
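The same split can be applied to your own APIs. A sketch, with made-up names, of a Tester-Doer pair:

```csharp
using System.Collections.Generic;

public class PersonRepository
{
    private readonly Dictionary<string, Person> _persons =
        new Dictionary<string, Person>();

    // Tester: a cheap check that never throws.
    public bool Contains(string name) => _persons.ContainsKey(name);

    // Doer: assumes the caller has called the tester first.
    public Person GetPersonByName(string name) => _persons[name];
}
```

The caller first tests (Contains) and only then does (GetPersonByName), just like the File.Exists/File.ReadAllText pair.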

Advantages

  • Allows you to verify that the operation will probably succeed
  • Removes the overhead of exception handling (frequently thrown exceptions are expensive)
  • The calling code can be made quite clear

Disadvantages

  • Even though the test passes, the accessing method might still fail. For example, in a multi-threaded system the resource may have been deleted by another thread between the test and the access.
  • Requires the caller to remember to do both calls and not just call the accessing method.

Alternative 4: Try-Parse pattern

A different version of the Tester-Doer pattern is the Try-Parse pattern. One example where this is used in the .NET Framework is the int.TryParse method, which tries to parse a string to an integer. It returns a boolean value indicating whether the parsing succeeded. The actual integer value is supplied through an out parameter:

if (int.TryParse(aString, out var i))
{
  Console.WriteLine($"The value is {i}");
}

Advantages

  • Same as for Tester-Doer, with the addition that you only need one call, so the thread-safety issue is taken care of.

Disadvantages

  • An obscure method signature: the return value is not the data you requested; that is instead delivered through an out parameter.
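You can apply the pattern to your own code as well. A sketch (again with made-up names):

```csharp
using System.Collections.Generic;

public class PersonRepository
{
    private readonly Dictionary<string, Person> _persons =
        new Dictionary<string, Person>();

    // Returns true and sets 'person' when found, false otherwise.
    // A single call, so there is no test-then-act race condition.
    public bool TryGetPersonByName(string name, out Person person)
        => _persons.TryGetValue(name, out person);
}
```

Calling code then reads: if (repo.TryGetPersonByName("Jesse", out var dude)) { ... }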

Summary

This post has hopefully provided you with some alternatives to returning null, and some ideas on why and when it can be good to use them. As always, the most important thing is to make the code as clear and simple as possible. Now, code!

Blazor – first impressions

What is Blazor?

To quote the official Blazor homepage

”Blazor is a single-page web app framework built on .NET that runs in the browser with WebAssembly.”

The natural follow-up question is then ”What is WebAssembly?”. From the official WebAssembly homepage we can read that

”WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.”

That is a very technical description, most likely formulated by a software engineer. I will make an attempt to describe Blazor and WebAssembly in my own words.

Blazor is a framework for building applications for the web, similar to AngularJS and React. However, Blazor makes it possible to write your applications in a .NET language, such as C#, instead of JavaScript. Since the JavaScript engines in the browsers are limited to executing JavaScript, another solution is required for running compiled programs in binary format. This new format is called WebAssembly, Wasm, and is supported by Chrome, Edge, Firefox, and Safari.

Part of the Blazor project is to create a .NET Runtime in the Wasm format that runs in the browser and executes .NET bytecode.

Why Blazor over AngularJS or React?

Currently Blazor is still experimental, so you should not use it in live products yet. However, when it reaches a more stable state it should make a pretty awesome alternative to the JavaScript-based frameworks. If you already write your middleware and backend code using .NET or .NET Core, then it should be appealing to be able to use C# for the front end as well, going full stack with C#.

Another big thing is that compiled binary code in WebAssembly executes up to 5-10 times faster than JavaScript. In a time when JavaScript performance is becoming a bottleneck for web applications, this is a major deal.

Then there are a bunch of other things that make it appealing to run .NET in the browser, like great development tools, well-known APIs, and stable build tools.

How is it to develop with Blazor right now?

I set up Blazor and went through a couple of tutorials, and I must say that it already feels really stable and performant, even though it is still at an experimental stage. Being a .NET developer who spends most of his time writing C#, it felt really nice to be able to use that instead of JavaScript. I have nothing against developing in JavaScript if the job requires it, but I feel a lot more comfortable with C#, so it's nice to be able to write both the backend code and the frontend code in C#.

When you create a web application with Blazor you build components, which are HTML and C# code that you can either display on their own or as part of other components. The concept is easy to grasp, and if you are comfortable working with HTML and C# you should be able to understand what's going on in the code right away.

If you are a C# developer interested in web development I highly recommend that you give Blazor a try. My guess is that it will become one of the major web frameworks for creating Single Page Applications (SPA).

How can I get started?

Visit the official Blazor web site at https://blazor.net where you will find instructions on how to get started as well as a tutorial that will guide you through the basic concepts.

You may also want to visit the official WebAssembly homepage at https://webassembly.org to learn more about Wasm.

How to design a Priority Queue in C#

What is a Priority Queue?

To be a bit formal, a priority queue is an abstract data type (ADT), similar to a regular queue, but where each element has a priority associated to it. An element with high priority is served before an element with a low priority. Elements which have the same priority are usually served in the order they were added, but that is not a formal requirement.

In other words, it's like the queue to the fancy club downtown, where all we regular people have to stand in line outside while the VIPs just walk right in.

Choosing the internal collection type

Since we are designing a queue, we want a method for enqueuing elements and a method for dequeuing them. Somehow we need to keep track of the elements in the queue, and the straightforward way to do this is to use a collection type. One commonly used collection type that supports both adding and removing elements is the generic List<T> from the System.Collections.Generic namespace. Sounds like a good fit? Yes! Let's write some code:

public class PriorityQueue<T>
{
  private readonly List<T> _pq = new List<T>();
}

Making sure enqueued elements can be prioritized

In order to know which element has the highest priority we need to be able to compare elements against each other in some way. But how can we put such restrictions on the elements from within the PriorityQueue class? Fortunately, C# provides a way to put constraints on generic types (in our case, the generic type T). We can inform the compiler that objects of type T must be comparable to each other so they can be sorted; in other words, T must implement IComparable (part of the .NET Framework). Let's add that restriction to our PriorityQueue:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
}

Enqueuing and Dequeuing elements

A queue is useless if you can't enqueue and dequeue items from it. It will get a bit tricky in a while, when we need to ensure elements are added in the right position with regard to priority, but let's start with something simple. Let's pretend that we don't care about priority and just want a regular queue. This can be achieved by adding elements to the end of the List and removing them at the beginning:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
  
  public void Enqueue(T item)
  {
    _pq.Add(item);
  }

  public T Dequeue()
  {
    var item = _pq[0];
    _pq.RemoveAt(0);
  
    return item;
  }
}

Cool, now we have a regular queue, but how do we ensure we always dequeue the item with the highest priority? One way to do that is to sort the items each time a new item is added:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
  
  public void Enqueue(T item)
  {
    _pq.Add(item);
    _pq.Sort();
  }

  public T Dequeue()
  {
    var item = _pq[0];
    _pq.RemoveAt(0);
    
    return item;
  }
}

This should work, and it's a decent solution. However, sorting the List<T> like this every time an element is enqueued is not optimal. We can do better.

Making it scale

Now we have to dive into the realms of Computer Science. If concepts like Big O notation and Binary Heaps are just strange words to you, I recommend reading up on those first and then returning here. You can find an introduction to Big O notation here and a good explanation of Binary Min and Max Heaps here.

All ready to go? Great! So, using the solution above, Enqueue is an O(n log n) operation due to the sort that occurs after each addition. However, if we order the data in the List<T> as a Binary Min Heap, both the Enqueue and Dequeue operations can be improved to O(log n), which scales much better.

I will not explain in detail how the Insert and Delete operations work in a Binary Min Heap. You can find a good explanation, with fancy animations, by following the link above. Instead, let's look at the resulting code:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();

  public void Enqueue(T item)
  {
    _pq.Add(item);
    BubbleUp();
  }
  
  public T Dequeue()
  {
    var item = _pq[0];
    MoveLastItemToTheTop();
    SinkDown();
    return item;
  }

  private void BubbleUp() // Implementation of the Min Heap Bubble Up operation
  {
    var childIndex = _pq.Count - 1;
    while (childIndex > 0)
    {
      var parentIndex = (childIndex - 1) / 2;
      if (_pq[childIndex].CompareTo(_pq[parentIndex]) >= 0)
        break;
      Swap(childIndex, parentIndex);
      childIndex = parentIndex;
    }
  }

  private void MoveLastItemToTheTop()
  {
    var lastIndex = _pq.Count - 1;
    _pq[0] = _pq[lastIndex];
    _pq.RemoveAt(lastIndex);
  }

  private void SinkDown() // Implementation of the Min Heap Sink Down operation
  {
    var lastIndex = _pq.Count - 1;
    var parentIndex = 0;
    
    while (true)
    {
      var firstChildIndex = parentIndex * 2 + 1;
      if (firstChildIndex > lastIndex)
      {
        break;
      }
      var secondChildIndex = firstChildIndex + 1;
      if (secondChildIndex <= lastIndex && _pq[secondChildIndex].CompareTo(_pq[firstChildIndex]) < 0)
      {
        firstChildIndex = secondChildIndex;
      }
      if (_pq[parentIndex].CompareTo(_pq[firstChildIndex]) < 0)
      {
        break;
      }
      Swap(parentIndex, firstChildIndex);
      parentIndex = firstChildIndex;
    }
  }

  private void Swap(int index1, int index2)
  {
    var tmp = _pq[index1];
    _pq[index1] = _pq[index2];
    _pq[index2] = tmp;
  }
}

There you have it! A fully working Priority Queue implementation in C# that scales.
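To see it in action, here is a small usage example. Since this is a min heap, the smallest value has the highest priority (int implements IComparable<int>):

```csharp
var queue = new PriorityQueue<int>();
queue.Enqueue(5);
queue.Enqueue(1);
queue.Enqueue(3);

Console.WriteLine(queue.Dequeue()); // 1
Console.WriteLine(queue.Dequeue()); // 3
Console.WriteLine(queue.Dequeue()); // 5
```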

You can find a, very similar but not quite identical, implementation on my GitHub page: https://github.com/Backhage/PriorityQueue

C# Threading Gotchas

Introduction

Threading and concurrency is a big topic, and there are plenty of resources out there that cover the hows and whats of starting new threads, avoiding locking up your UI, and so on. I will not go into those details, but rather focus on things that are good to know but aren't covered in the typical threading how-tos you find online.

When is it worth starting a thread?

The best thread is the one you don't need. However, a rule of thumb is that operations that might take longer than 50 ms to complete are candidates for running on a separate thread. The reason is that there is overhead involved in creating threads and switching between them. Also, remember that for I/O-bound operations there are often asynchronous methods you can use instead of spawning a thread of your own.

What is the difference between a background and a foreground thread?

The main thread, and any thread you create using System.Threading.Thread, is by default a foreground thread. Any work item you put on the System.Threading.ThreadPool runs on a background thread.

var tf = new Thread(MyMethod);
tf.Start(); // Starts a new thread that runs in the foreground
...
ThreadPool.QueueUserWorkItem(MyMethod); // Runs on a background thread

There is only one thing that differs between a background and a foreground thread: foreground threads block the application from exiting until they have completed, while background threads are abruptly aborted when the application exits. Note: this means that any clean-up actions you have defined, such as removing temporary files, will not run if they are supposed to happen on a background thread that gets interrupted by the application shutting down. You can, however, use the Thread.Join method to avoid this problem.
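A sketch of using Thread.Join for this (CleanUpTempFiles is a placeholder for your own clean-up routine):

```csharp
var worker = new Thread(CleanUpTempFiles) { IsBackground = true };
worker.Start();

// ... other work ...

// Block here until the worker has completed, so its clean-up
// is not cut short when the application exits.
worker.Join();
```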

It is also possible to set a new thread to run as a background thread if you wish to prevent it from blocking application exit. This is a good idea for long-running threads that could otherwise lock up the application. Most of us have probably experienced applications becoming unresponsive, where the only way to shut them down is via the task manager. This is often caused by hung foreground threads.

var t = new Thread(MyMethod) { IsBackground = true };
t.Start(); // Runs as a background thread

Also note that all Task-based operations, such as Task.Run, as well as awaited methods, run on the thread pool, and hence on background threads.

How to catch exceptions on threads?

Take a look at this code sample:

public static void Main()
{
  try
  {
    var t = new Thread(MyMethod);
    t.Start();
  }
  catch (Exception ex)
  {
    ...
  }
}

private static void MyMethod() { throw null; } // Throws a NullReferenceException

Will the NullReferenceException thrown from MyMethod be caught? Unfortunately, no. Instead the program will terminate due to an unhandled exception.

The reason why the exception cannot be caught this way is simply that each thread has its own independent execution path; threads progress independently of each other (until they hit a lock or some signaling construct, like a ManualResetEvent). To be able to handle the exception you have to move the try-catch block into MyMethod:

public static void Main()
{
  new Thread(MyMethod).Start();
}

private static void MyMethod()
{
  try
  {
    throw null;
  }
    catch (Exception ex) // Here the exception will be caught
  {
    // Exception handling code. Typically including error logging.
    ... 
  }
}

Note that Tasks, unlike Threads, propagate exceptions. So in the case of using Task.Run you can do this:

public static void Main()
{
  var t = Task.Run(() => { throw null; });
  try
  {
    t.Wait();
  }
  catch (AggregateException ex)
  {
    // Exception handling code.
    // The NullReferenceException is found at ex.InnerException
    ...
  }
}
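If you use async/await instead of calling Wait, the exception is unwrapped for you, so you catch the original exception type rather than an AggregateException:

```csharp
public static async Task Main()
{
    try
    {
        await Task.Run(() => { throw null; });
    }
    catch (NullReferenceException ex)
    {
        // await rethrows the original exception directly.
        Console.WriteLine(ex.Message);
    }
}
```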

Tricky captured variables

Consider the following code:

for (var i = 0; i < 10; i++)
{
  var t = new Thread(() => Console.Write(i));
  t.Start();
}

When I ran this I got the following output:

2
3
5
4
5
1
6
9
7
10

Notice how the value 5 is written twice, 0 and 8 are missing, and we actually got the number 10. Why does this happen? The answer is that the variable i refers to the same memory location during the entire lifetime of the loop. Inside the loop we start 10 different threads that all read that same memory location when writing the value of i. Meanwhile, i is updated on the main thread, which runs independently of the other 10.

How do you think the code will behave if we make this small change:

for (var i = 0; i < 10; i++)
{
  var temp = i;
  var t = new Thread(() => Console.Write(temp));
  t.Start();
}

This time the numbers 0 to 9 will be written without duplicates or missing numbers; the order is still not deterministic though. This is because the line var temp = i creates a new variable for each iteration and copies the current value of i to it. Each thread therefore refers to a separate memory location. The threads are, however, not guaranteed to run in the order they are started.
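Note that since C# 5 the foreach loop variable is scoped per iteration, so the same pitfall does not occur there:

```csharp
foreach (var i in Enumerable.Range(0, 10))
{
    // In C# 5 and later, 'i' is a fresh variable in each iteration,
    // so every thread captures its own copy.
    var t = new Thread(() => Console.Write(i));
    t.Start();
}
```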

Ending words

There are lots of things to keep in mind when working with threads. I have touched on some things in this post that I think can be tricky. As usual I recommend having a good book near by that you can use to look up things when they don’t work as you expect.

C# Pitfalls of returning IEnumerable

Generic and specific return types

It is often preferred to work with types that are as generic as possible. Doing so opens up for more flexibility, and parts of the code can be modified without affecting other parts. One example of this is IEnumerable<T>, which is implemented by several generic collection types, such as HashSet<T>, List<T>, and LinkedList<T>.

If you write a method that works with a List<T> internally and returns it as an IEnumerable<T>, the caller does not need to know that it is actually a List<T> you are using. And if you later decide to change your implementation to use a different type internally, the caller does not need to be updated as long as the new type also implements IEnumerable<T>.

This might make you think that you should always return IEnumerable<T> when possible. But this can come back and bite you.

The multiple enumeration issue

Whenever you iterate over a collection via the IEnumerable<T> interface it gets enumerated. This means that an enumerator object is created that walks the collection so that the members can be accessed sequentially. If this is done more than once you are causing extra work to be performed.

IEnumerable<string> names = GetNames();
foreach (var name in names)
{
  Console.WriteLine(name);
}
var sb = new StringBuilder();
foreach (var name in names)
{
  sb.Append(name).Append(" ");
}

The above code enumerates the names enumerable twice. In the best case this just introduces some extra work. But it can get really nasty if GetNames returns a database query. The code would then evaluate the query twice, possibly even returning different results each time.

Avoiding multiple enumerations

Fortunately it is quite easy to avoid multiple enumerations. The calling code can force the enumeration to happen once, during variable initialization.

IList<string> names = GetNames().ToList();
...

With this change the enumeration happens only once and the rest of the code can stay the same. The ToList method allocates a new List<string> and populates it with the strings returned from the enumerator.

The price of being generic

Now, assume the GetNames method actually does use a List<string> internally and returns it as an IEnumerable<string>, and the caller then calls ToList on the result. Isn't that just adding extra complexity? Yes! That is exactly right. And not only that: the call to ToList creates a new List<string> populated with the exact same elements that already exist in the original list. So there is an efficiency penalty involved as well.

So what should you do? Well, that depends. If you are writing a library aimed at users outside of your team or organization, in other words where you don't know how the returned type will be used and it is important that your code stays flexible while the interface stays stable, then you should probably return an IEnumerable<T>.

But if you are in control of all the source code involved, then you can afford to be more specific in what types you return.
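A possible middle ground, when you control the code, is to return IReadOnlyList<T>: the caller can index and iterate freely without multiple-enumeration concerns, while the interface still does not expose mutation. A sketch (GetNames and its contents are illustrative):

```csharp
using System.Collections.Generic;

public IReadOnlyList<string> GetNames()
{
    var names = new List<string> { "Jesse", "Bob" };
    // List<T> already implements IReadOnlyList<T>,
    // so no extra copy is needed here.
    return names;
}
```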

Building strings in C# with good performance

A little bit on strings

Strings in C# are immutable. What that means is that you cannot modify an existing string, you must create a new one.

Strings are also objects – sequential, read-only collections of char values – so they are allocated on the managed heap and therefore managed by the Garbage Collector (GC). Hence, if you wish to ”modify” an existing string you must use it as input when creating a new one. For example, let's assume you have the string ”Hello” and wish to append your friend's name to it.

var str = "Hello";
...
str = string.Concat(str, " Bob");

What happens under the hood here is that String.Concat creates a new string, ”Hello Bob”, and the str variable is updated to reference the newly created string. Since the old string, ”Hello”, is no longer referenced anywhere it will eventually be garbage collected.
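A small illustration of this: the original string object is left untouched; only the variable is redirected:

```csharp
var str = "Hello";
var original = str; // keep a reference to the original object

str = string.Concat(str, " Bob");

Console.WriteLine(original); // Hello     - the old string is unchanged
Console.WriteLine(str);      // Hello Bob - a brand new string object
```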

Building strings in a loop

Now, assume that you wish to build up a long string by concatenating new words in a loop. We can simulate this behavior by creating a loop like the one below.

public static string BuildString()
{
  var str = string.Empty;
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    str = string.Concat(str, append);
  }
  return str;
}

The code looks decent. But since strings are immutable, a new string object is created in each iteration of the loop. That is 100 000 objects that need to be created and garbage collected.

Fortunately there is a better way to do this. If we could use a mutable object in the loop we could just keep adding to it, avoiding all this object creation. The .NET Framework provides a type for exactly this purpose: System.Text.StringBuilder. With some small modifications we can use it instead.

private static string BuildString()
{
  var sb = new System.Text.StringBuilder(100_000);
  for (var i = 0; i < 100_000; i++)
  {
    var append = (char)('a' + i % ('z' - 'a' + 1));
    sb.Append(append);
  }
  return sb.ToString();
}        

The numbers

I did some measurements using the two different approaches above. In figure 1 below you can see the memory allocations.


Figure 1. Memory allocations

The ”Exclusive Allocations” column shows how many allocations of managed memory were done in the methods. As expected, in the method using String.Concat there were 100 000 allocations. Interestingly, an additional 199 999 allocations were added by String.Concat itself.

In the method using StringBuilder there is only a single allocation, and only two additional ones are added by calls to other methods (one allocation in the StringBuilder constructor and a second in the ToString method).

I also measured the time it took to execute the two different implementations. The String.Concat version took 1.3 seconds to run while the StringBuilder version took 1.0 milliseconds. That is 1 300 times faster.

Conclusion

Memory allocation and garbage collection are expensive operations. As can be seen in the case above, it can be difficult to determine whether a piece of code will perform well or badly without knowledge of the underlying implementation (in this case, how strings are implemented in the .NET Framework). Many insights when it comes to performance come with experience.

Also, remember that in almost all cases code readability is much more important than blazing fast performance. Do not prioritize optimization over readability unless you know you need to. However, there are cases where the code can be both clear and performant. Try to aim for that.