Archive september 2018

How to design a Priority Queue in C#

What is a Priority Queue?

To be a bit formal, a priority queue is an abstract data type (ADT), similar to a regular queue, but where each element has a priority associated to it. An element with high priority is served before an element with a low priority. Elements which have the same priority are usually served in the order they were added, but that is not a formal requirement.

In other words, it’s like the queue to the fancy club down town, where all we regular people have to stay in line outside and the VIPs just walks right in.

Choosing the internal collection type

Since we are designing a queue, we want a method for Enqueuing elements and a method for Dequeuing elements. Somehow we need to keep track of the elements that are in the queue and the straight forward way to do this is to use a Collection type. One commonly used Collection type that supports both adding and removing elements is the generic List<T> from the System.Collections.Generic namespace. Sounds like a good fit? Yes! Let’s write some code:

public class PriorityQueue<T>
{
  private readonly List<T> _pq = new List<T>();
}

Making sure enqueued elements can be prioritized

In order to know which element that has the highest priority we need to be able to prioritize one against the other in some way. But how can we put such restrictions on the elements from within the PriorityQueue class? Fortunately there is a way in C# to put constraints on generic types (in our case our generic type is T). We can do this by informing the compiler that objects of type T must be comparable to each other so they can be sorted, in other words, T must implement IComparable (part of the .NET framework). Let’s add that restriction to our PriorityQueue:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
}

Enqueuing and Dequeuing elements

A queue is useless if you can’t enqueue and dequeue items from it. It will get a bit tricky here in a while when we will need to ensure elements are added in the right position in regards to priority, but let’s start with something simple. Let’s pretend that we don’t care about priority and just want a regular queue. This can be achieved by adding elements to the end of the List and removing them at the beginning:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
  
  public void Enqueue(T item)
  {
    _pq.Add(item);
  }

  public T Dequeue()
  {
    var item = _pq[0];
    _pq.RemoveAt(0);
  
    return item;
  }
}

Cool, now we have a regular queue, but how do we ensure we always Dequeue the top prioritized item? One way we could do that is to sort the items after each time we add a new item:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();
  
  public void Enqueue(T item)
  {
    _pq.Add(item);
    _pq.Sort();
  }

  public T Dequeue()
  {
    var item = _pq[0];
    _pq.RemoveAt(0);
    
    return item;
  }
}

This should work, and it’s a descent solution. However, sorting the List<T> like this every time an element is enqueued is not the optimal solution. We can do it better.

Making it scale

Now we have to dive into the realms of Computer Science. If concepts like Big O notation and Binary Heaps just are strange words that you don’t know what they mean I recommend reading up on those first and then returning here. You can find an introduction to Big O notation here and a good explanation of Binary Min and Max Heaps here.

All ready to go? Great! So, using the solution above we get an O(nlogn) dependency when Enqueuing elements, that is due to the sort that occurs after each addition. However, if we order the data in the List<T> as a Binary Min Heap both the Enqueue and Dequeue operations can be improved to O(logn) which scales much better.

I will not explain how the Insert and Delete operations work in detail in a Binary Min Heap. You can find a good explanation, with fancy animations, by following the link above. Instead, lets look at the resulting code:

public class PriorityQueue<T> where T : IComparable<T>
{
  private readonly List<T> _pq = new List<T>();

  public void Enqueue(T item)
  {
    _pq.Add(item);
    BubbleUp();
  }
  
  public T Dequeue()
  {
    var item = _pq[0];
    MoveLastItemToTheTop();
    SinkDown();
    return item;
  }

  private void BubbleUp() // Implementation of the Min Heap Bubble Up operation
  {
    var childIndex = _pq.Count - 1;
    while (childIndex > 0)
    {
      var parentIndex = (childIndex - 1) / 2;
      if (_pq[childIndex].CompareTo(_pq[parentIndex]) >= 0)
        break;
      Swap(childIndex, parentIndex);
      childIndex = parentIndex;
    }
  }

  private void MoveLastItemToTheTop()
  {
    var lastIndex = _pq.Count - 1;
    _pq[0] = _pq[lastIndex];
    _pq.RemoveAt(lastIndex);
  }

  private void SinkDown() // Implementation of the Min Heap Sink Down operation
  {
    var lastIndex = _pq.Count - 1;
    var parentIndex = 0;
    
    while (true)
    {
      var firstChildIndex = parentIndex * 2 + 1;
      if (firstChildIndex > lastIndex)
      {
        break;
      }
      var secondChildIndex = firstChildIndex + 1;
      if (secondChildIndex <= lastIndex && _pq[secondChildIndex].CompareTo(_pq[firstChildIndex]) < 0)
      {
        firstChildIndex = secondChildIndex;
      }
      if (_pq[parentIndex].CompareTo(_items[firstChildIndex]) < 0)
      {
        break;
      }
      Swap(parentIndex, firstChildIndex);
      parentIndex = firstChildIndex;
    }
  }

  private void Swap(int index1, int index2)
  {
    var tmp = _pq[index1];
    _pq[index1] = _pq[index2];
    _pq[index2] = tmp;
  }
}

There you have it! A fully working Priority Queue implementation in C# that scales.

You can find a, very similar but not quite identical, implementation on my GitHub page: https://github.com/Backhage/PriorityQueue

C# Threading Gotchas

Introduction

Threading and concurrency is a big topic and there are plenty of resources out there that covers the hows and whats related to starting new threads and avoiding locking up your UI and so on. I will not go into those details but rather try to focus on things that are good to know, but aren’t covered in the normal threading howtos you find online.

When is it worth starting a thread?

The best thread is the one you don’t need. However, a rule of thumb is that operations that might take longer than 50 ms to complete are candidates to run on a separate thread. The reason for that is that there is overhead involved in creating and switching between threads. Also, remember that for I/O bound operations there are often asynchronous methods you can use instead of spawning a thread on your own.

What is the difference between a background and a foreground thread?

The main thread and any thread you create a using System.Threading.Thread is by default a foreground thread. Any task you put on the System.Threading.ThreadPool is run on a background thread.

var tf = new Thread(MyMethod);
tf.Start(); // Starts a new thread that runs in the foreground
...
ThreadPool.QueueUserWorkItem(MyMethod); // Runs on a background thread

There is only one thing that differs between a background and a foreground thread. That is that foreground threads will block the application from exiting until they have completed, but background threads will be abruptly aborted when the application exits. Note: This means that any clean-up actions you have defined, such as removing temporary files, will not be run if they are supposed to happen on a background thread that gets interrupted by the application being shut down. You can however use the Thread.Join method to avoid this problem.

It is also possible to set a new thread to run as background thread if you wish to avoid that it may block application exit. This is a good idea for long running threads that otherwise can lock up the application. Most of us have probably experienced applications becoming unresponsive and the only way to shut them down is via the task manager. This is often caused by hanged foreground threads.

var t = new Thread(MyMethod) { IsBackground = true };
t.Start(); // Runs as a background thread

Also note that all Task-based operations, such as Task.Run, but also await-ed methods, are run on the thread pool, and hence on background threads.

How to catch exceptions on threads?

Take a look at this code sample:

public static void Main()
{
  try
  {
    var t = new Thread(MyMethod);
    t.Start();
  }
  catch (exception ex)
  {
    ...
  }
}

private static void MyMethod { throw null; } // Throws a NullReferenceException

Will the NullReferenceException thrown from MyDelegate be caught? Unfortunately no. Instead the program will terminate due to an unhandled exception.

The reason why the exception cannot be caught this way is simply because each thread has it’s own independent execution path, they progress independently of each other (until they hit a lock or some signaling, like ManualResetEvent). To be able to handle the exception you will have to move the try-catch block into MyMethod:

public static void Main()
{
  var t = new Thread(MyMethod).Start();
}

private static void MyMethod()
{
  try
  {
    throw null;
  }
  catch (exception ex) // Here the exception will be caught
  {
    // Exception handling code. Typically including error logging.
    ... 
  }
}

Note that Tasks, unlike Threads, propagate exceptions. So in the case of using Task.Run you can do this:

public static void Main()
{
  var t = Task.Run(() => { throw null; });
  try
  {
    t.Wait();
  }
  catch (AggregateException ex)
  {
    // Exception handling code.
    // The NullReferenceException is found at ex.InnerException
    ...
  }
}

Tricky captured variables

Consider the following code:

for (var i = 0; i < 10; i++)
{
  var t = new Thread(() => Console.Write(i));
  t.Start();
}

When I ran this I got the following output:

2
3
5
4
5
1
6
9
7
10

Notice how the value 5 is written twice and 0 and 8 is missing, and we actually got the number 10 written. Why does this happen? The answer is that the variable i refers to the same memory during the entire lifetime of the loop. Inside the loop we start 10 different threads that all reads the same memory address when it is about to write the value of i. However, i is updated on the main thread which runs independently of the other 10.

How do you think the code will behave if we make this small change:

for (var i = 0; i < 10; i++)
{
  var temp = i;
  var t = new Thread(() => Console.Write(temp));
  t.Start();
}

This time the numbers 0 to 9 will written without duplicates or missing numbers, the order is still not deterministic though. This is because the line var temp = i creates a new variable for each iteration and copies the current value of i to that location. Each thread will therefore refer to a separate memory location. The threads are however not guaranteed to run in the order they are started.

Ending words

There are lots of things to keep in mind when working with threads. I have touched on some things in this post that I think can be tricky. As usual I recommend having a good book near by that you can use to look up things when they don’t work as you expect.

Book Review – The imposter’s handbook

Introduction

Just by reading the heading of this blog post won’t tell you anything about what kind of book I am writing a review on. The full title is actually The imposter’s Handbook – A CS Primer For Self-Taught Programmers, where CS is short for Computer Science.

The book is written by Rob Conery, a self-taught programmer without a CS degree. In 2014 Rob spent a year studying Computer Science fundamentals and wrote about the things he learned, which resulted in this book. When I heard the book being discussed on the Coding Blocks podcast I got interested and decided to order a copy of it for myself. Just like Rob I am (mostly) self-taught when it comes to Computer Science subjects. I do have a master’s degree, but in electrical engineering, so none of the courses I took on the University covered the subjects that Rob writes about.

CS subjects covered in the book

The book touches on many areas, and does not deep dive into any of them, so it is probably wrong to say that any of the subjects are ”covered”. However, the author introduces each subject and gives you enough understanding about them to cover the basics. And if you want to deep dive into anyone there are a lot of books out there that do cover the details.

Subjects discussed are:

  • Computation
  • Complexity
  • Lamdba Calculus
  • Machinery
  • Big O
  • Data Structures
  • Algorithms
  • Compilation
  • Software Design Patterns
  • Software Design Principles
  • Functional Programming
  • Databases
  • Testing
  • Essential Unix

As you understand, with these many subjects, you cannot dive into details and still have everything in a single book.

Is this book for me?

I would say that I depends. For me personally I enjoyed reading the first chapters, but from Big O and forward I pretty much already knew the things that the book brings up. However, I recognize that I am not the typical self-taught programmer. I read, a lot, of books on programming, I have taken Coursera courses on algorithms, and I do programming challenges on Codewars, Hackerrank, and Codility just for the fun of it, I listen to several programming podcasts, and subscribe to several programming newsletters. But if you look at the subjects listing above and feel that you don’t have a basic understanding on these subjects, this book is most certainly quite useful.

Rating

I would really like to give this book a high rating, since I know that the author has put a lot of effort into learning the things himself, as well as writing about them in a way that is useful for others. There are however some things that lowers the score. These are:

  • Not correctly formatted for print
    I have the printed version of the book, and it is obvious that the source needs to be looked over to avoid having pages where the last line on the page is a heading, and similar formatting errors.
  • Questionable code quality
    I found many of the code samples to be questionable in regards of code structure, naming of variables, etc. I expected the book to contain code samples that clearly showed the expected functionality.
  • Questionable text quality
    When I read a technical book I expect it have been proof read and reworked a couple of times. This book often feel more like an early draft than the finished book.

Taking the above into account I give this book a score of: 3/5. It can definitely be good to get a brief understand on some important CS subjects, but if you want to learn any of the subjects really well, I recommend complementing with some books from a well known publisher.

Links

The homepage for the Imposter’s handbook

 

 

C# struct and class – when to use each?

Value types and Reference types

In C# you have both Value types and Reference types. There are some fundamental differences between them that are important to understand.

The most common value types are the simple ones, like int, bool, char, and long. But also enum and struct are value types. They are called value types due to the fact that variables that are based on any of those types directly contain the values.

The second group are Reference types. Variables based on these do not contain the values directly, instead they contain a reference to the where the objects can be found. Common Reference types are classes, delegates, interfaces, and strings.

Some key differences

  • With Value types, when you assign an existing variable to a new variable the actual value gets copied. With Reference types, a new reference to the existing object is created.
  • Value types lives on the stack. They seize to exist as soon as they go out of scope. Reference types lives on the heap. They seize to exist when they are collected by the Garbage Collector.
  • Value types are not polymorphic, what you see is what you have. Reference types can be polymorphic.

Differences between a class and a struct

It is not uncommon that you can switch between defining a class and a struct just by changing a single word in the code:

// A class definition
public class MyObject
{
  ...
}

// A struct definition
public struct MyObject
{
  ...
}

By doing so you change the type from a Reference type to a Value type, which can break a lot of existing code. For example:

public MyObject CreateGreen()
{
  var obj = new MyObject();
  SetGreenProperties(obj);
  return obj;
}

private static void SetGreenProperties(MyObject obj)
{
  obj.Color = "Green";
  ...
}

If MyObject is changed from a class to a struct the object returned from CreateGreen will no longer have the green properties set. Why? Because no longer is a reference to the object sent to SetGreenProperties. Instead a new copy, only local to the called method, is created. Note however that the code will still compile and run, but the application now contains a bug that might be hard to track down.

Structs also cannot be polymorphic, in other words you cannot define a generic base struct that other structs inherit.

public class Car
{
  ...
}

// This works just fine
public class Mustang : Car
{
 ...
}

public struct Animal
{
  ...
}

// But this does not compile
public struct Lion : Animal
{
 ...
}

However, both classes and structs can implement interfaces.

When to use a struct over a class?

It might seem that a struct is just a class with limitations. So why should you bother with structs? Why not just use class only?

The, maybe not so convincing answer, is that you should use classes to define the behavior of your application and structures for storing the data that your application manipulates. Structs are Value types, they are suited for storing values. I will elaborate on this idea a bit more to build my case.

Structs will come to their best use if you make them immutable. This can be done by setting all the values in the constructor and only supplying read only properties.

public struct Address
{
  public string Line1 { get; }
  public string Line2 { get; }
  public string City { get; }
  public string State { get; }
  public int Zip { get; }

  public Address(
    string line1, 
    string line2,
    string city,
    string state,
    int zip)
  {
    Line1 = line1;
    Line2 = line2;
    City = city;
    State = state;
    Zip = zip;
  }
}

Once all fields have been assigned you can be sure the struct as a whole is valid. There is no way to alter individual fields, leaving the struct in an invalid state. If you want to update the address you simply create a new Address with the updated information and replace the old one.

public void UpdateAddress(Address newAddress)
{
  this.address = newAddress;
}

If you know a bit on how value types are handled you might be concerned that there is a lot of unnecessary copying of values going on when you pass around structs instead of just references to objects. It is true that value types get copied when you pass them to other methods, and that is why you should keep your structs quite small. The copy operations are very efficient for value types  and creating a reference type instead would involve overhead for book keeping and garbage collection.

Here is a checklist for when a struct can be suitable to use:

  • It will be used for storing data
  • You can make it immutable
  • You can limit it’s public interface to get only properties
  • There is no need for it to have subclasses
  • It can be small

Pitfalls

Structs in multi threaded systems

It is important to understand that when you assign a struct variable to another, each field gets copied, and this may not be an atomic operation. In a multi threaded system where you have a class or a static variable holding a struct and it is accessed and updated by different threads, you will need to add locks around the read and assignment operations. Even though the struct itself is immutable.

public class Person
{
  // public Address Address { get; set; } // Not thread safe

  // Somewhat thread safe, depending on usage.
  private readonly object lockObj = new object();
  private Address _address;
  public Address Address
  {
    get
    {
      lock (lockObj)
      {
        return _address;
      }
    }
    set
    {
      lock (lockObj)
      {
        _address = value;
      }
    }
  }
  // Bad usage:
  //   Console.WriteLine(person.Address.Line1);
  //   Console.WriteLine(person.Address.Line2); // Address might have been updated between these accesses
  // Good usage:
  //   var address = person.Address; // Make a local copy of the address struct
  //   Console.WriteLine(address.Line1);
  //   Console.WriteLine(address.Line2);
  ...
}

An even better alternative might be to avoid adding locks in the Person class and instead add locks in the code using it.

var p = new Person(address);
...
var newAddress = new Address(...);
lock (p)
{
  p.Address = newAddress;
}

Struct members being mutable types

Part of being immutable is also securing that your struct does not have properties that return references to mutable types.

public struct Unsecure
{
  private readonly int[] _values;
  public IEnumerable<int> Values => _values;

  public Unsecure(int[] values)
  {
    _values = values;
  }
}

var values = new [] { 1, 2, 3 };
var unsecure = new Unsecure(values);
...
values[0] = 4; // Modifies the internals of the 'unsecure' object.

You can remedy this by using ImmutableArray<T> or ImmutableList<T> from the System.Collections.Immutable namespace that I have written about in previous posts.

public struct Secure
{
  private readonly ImmutableArray<int> _values;
  public IEnumerable<int> Values => _values;

  public Secure(int[] values)
  {
    _values = values.ToImmutableArray();
  }
}

var values = new [] { 1, 2, 3 };
var secure = new Secure(values);
...
values[0] = 4; // Now this does not modify the internals of 'secure'.

Conclusion

Now you hopefully have a pretty good understanding for when structs can be a good option, and how to avoid the pitfalls involved. Most of the time you will use classes, but there are some occasions where structs, are a better fit. You should learn to identify those situations and be able to reason about why you choose one over the other.

Finally, if in doubt, use a class.

C# Protecting private collections from modification – Part 2

The common object issue

In the last part I wrote about how you can protect a class’s private collections from modification by wrapping them in a ReadOnlyCollection before exposing them. But how about collections that are handed to your class? Consider the following code:

public class MyClass
{
  private readonly List<int> _myInts;

  public MyClass(List<int> ints)
  {
    _myInts = ints;
  }
  ...
}

What happens here is that MyClass takes a variable of reference type in it’s constructor, and creates a variable of reference type of its own, _myInts, that is set to reference the same List object as the constructor variable. In other words, the List of ints on the heap now has at least two references to it.

Now assume the code that instantiates MyClass looks something like this:

var ints = new List<int> { 1, 2, 3 };
var myClass = new MyClass(ints); // Use ints in the MyClass constructor
...
ints.Clear(); // Re-use ints for other purposes
ints.Add(7);
...

This code will clear and then add a 7 to the List that _myInts in MyClass references. Which may lead to unexpected behavior.

Protecting the data

Fortunately there is an easy way to protect the data from any modifications. By using a type from the Systems.Collections.Immutable namespace.

The Immutable namespace contains immutable classes and interfaces, as well as extension methods to create an immutable copy of a mutable collection. The MyClass code can be changed to take advantage of these types:

public class MyClass
{
  private readonly ImmutableList<int> _ints;

  public MyClass(List<int> ints)
  {
    _ints = ints.ToImmutableList();
  }
  ...
}

Note that it would add no protection from external modification by using AsReadOnly like in the previous post. The big difference here is that ToImmutableList creates a new ImmutableList object which is populated by enumerating the ints List and copying the values.

Conclusion

When dealing with references, it is easy to forget to properly protect data that should be private to a class. There are also no compile time warnings or errors for altering a referenced object that have many references to it. However, it is important to be able to trust that data that you don’t expect to change really stays the same. Use the support that is available in the framework in order to achieve this.