.NET under the hood – The ’null’ keyword

I assume that you are familiar with the null keyword in C#. It represents a null reference, one that does not refer to any object, and even though it has no type of its own it can be assigned to a variable of any reference type.

Null as a concept is quite abstract. It is often used to represent the absence of a value, so in itself it has no concrete representation. This may work in theory, but once you boil it down to machine code you have to represent null somehow. In this post I will discuss some Common Intermediate Language (CIL) code related to null and also look at some implementation details of the dotnet runtime. But let’s start with some C# code:

string s = "A string";
if (s == null)
    Console.WriteLine("s is null");

In the code above we have a variable, s, of type string, which we compare with null to see whether it references an object. Let’s compile this and see what the compiler does with it. Don’t worry if you’re not used to reading CIL, I will explain what the opcodes do. Using the tool ildasm you can inspect the CIL code generated by the compiler. For a Main method containing the above code the compiler will generate the following CIL code:

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       27 (0x1b)
  .maxstack  2
  .locals init (string V_0,
           bool V_1)
  IL_0000:  nop
  IL_0001:  ldstr      "A string"
  IL_0006:  stloc.0
  IL_0007:  ldloc.0
  IL_0008:  ldnull
  IL_0009:  ceq
  IL_000b:  stloc.1
  IL_000c:  ldloc.1
  IL_000d:  brfalse.s  IL_001a
  IL_000f:  ldstr      "s is null"
  IL_0014:  call       void [System.Console]System.Console::WriteLine(string)
  IL_0019:  nop
  IL_001a:  ret
} // end of method Program::Main

The instructions at IL_0001 to IL_0007 load the string “A string”, store it in local variable 0 (our s) and then push that variable back onto the stack. Then at IL_0008 comes an interesting opcode, ldnull. This opcode means “push null onto the stack”. The next opcode, ceq, means “compare equal”: it pops the two items on top of the stack and pushes 1 if they are equal, otherwise 0. But wait a minute… if null doesn’t have a value, how can it be put onto the stack? And how can you compare it with another value? Something fishy is going on here.

To understand what happens when ldnull is executed we need to take a look at the implementation of the runtime engine. Fortunately the source code of the dotnet runtime is available on GitHub (https://github.com/dotnet/runtime/) for anyone to dive into.

After browsing the code for a while I found an interesting piece of code in a file called interpreter.cpp: a function with the signature void Interpreter::LdNull(), which seems to be executed when CEE_LDNULL is encountered. Looking in the file opcode.def it becomes obvious that CEE_LDNULL means the opcode ldnull. The interesting part of the LdNull function is the following two lines of C++ code:

OpStackTypeSet(m_curStackHt, InterpreterType(CORINFO_TYPE_CLASS));
OpStackSet<void*>(m_curStackHt, NULL);

This code marks the next slot on the operand stack as holding a class (that is, an object reference) and pushes a void pointer with the value NULL, which is defined as 0 (zero), onto that stack.
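To make the idea concrete, here is a minimal sketch in C++ of an operand stack whose slots carry a type tag and a pointer-sized value. It is a toy model and not the runtime's actual code: the names SlotType, Slot, ToyStack, PushRef and IsNullReference are made up for illustration, and only LdNull and Ceq are named after the opcodes they imitate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type tags, loosely inspired by the runtime's CORINFO_TYPE_* values.
enum class SlotType { Class, Int };

// One operand stack slot: a type tag plus a pointer-sized payload.
struct Slot {
    SlotType type;
    std::uintptr_t bits;
};

class ToyStack {
public:
    // Push an object reference, represented here as a plain address.
    void PushRef(const void* p) {
        slots_.push_back({SlotType::Class, reinterpret_cast<std::uintptr_t>(p)});
    }

    // Imitates ldnull: push a reference-typed slot that holds a null pointer.
    void LdNull() { PushRef(nullptr); }

    // Imitates ceq: pop two slots, push 1 if their payloads are equal, else 0.
    void Ceq() {
        assert(slots_.size() >= 2);
        Slot b = slots_.back(); slots_.pop_back();
        Slot a = slots_.back(); slots_.pop_back();
        slots_.push_back({SlotType::Int, static_cast<std::uintptr_t>(a.bits == b.bits)});
    }

    Slot Top() const { return slots_.back(); }
    std::size_t Height() const { return slots_.size(); }

private:
    std::vector<Slot> slots_;
};

// Mirrors the IL sequence "ldloc.0, ldnull, ceq" from the C# test "s == null".
bool IsNullReference(const void* reference) {
    ToyStack stack;
    stack.PushRef(reference);  // ldloc.0
    stack.LdNull();            // ldnull
    stack.Ceq();               // ceq
    return stack.Top().bits == 1;
}
```

In this model, comparing a reference with null is nothing more than comparing two pointer-sized numbers, which is why ceq needs no special case for null.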

If you are familiar with C and C++ you will recognize this. It is a regular null pointer (https://www.learncpp.com/cpp-tutorial/6-7a-null-pointers/). For those of you who aren’t familiar with null pointers: a pointer holds the address of a place in your computer’s memory where a value (or several values) is stored, and a null pointer is a pointer that has the value 0, which is not a valid memory address for your program to store any data at.
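If you have never worked with null pointers, the following small C++ sketch may help. The helper names ValueOr, AddressAsNumber and CheckNullPointerFacts are my own, chosen only for this example. One assumption to be aware of: the C++ standard leaves the numeric value of a converted null pointer implementation-defined, so the check that it equals 0 holds on mainstream platforms but is not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstdint>

// Returns the pointed-to value, or the fallback when the pointer is null.
int ValueOr(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}

// Returns the address stored in a pointer as a plain number.
std::uintptr_t AddressAsNumber(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Walks through the facts described in the text above.
void CheckNullPointerFacts() {
    int value = 7;
    const int* valid = &value;     // holds the address of 'value'
    const int* missing = nullptr;  // holds no valid address

    assert(ValueOr(valid, -1) == 7);
    assert(ValueOr(missing, -1) == -1);

    assert(AddressAsNumber(valid) != 0);
    // Assumption: true on mainstream platforms, not guaranteed by the standard.
    assert(AddressAsNumber(missing) == 0);
}
```

Note that ValueOr checks the pointer before dereferencing it. Dereferencing a null pointer is undefined behavior in C++, whereas .NET turns the same mistake into a NullReferenceException.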

So, in summary, null in .NET is, in practice, a pointer to memory address 0 (zero).