Book review: Designing Data-Intensive Applications

I just finished reading Martin Kleppmann’s book “Designing Data-Intensive Applications”. Reading it cover to cover takes time, a lot of time, but it is definitely worth it.

It is not a book for everybody; it is a book for software developers and designers who work with systems that handle a lot of data and have high demands on availability and data integrity. The book is quite technical, so it is preferable if you have a good understanding of the infrastructure of your system and the different needs it is supposed to meet.


The book is divided into three parts, each made up of several chapters. The first part is called Foundations of Data Systems and consists of four chapters. Topics covered include Reliability, Scalability, and Maintainability; SQL and NoSQL; data structures used in databases; different ways of encoding data; and modes of dataflow. The details are quite complex, and you will most probably not be able to just skim over the pages and expect to follow along.

The second part is called Distributed Data and has five chapters. It discusses Replication, Partitioning, Transactions, the Trouble with Distributed Systems, and Consistency and Consensus. After reading this part of the book I remember thinking that it is amazing that we have these big, complex systems that actually work. Kleppmann describes so many things that can go wrong in a system that you start to wonder how in the world it is possible that things actually do work… most of the time.

The third, and last, part of the book is called Derived Data and consists of three chapters. Here Kleppmann describes two types of data processing: Batch Processing and Stream Processing. I am not at all familiar with the concepts of distributed filesystems, MapReduce, and the other details discussed in the Batch Processing chapter, so I found it a bit hard to stay focused while reading about them. However, the Stream Processing chapter was very interesting.

To sum up: I really enjoyed reading this book. It was a great read, and it really helped me get a better understanding of the system I am working with (maybe most importantly, what it is NOT designed for). I recommend it to anyone working with larger data-intensive systems. It will take time, but it is well-invested time.

Finally, I would like to thank Kristoffer who told me to read this book. Thank you!

Book review: C# in depth (4th ed)

C# in depth is written by Jon Skeet, a software engineer currently working at Google. He is known for being an extremely active user on Stack Overflow, for having an odd fascination with the field of date and time (he is one of the authors of the Noda Time library), and for being very interested in the C# language.

As the C# language has evolved and new versions have been released, new editions of the book have followed. The latest edition, the fourth, covers C# 1–7, plus a little on the upcoming C# 8. It is pretty much a history book of the C# language, with deep dives into the most important changes and features.

I really want to stress that this is a book on the C# language, not the .NET Framework. Hence, it covers how the syntax, keywords, and the language itself have evolved. It is also not a book for someone starting out with C# who wishes to learn how to write Hello World; it is for the intermediate to advanced developer.


One reflection I had when reading this book is that Jon sometimes writes as if he is explaining concepts to someone who has very little experience with C#, and in the next paragraph he writes for someone with deep knowledge of the language. Many parts of the book were on things I already knew quite well and could just skim through, and some parts I had to read really slowly to be able to follow along.

I also think that he sometimes takes it a bit too far, even though the title of the book is C# in depth. I never thought it was possible to write so much on tuples… Anyway, most sections of the book are interesting, well written, and well explained.

Summary: This is not a must-read. You can be a great C# developer without having read it. But if you are interested in the history and evolution of the C# language, and wish to gain a deeper understanding of the different parts that make up the language, then this book is for you.

The end of the .NET Framework?

I remember last autumn when I was out jogging, listening to an episode of a well-known podcast on programming in which the future of .NET was discussed. The hosts and the guest on the show were discussing the role of .NET Core, and I remember that they recommended using Core for greenfield projects, and even for older projects with a lot of active development. However, they were all very convinced that the original .NET Framework would be actively developed and maintained for many, many years to come; no need to worry. It seems that they were wrong.

Today, the future of the .NET Framework has taken a different turn. You probably know that a new version of the C# language is to be released later this year. The new version will be called C# 8 (the current version is 7.3). C# 8 will introduce language support for new types that have been added to .NET Standard 2.1.

Here comes the interesting part. In order to support these new types fully, changes are required to the .NET runtime. It has, however, been decided that these changes will not be made to the .NET Framework; it will stay on .NET Standard 2.0. Only .NET Core 3.0 will support .NET Standard 2.1.

This also means that it will not be possible to compile C# 8 code when targeting the .NET Framework. Not even with the newest .NET Framework 4.8 version.

It has also been announced that version 4.8 will be the last version of the original .NET Framework. After that, and after .NET Core 3, the plan is to release .NET 5 in 2020. .NET 5 will, however, be based on .NET Core 3 and Mono.

My view is that this is probably the right path forward for .NET. The current situation, with the .NET Framework, .NET Core, Mono, etc., is confusing for developers, and I am sure it takes a lot of energy and resources to maintain all the different tracks.

To you .NET developers out there, I would recommend starting to investigate what it would take to migrate your active projects to .NET Core.

You can find an official statement backing up the claims in this post at https://devblogs.microsoft.com/dotnet/building-c-8-0/

When to NOT use interpolated strings

I am currently reading the newest edition of Jon Skeet’s book “C# in depth” (https://csharpindepth.com/), which is a quite interesting read so far.

There is however one thing in the book that made me feel really bad about myself. You see, I really think interpolated strings help improve readability quite a lot, and therefore I have been replacing a lot of code similar to this:

Log.Debug("Value of property A is {0} and B is {1}", variable.A, variable.B);

with this:

Log.Debug($"Value of property A is {variable.A} and B is {variable.B}");

Can you think of a reason why this is a BAD thing to do? Assume that your code is live in a production environment. In most cases, debug-level logging will then be turned off.

In the first variant of the code, where the properties are given as separate parameters, the Debug method will just return, and nothing is done with the parameters. But in the second variant, where the parameter to the Debug method has been changed to an interpolated string, strings will be constructed for both properties, and the resulting formatted string will be built before Debug is even called. This means that the program does all the work needed to construct the string, and then just throws it away.

Using an interpolated string in this scenario might slow down the application quite a bit, even with Debug level logging turned off!

To summarize, do not use interpolated strings unless you are 100% sure that the result will actually be used!
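If you still want the readability of interpolated strings, a common pattern is to guard the log call with a check on whether the level is enabled. The sketch below uses a made-up DemoLogger class to make the difference visible; real logging frameworks expose something similar (for example an IsDebugEnabled property), but check your own framework’s API.

```csharp
using System;

// Minimal sketch showing why eager string interpolation costs even when
// debug logging is turned off. DemoLogger is a made-up stand-in, not a
// real logging library.
class DemoLogger
{
    public bool IsDebugEnabled { get; set; }

    // Format-string overload: when debug is off the method returns
    // immediately and the arguments are never formatted.
    public void Debug(string format, params object[] args)
    {
        if (!IsDebugEnabled) return;
        Console.WriteLine(string.Format(format, args));
    }
}

class Program
{
    static string ExpensiveToFormat()
    {
        // Stands in for any costly ToString()/formatting work.
        Console.WriteLine("formatting happened");
        return "42";
    }

    static void Main()
    {
        var log = new DemoLogger { IsDebugEnabled = false };

        // Lazy: the argument is passed along untouched and cheaply discarded.
        log.Debug("Value is {0}", "cheap");

        // Eager: the interpolated string (and ExpensiveToFormat) runs
        // BEFORE Debug is called, even though logging is off.
        log.Debug($"Value is {ExpensiveToFormat()}");

        // Guarded: the interpolated string is only built when needed.
        if (log.IsDebugEnabled)
            log.Debug($"Value is {ExpensiveToFormat()}");
    }
}
```

Running this prints "formatting happened" exactly once: only the unguarded interpolated call pays the formatting cost.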

Book Review: Agile Principles, Patterns, and Practices in C#

I have been reading Robert C. Martin’s and Micah Martin’s book, Agile Principles, Patterns, and Practices in C#, for quite a while now. The reason it has taken me so long to finish is that it is packed with so much information and covers so many aspects of software development, enough for at least 3-4 different books of their own.


The book begins with a section on Agile Development, which covers topics such as Agile Practices, Extreme Programming, Planning, Testing, and Refactoring. It continues with a section on Agile Design, where the famous SOLID principles are covered, along with UML and how to work effectively with diagrams. In the third section a number of design patterns are introduced, and the principles learned so far are put into practice in a case study. Finally, the fourth section covers the Principles of Package and Component Design (REP, CRP, CCP, ADP, SDP, and SAP) and introduces several more design patterns. It ends with a lot of code examples where database (SQL) support and a user interface are added to the application introduced in section three.

Even though the book is over 10 years old, it is still highly relevant. Agile software development, good practices and principles, and patterns for OOP are skills that all software developers today will benefit from educating themselves on. There are tons of online materials, classes, and other books that cover these topics, but I don’t know of any other resource that has all of it in one place.

With that said, I highly recommend this book. But to get the most out of it you need to be prepared to put a lot of time and focus into reading it and really understanding the reasoning behind the principles and patterns. Personally, I had to re-read some sections and take notes while reading, or I felt like I didn’t get all the details. It might be helpful to buy some copies for your workplace and run it as a book circle, so that you get to discuss the contents with other developers.

Verdict: Highly recommended!

The Stable-Abstractions Principle

What?

This is the last of the Principles of Package and Component Design, the Stable-Abstractions Principle (SAP). It says: “A component should be as abstract as it is stable.”

In the Stable-Dependencies Principle (SDP) post we learned that stability, or in that case instability, I, can be calculated using the formula I = Ce / (Ca + Ce), where Ca is the number of afferent couplings and Ce is the number of efferent couplings. A component that has many dependencies pointing towards it, but depends on only a few external classes, is considered stable, and vice versa.

Conforming to SAP leads to stable components containing many abstract classes, and unstable components containing many concrete classes.

Why?

The goal of making stable components abstract is to allow for them to be easily extended and avoid them constraining the design.

Unstable components, on the other hand, can contain concrete classes, since they are easily changed.

How?

Just as for instability, we can define a metric that helps us understand how abstract a component is. The formula is very simple:

A = Na / Nc

where Na is the number of abstract classes in the component and Nc is the total number of classes in the component. A component with A = 1 is completely abstract while a component with A = 0 is completely concrete.
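As a tiny illustration, the metric is just a ratio. The method name and the sample numbers below are mine, not from the book:

```csharp
using System;

class AbstractnessMetric
{
    // A = Na / Nc, where Na is the number of abstract classes (and
    // interfaces) in the component and Nc is the total number of classes.
    public static double Abstractness(int na, int nc)
        => nc == 0 ? 0.0 : (double)na / nc;

    static void Main()
    {
        Console.WriteLine(Abstractness(4, 4)); // A = 1: completely abstract
        Console.WriteLine(Abstractness(3, 4)); // A = 0.75: mostly abstract
        Console.WriteLine(Abstractness(0, 5)); // A = 0: completely concrete
    }
}
```

In a real codebase you could compute Na and Nc with reflection by counting types where Type.IsAbstract is true, but the ratio itself is all the principle needs.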

The Stable-Dependencies Principle

What?

The Stable-Dependencies Principle (SDP) says: “Depend in the direction of stability.”

What this means is that the dependencies in the component dependency graph should point towards more stable components. To understand this we need to define stability for a component. Uncle Bob turns it around and defines a way to measure instability, I, using the formula:

I = Ce / (Ca + Ce)

where Ca is the number of classes outside this component that depend on classes within it (afferent couplings) and Ce is the number of classes inside this component that depend on classes outside it (efferent couplings). I has the range [0, 1], where I = 0 indicates a maximally stable component and I = 1 indicates a maximally unstable component.

A component that does not depend on any other component is maximally stable, and a component that only depends on other components, but has no components depending on it, is maximally unstable.
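The formula can be sketched in a few lines of code. The helper name and the sample coupling counts are my own; the 0.25 and 0.75 values are the ones used in the figure discussion:

```csharp
using System;

class InstabilityMetric
{
    // I = Ce / (Ca + Ce), where Ca = afferent couplings (incoming)
    // and Ce = efferent couplings (outgoing). A component with no
    // couplings at all is treated as stable here (I = 0).
    public static double Instability(int ca, int ce)
        => (ca + ce) == 0 ? 0.0 : (double)ce / (ca + ce);

    static void Main()
    {
        Console.WriteLine(Instability(ca: 5, ce: 0)); // I = 0: maximally stable
        Console.WriteLine(Instability(ca: 3, ce: 1)); // I = 0.25: quite stable
        Console.WriteLine(Instability(ca: 0, ce: 4)); // I = 1: maximally unstable
    }
}
```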

Why?

When we design our components there will be some components that we want to be easy to change. They contain classes that often need to be modified when requirements change. If these components have many dependencies pointing at them (are very stable; have a value of I close to 0), they will be difficult to change, since changing them will impact many other components.

Figure 1 shows a component diagram where SDP is violated. The Stable component, which has an instability value of I = 0.25, has a dependency on the Volatile component, which has an instability value of I = 0.75. This dependency makes the Volatile component difficult to change.

Figure 1. Violating SDP

How?

So how do we fix violations of SDP? The answer is, once again, inverting dependencies by applying the Dependency Inversion Principle (DIP). Looking at Figure 1, to fix the violation we can create an interface or abstract base class and put it in a new component, CG. We can then let the Stable component depend on CG and have the Volatile component implement the concrete classes. By doing that we end up with the dependency graph shown in Figure 2.

Figure 2. Conforming to SDP

With this change in place, the Volatile component has been made easier to change.
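In code, the fix from Figure 2 could look roughly like this. The component names follow the post, but the interface and its member are made up for illustration:

```csharp
// New component CG: holds only the abstraction.
namespace CG
{
    public interface IVolatileService
    {
        string DoWork();
    }
}

// The Stable component now depends on CG instead of on Volatile.
namespace Stable
{
    public class Consumer
    {
        private readonly CG.IVolatileService _service;

        public Consumer(CG.IVolatileService service) => _service = service;

        public string Run() => _service.DoWork();
    }
}

// The Volatile component implements the interface, so its dependency
// also points at CG, in the direction of stability.
namespace Volatile
{
    public class Service : CG.IVolatileService
    {
        public string DoWork() => "volatile details";
    }
}

class Demo
{
    static void Main()
    {
        var consumer = new Stable.Consumer(new Volatile.Service());
        System.Console.WriteLine(consumer.Run()); // prints "volatile details"
    }
}
```

Note that Stable never names the Volatile component; the concrete class is wired in from the outside, so changing Volatile no longer forces a change in Stable.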

The Acyclic Dependencies Principle

What?

The Acyclic Dependencies Principle (ADP) is the first of three principles that deals with the relationships between components.

It says: “Allow no cycles in the component dependency graph.”

If you draw the components and the dependencies between them, and you are able to follow a dependency back to a component you have already visited, then you have a violation of ADP.

Figure 1. Violating ADP

Why?

So why should you care about cyclic dependencies? Because their existence makes it very hard to split up responsibilities and work on different tasks in parallel without stepping on each other’s toes all the time. Take a closer look at ComponentH in Figure 1 above. The dependency on ComponentA makes it depend on every other component in the system. And since ComponentD depends on ComponentH, ComponentD also depends on every other component in the system.

This makes it really hard to work with and release components D and H. But what would happen if we could remove the dependency from ComponentH to ComponentA? Then it would be possible to work with and release ComponentH in isolation. The person, or team, working with ComponentD could bring in new releases of ComponentH when needed, and postpone upgrading if they had other, more prioritized, tasks to finish first.

How?

How can we break a dependency such as the dependency from ComponentH to ComponentA? There are two options we can use:

  1. Apply the Dependency-Inversion Principle (the D in SOLID) and create an abstract base class or interface in ComponentH and implement it in ComponentA. This inverts the dependency so that ComponentA depends on ComponentH instead.
  2. Create a new component – let’s call it ComponentI – that both ComponentA and ComponentH depend on. See Figure 2 below.

Figure 2. Conforming to ADP

It is easy to introduce cyclic dependencies, so you need to keep track of the dependencies in your system and break cycles when they appear.
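Option 2 above can be sketched like this. The component names follow the post; the shared class and its member are invented for illustration:

```csharp
// A new component, ComponentI, holds the code that ComponentH previously
// had to reach back into ComponentA for, which breaks the cycle.
namespace ComponentI
{
    public class SharedHelper
    {
        public string Describe() => "shared by A and H";
    }
}

namespace ComponentA
{
    public class A
    {
        // ComponentA now depends on ComponentI...
        public string UseHelper() => new ComponentI.SharedHelper().Describe();
    }
}

namespace ComponentH
{
    public class H
    {
        // ...and so does ComponentH; neither depends on the other.
        public string UseHelper() => new ComponentI.SharedHelper().Describe();
    }
}

class Demo
{
    static void Main()
    {
        System.Console.WriteLine(new ComponentH.H().UseHelper()); // prints "shared by A and H"
    }
}
```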

The Common Closure Principle

What?

The Common Closure Principle (CCP) states: “The classes in a component should be closed together against the same kind of changes. A change that affects a component affects all the classes in that component and no other components.”

To put it in other words, a component should not have multiple reasons to change. This is the Single Responsibility Principle (SRP) from SOLID restated for components.

Why?

The goal of CCP is to increase maintainability. If a change in requirements always triggers changes in a lot of components, every change becomes harder to make: it leads to a lot of re-validation and redeployment, and parallelizing different changes becomes harder.

You would prefer the changes to occur in one component only.

How?

As with SRP, full closure is not possible. There will always be changes that require multiple components to be modified. The aim should be to be strategic and place classes that we know, from experience, often change together into the same component.

The classes don’t have to be physically connected (i.e., connected through code); they can also be conceptually connected.