Friday, August 26, 2011

How much memory do you need to fit the world in it - Lessons Learned from Implementing a Complex Domain Model – part 3

This one is going to be hard to accept for some people, but sometimes, just sometimes, business applications don’t necessarily have to use a relational database. This post is a continuation of the series based on my experiences from a project for a customer in the oil & gas industry.

So here is what we knew when starting the project:

  • The usage pattern of the application is going to be the following: an administrator sets up a simulation, a limited number of users interact with the system for about a week, the results are archived (or just discarded), the cycle repeats.
  • We are going to work with a large and complex domain.
  • We are going to develop iteratively and are expecting a lot of changes to the code already written (from refactoring to larger structural changes).
  • We want the software to be model-based (Domain Model pattern), as we felt this was the only sensible way to tackle the complexity.

A lot of people expected this was going to be backed by a SQL Server database. Indeed, NHibernate and a relational database was one of the options we considered. Another was to use an object database (for example db4o), but we ended up doing something quite different.

First, why we decided not to use a relational database. It seemed that the effort required to keep the schema in sync with all the continuous changes in the structure of the model would become overkill. Also, while I know how flexible NHibernate is and how granular the mapped object model can be, I also know that this flexibility comes at a cost (custom types, exotic mappings). In addition, we did not really need a relational database: our model was far from relational, and we feared that the mismatch would slow us down too much.

Then we seriously considered db4o. I think that could have been a reasonable choice. Object databases seem to be pretty flexible and don’t put too many constraints on the model (though they still tend to feel a bit like ORMs, and rumour has it they are not speed demons), but we found something even less limiting than that: memory. Yes, we decided to keep the whole model in memory.

Of course, now you are asking yourself: what if the system crashes, do we lose the data, what about transactions, rollbacks, etc.? To address these concerns we started with a pretty naive approach, which we later upgraded to something smarter but still very simple.

We divided all the operations into queries and commands. Queries can access the model at any time (concurrently) and don’t need transactions (they cannot change state). Commands, on the other hand, can only access the model sequentially (in the simple implementation), when no other command or query is executing. As soon as a command was executed (and the state of the model changed), we would serialize the whole model to a file. If the command threw for any reason, we would deserialize the previously saved object graph and replace the potentially corrupted in-memory state (a rough sketch of this follows the list below). This worked quite well for a while. Before I continue, let’s look at the benefits:

  • finally we can write truly object-oriented code (polymorphism, design patterns, etc.) – everything is in memory
  • finally we can utilize the power of data structures in our model (hash tables, queues, lists, trees, etc.) – everything is in memory
  • it got so much faster (no I/O) that we started finding new bottlenecks elsewhere (e.g. the performance problems with our initial Quantity implementation)
  • because we knew the simulations would last only a week or so, we could afford to ignore a schema migration strategy altogether. If I needed to add/rename/move a class/field/property/method, I just did it (no mappings, no schema update scripts)
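
To make the naive approach concrete, here is a minimal sketch of the kind of executor described above. It is illustrative rather than our actual code; SimulationModel, NaiveEngine, the BinaryFormatter snapshots and the ReaderWriterLockSlim are all assumptions made for the example:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.Threading;

[Serializable]
public class SimulationModel
{
    // the whole domain lives here as plain objects and collections
}

public class NaiveEngine
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly string _snapshotPath;
    private SimulationModel _model;

    public NaiveEngine(SimulationModel model, string snapshotPath)
    {
        _model = model;
        _snapshotPath = snapshotPath;
    }

    // Queries only read state, so many of them can run concurrently.
    public TResult Query<TResult>(Func<SimulationModel, TResult> query)
    {
        _lock.EnterReadLock();
        try { return query(_model); }
        finally { _lock.ExitReadLock(); }
    }

    // Commands run one at a time, with no queries in flight;
    // the whole object graph is snapshotted after a successful command.
    public void Execute(Action<SimulationModel> command)
    {
        _lock.EnterWriteLock();
        try
        {
            command(_model);
            SaveSnapshot();
        }
        catch
        {
            _model = LoadSnapshot(); // throw away the possibly corrupted graph
            throw;
        }
        finally { _lock.ExitWriteLock(); }
    }

    private void SaveSnapshot()
    {
        using (var stream = File.Create(_snapshotPath))
            new BinaryFormatter().Serialize(stream, _model);
    }

    private SimulationModel LoadSnapshot()
    {
        using (var stream = File.OpenRead(_snapshotPath))
            return (SimulationModel)new BinaryFormatter().Deserialize(stream);
    }
}
```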

Our naive implementation worked relatively well for a while – until our model got bigger and serialization was no longer instant. That affected command execution times, which affected system responsiveness in general: queries waited for access to the system until a command was done, commands piled up, and disaster was unavoidable.

Now I have to confess that when we decided to go for the “everything in memory” option, we knew that the naive implementation would not take us very far, but we did it anyway. First, because we wanted to work on the model and limit the initial investment in the infrastructure. Second, because we already knew how to upgrade to something more scalable: object prevalence.

The basic idea is still the same: keep the model in memory, have the queries access it concurrently, and only allow changes to the model through commands. The difference is that instead of taking a snapshot of the whole graph after each command, you only serialize the command itself to a “command log”. Later, if you need to restore the state of the system (after a power failure, say), you just “replay” all the commands from the log file. You may still want to take full snapshots every now and then and use them as starting points for system recovery (just replay the commands executed/logged since the last snapshot).
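
Here is an equally minimal sketch of the command-log idea (again illustrative; ICommand, PrevalenceEngine and the BinaryFormatter-based log are assumptions for the example, not the BambooPrevalence API):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Every state change is expressed as a serializable command object.
public interface ICommand
{
    void Execute(SimulationModel model);
}

public class PrevalenceEngine
{
    private readonly object _writeLock = new object();
    private readonly string _logPath;
    private readonly SimulationModel _model;

    public PrevalenceEngine(SimulationModel model, string logPath)
    {
        _model = model;
        _logPath = logPath;
    }

    // Append the command to the log first, then apply it to the in-memory model.
    public void Execute(ICommand command)
    {
        lock (_writeLock)
        {
            AppendToLog(command);
            command.Execute(_model);
        }
    }

    // Recovery: replay every logged command, in order, against the model
    // (in practice you would start from the latest snapshot rather than an empty model).
    public void Replay()
    {
        if (!File.Exists(_logPath)) return;
        var formatter = new BinaryFormatter();
        using (var stream = File.OpenRead(_logPath))
        {
            while (stream.Position < stream.Length)
            {
                var command = (ICommand)formatter.Deserialize(stream);
                command.Execute(_model);
            }
        }
    }

    private void AppendToLog(ICommand command)
    {
        using (var stream = new FileStream(_logPath, FileMode.Append))
            new BinaryFormatter().Serialize(stream, command);
    }
}
```

Note that the concrete command classes have to be serializable and deterministic (for example, they should carry a timestamp rather than call DateTime.Now inside Execute), so that replaying them produces exactly the same state.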

This was by no means our invention. The above is an implementation of the Event Sourcing pattern. The term object prevalence, and the use of event sourcing as a persistence mechanism, come from the people behind Prevayler for Java (their site seems to be down as I write this, but here is a web archive version of the FAQ). I’m not sure if that project has been discontinued (the last commit to the git repository was in 2009). Unfortunately, the .NET port called BambooPrevalence does not seem to be maintained anymore either. Initially we were reluctant to base our solution on a library not supported or maintained by anyone, but the idea behind prevalence is so simple that we decided that, if needed, we would be able to fix problems ourselves. We based our code on a slightly customized version of BambooPrevalence and have not had any problems related to it.

End of part 3

Using object prevalence was the best decision we made in the whole project. To be honest, I cannot imagine us finishing the project on time if we did not have all the freedom and flexibility of the in-memory model. I’m not sure I can recommend this approach to everyone, but the ideas behind it are becoming more and more popular in the form of CQRS. Also, not so long ago Martin Fowler published an article on The LMAX Architecture, where a similar approach worked extremely well for a retail financial trading platform (keeping everything in memory, a single update thread, extreme throughput).
