Saturday, September 10, 2011

Scaling images for web–part 2

After reading the last part and looking at the Image Transformations page in the Open Waves documentation section, we know how to transform an image. All we need is an input stream and an output stream.
Where do we get an input stream from? If we have a virtual path of an image, we can try something along the following lines:

public Stream OpenImageStream(string virtualPath)
{
   return File.OpenRead(HostingEnvironment.MapPath(virtualPath));
}

But this is not how ASP.NET does it. A very useful feature introduced in ASP.NET 2.0 is an abstraction layer on top of the file system – VirtualPathProvider. OK, so it makes sense to use it to access files. Our code to open an image stream can look like this:

public Stream OpenImageStream(string virtualPath)
{
  var file = HostingEnvironment.VirtualPathProvider.GetFile(virtualPath);
  return file.Open();
}

What was wrong with the first approach? If you develop EPiServer sites you know that the file pointed to by a virtual path is not necessarily on the local disk. With projects like VPP for Amazon S3, the file might not even be in our network. One problem with abstractions is that sometimes they hide just a little bit too much. In our case, if we had a file on a local disk, we could for example check its last modified date to see if we need to transform the image or can serve a cached version. Fortunately, the ASP.NET developers have also noticed this need and provided a way for implementers of VPPs to notify clients of file changes. That’s why the VirtualPathProvider class has the following methods:

public virtual string GetFileHash(
    string virtualPath, 
    IEnumerable virtualPathDependencies)


public virtual CacheDependency GetCacheDependency(
    string virtualPath, 
    IEnumerable virtualPathDependencies, 
    DateTime utcStart)

The first one should return a stable hash of a file – if the file hash changes, we can assume the file has changed.

The second one should return a CacheDependency that will invalidate cache entry (if a client decides to cache results of the file processing) when the file changes.

The virtualPathDependencies parameter is an important one. When calling the methods, it should contain a list of all dependencies of the given virtual path. If any of the dependencies change, the provider should consider the file indicated by the virtual path as modified, trigger the cache dependency, and update the file hash. When using the methods (or implementing a VPP) remember that the virtual path itself must be included in the list of dependencies. Example:

Let’s say we have a foo.aspx file that includes a reference to a bar.ascx control. The ASP.NET BuildManager will ask for the file hash using the following method call:

virtualPathProvider.GetFileHash(
   "~/foo.aspx",
   new [] {"~/foo.aspx", "~/bar.ascx"})

In the image scaling scenario, where image files don’t have any dependencies, the dependencies list will only include the virtual path of the file itself.

A word of warning. Not every implementation of a VirtualPathProvider will implement the above methods. It is fine not to implement one or both of them. In such cases, the base class will return null for both the file hash and the cache dependency. The EPiServer Unified File provider (and derived classes) is an example of an implementation where the GetFileHash method is not present (GetCacheDependency is implemented by VirtualPathNativeProvider). For cases like this, if you know the details of the VPP implementation (DotPeek?) you can often find other ways to calculate the hash. In EPiServer’s case, the VPP.GetFile method returns instances derived from VirtualFileEx, which has a Changed property, giving us access to the last modified date of a file.
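A hedged sketch of that idea (assuming EPiServer’s VirtualFileEx with its Changed property, as described above; GetImageFileHash is just an illustrative name):

// Illustration only - derives a pseudo-hash from the last modified date when
// the provider itself does not implement GetFileHash.
public string GetImageFileHash(string virtualPath)
{
    var file = HostingEnvironment.VirtualPathProvider.GetFile(virtualPath)
        as VirtualFileEx;

    return file != null
        ? file.Changed.ToUniversalTime().Ticks.ToString()
        : null;
}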

For scenarios like this, and for better testability, the image transformation code in Open Waves does not depend on a VPP directly. Instead, we have the IVirtualFileProvider interface.

    public interface IVirtualFileProvider
    {
        IVirtualFile GetFile(Url fileUrl);        
    }

    public interface IVirtualFile
    {
        Url Url { get; }
        string Hash { get; }
        Stream Open();
    }

The interface is implemented by the VirtualPathFileProvider class, which is a wrapper for a VirtualPathProvider. This gives us a chance to “fix” any issues we may find in the underlying VPP. Another difference is that we are not relying on virtual paths but rather on Urls (in most cases they will be virtual paths). This allows us to implement a virtual file provider that fetches images from external sites (flickr?) – we just need to be smart about how to implement file hashes. For more details, take a look at this page from the Open Waves documentation section, Image Transformations for Web.
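To give a rough idea of what such a wrapper can look like, here is a simplified sketch (not the actual Open Waves source; it assumes the Url type can be constructed from, and converted back to, a virtual path string):

public class VirtualPathFileProvider : IVirtualFileProvider
{
    private readonly VirtualPathProvider provider;

    public VirtualPathFileProvider(VirtualPathProvider provider)
    {
        this.provider = provider;
    }

    public IVirtualFile GetFile(Url fileUrl)
    {
        var virtualPath = fileUrl.ToString(); // assumes the Url is a virtual path
        return this.provider.FileExists(virtualPath)
            ? new VirtualPathFile(this.provider, virtualPath)
            : null;
    }

    private class VirtualPathFile : IVirtualFile
    {
        private readonly VirtualPathProvider provider;
        private readonly string virtualPath;

        public VirtualPathFile(VirtualPathProvider provider, string virtualPath)
        {
            this.provider = provider;
            this.virtualPath = virtualPath;
        }

        public Url Url
        {
            get { return new Url(this.virtualPath); }
        }

        public string Hash
        {
            get
            {
                // the virtual path itself must be part of the dependency list
                return this.provider.GetFileHash(
                    this.virtualPath, new[] { this.virtualPath });
            }
        }

        public Stream Open()
        {
            return this.provider.GetFile(this.virtualPath).Open();
        }
    }
}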

In the next part I’ll try to describe the approach we chose for caching transformed images.

Tuesday, September 06, 2011

Scaling images for web

 

Today I am going to divert a bit from the topic of my domain models series and try to write about what I am currently working on. I hope to continue the series in the following posts.

For the past 2 days I have been working on migrating the code responsible for image transformations (scaling, to be precise) from the MakingWaves.Common library to the Open Waves repository. Whenever I move stuff to Open Waves, I try to improve what we have and implement ideas we’ve had for a given feature but never had time to code. Before I talk about the code itself, let’s see what problems need to be solved.

Resizing

Ok, so we have an image file (or a stream) and want to generate a resized version of the image.

System.Drawing (GDI+)

The System.Drawing namespace has been around since the first version of the framework. It is a set of wrappers around the GDI+ API. It has been used in many web applications even though the documentation clearly says it is not supported for use in ASP.NET. The fact is, it works and works reasonably well. There are things to remember though. First, be very careful to dispose anything that should be disposed. Second, to achieve good results (image quality and performance) one needs to remember to set a couple of properties to just the right values. My favourite is imageAttributes.SetWrapMode(WrapMode.TileFlipXY) to avoid “ghosting” (a 1px frame) around the result image.

See this article for more details and this one for comparison of quality vs. performance for different settings.
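Putting those hints together, a minimal GDI+ resize could look roughly like this (a sketch, not the Open Waves implementation; JPEG output and high-quality bicubic interpolation are just illustrative choices):

public void ResizeWithGdi(Stream input, Stream output, int width, int height)
{
    using (var source = System.Drawing.Image.FromStream(input))
    using (var target = new Bitmap(width, height))
    using (var graphics = Graphics.FromImage(target))
    using (var attributes = new ImageAttributes())
    {
        // avoid the 1px "ghost" frame around the scaled image
        attributes.SetWrapMode(WrapMode.TileFlipXY);

        graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
        graphics.SmoothingMode = SmoothingMode.HighQuality;
        graphics.PixelOffsetMode = PixelOffsetMode.HighQuality;

        graphics.DrawImage(
            source,
            new Rectangle(0, 0, width, height),
            0, 0, source.Width, source.Height,
            GraphicsUnit.Pixel,
            attributes);

        target.Save(output, ImageFormat.Jpeg);
    }
}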

WPF

It may be surprising, but it is possible to use WPF to resize images in a server environment. Again, there is an article from Bertrand Le Roy describing the details. The performance is much better compared to GDI+, and the quality does not suffer. Two problems though: it works only in full trust, and (according to Mr. Le Roy) it is again not supported in server scenarios (I could not find anything in MSDN to confirm this).

By the way, this is the method that EPiServer.ImageLibrary uses to process images for the image editor in edit mode. So, if you want to use this method, don’t want to code it yourself and are working on an EPiServer site, go and read this entry from Mattias Lövström. The only problem is that the API the component exposes makes it hard to maintain the aspect ratio of an image. Basically, it will let you scale the image to a given width and height, but first you will need to figure out the correct width/height ratio. I guess that’s fine when used in the image editor, as the client-side code keeps a fixed aspect ratio when resizing an image, but when what you get is a random picture to resize, this becomes a problem.

The bad news is that if you use System.Drawing.Image to first load the image and inspect its Width and Height properties to compute the ratio, you’ll end up decoding the file, which is a pretty heavy operation. It is possible that whatever you gain by using WPF transformations you will lose by unnecessarily decoding the image. The good news is that if you use the WPF API to do the same, it will only load the image metadata and will not decode the image (it is lazy in that matter).
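For illustration, a sketch of reading the pixel dimensions with the WPF API (the combination of DelayCreation and BitmapCacheOption.None should keep it to reading the header metadata, without decoding the pixels):

public Size GetPixelSize(Stream imageStream)
{
    var frame = BitmapFrame.Create(
        imageStream,
        BitmapCreateOptions.DelayCreation,
        BitmapCacheOption.None);

    // PixelWidth/PixelHeight come from the image header
    return new Size(frame.PixelWidth, frame.PixelHeight);
}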

WIC (Windows Imaging Component)

It is not a surprise that the WPF imaging API is a wrapper for a piece of native code. This native code is called WIC. Here is a chapter from MSDN showing the usage of the component. To use it from .NET you will need a managed wrapper. Once again I will send you to the Tales from the Evil Empire blog for details on how to use the components. There you will also find a link to a ready-to-use managed wrapper.

Resizing summary

In theory, the only truly supported way is to use WIC directly (even though WPF does pretty much the same thing). In practice, components from both the System.Drawing (GDI) and System.Windows.Media (WPF) namespaces seem to work reliably on the server. System.Drawing has the advantage of working in a medium trust environment. At Making Waves we have successfully used GDI for our resizing needs for quite a while, but I figured that since I am working on migrating this part of our frameworks, I may as well implement the other two mechanisms and add support for plugging in EPiServer.ImageLibrary. Note: we use the same set of resizing components in non-EPiServer projects, hence we need more than just EPiServer.ImageLibrary to cover our needs.

Transformations

In practice, in most cases, you will need transformations that maintain the aspect ratio of the original image. We use one of the following (a rough sketch of the size calculations follows the list):

Scale to fit – resizes an image so it fits the specified rectangle (width/height) without clipping. For example, when fitting into a square, landscape images will be sized so their width matches the width of the square, while portrait images will be sized so their height matches the height of the square. This is the most often used transformation (galleries, thumbnails, user pictures, or whenever you want to avoid clipping the image).

Scale down to fit – same as the above, but will only resize an image if it is larger than the specified rectangle (width/height). This is for example useful if you want to scale images for display on a mobile device, where any image wider than the screen gets scaled down, but the ones that fit the display are not transformed.

Scale to fill – resizes an image so it fills the whole rectangle (width/height). The image is scaled so it is just large enough to cover the target area and then centrally cropped to match the specified dimensions. Useful when a graphic design assumes that the picture covers the whole region of a page.

Other transformations that are not very popular but may be useful once in a while are stretch and crop.
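As promised, here is a rough sketch of the size calculations behind the three aspect-preserving modes (an illustrative helper, not the Open Waves API; the central crop for “scale to fill” is left to the caller):

public static class ScaleCalculator
{
    // Scale to fit: the whole image fits inside the box.
    public static Size ScaleToFit(Size image, Size box)
    {
        var ratio = Math.Min(
            (double)box.Width / image.Width,
            (double)box.Height / image.Height);
        return new Size(
            (int)Math.Round(image.Width * ratio),
            (int)Math.Round(image.Height * ratio));
    }

    // Scale down to fit: like ScaleToFit, but never enlarges the image.
    public static Size ScaleDownToFit(Size image, Size box)
    {
        return image.Width <= box.Width && image.Height <= box.Height
            ? image
            : ScaleToFit(image, box);
    }

    // Scale to fill: the image covers the whole box; the overflow is then
    // cropped centrally to the box dimensions.
    public static Size ScaleToFill(Size image, Size box)
    {
        var ratio = Math.Max(
            (double)box.Width / image.Width,
            (double)box.Height / image.Height);
        return new Size(
            (int)Math.Round(image.Width * ratio),
            (int)Math.Round(image.Height * ratio));
    }
}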

End of part 1

I had not planned this, but it appears this is going to be the first post in a series about scaling images for the web. Things I want to cover in the next post include:

  • Versioning of original images
  • Methods for serving transformed images
  • Caching strategies
  • OpenWaves.Web.Image control

A lot of the things I discussed here have already been implemented and are available in the Open Waves repository. Even though this is still a work in progress, I will be happy to hear any comments you might have about the code.

Friday, August 26, 2011

How much memory do you need to fit the world in it - Lessons Learned from Implementing Complex Domain Model – part 3

This one is going to be hard to accept for some people, but sometimes, just sometimes, business applications don’t necessarily have to use a relational database. This post is a continuation of the series based on my experiences from a project for a customer in the oil & gas industry.

So here is what we knew when starting the project:

  • The usage pattern of the application is going to be the following: an administrator sets up a simulation, a limited number of users interact with the system for about a week, the results are archived (or just discarded), the cycle repeats.
  • We are going to work with a large and complex domain.
  • We are going to develop iteratively and are expecting a lot of changes to the code already written (from refactoring to larger structural changes).
  • We want the software to be model based (Domain Model pattern), as we felt this was the only sensible way to tackle the complexity.

A lot of people expected this was going to be backed up by a Sql Server database. Indeed, NHibernate and a relational database was one of the options we considered. Another was to use an object database (for example db4o), but we ended up doing something quite different.

First, why we decided not to use a relational DB. It just seemed that the effort required to keep the schema in sync with all the continuous changes in the structure of the model would become overkill. Also, while I know how flexible NHibernate is and how granular the mapped object model can be, I also know it comes at a cost (custom types, exotic mappings). In addition, we did not really need a relational database. Our model was far from relational. We feared that the mismatch would slow us down too much.

Then we seriously considered db4o. I think that could have been a reasonable choice. Object databases seem to be pretty flexible and don’t put too many constraints on the model (they still tend to feel a bit like ORMs, and rumour has it they are not speed demons), but we found something even less limiting than that. Memory – yes – we decided to keep the whole model in memory.

Of course, now you are asking yourself: what if the system crashes, do we lose the data, what about transactions, rollbacks, etc.? To address the above, we started with a pretty naive approach, which we later upgraded to something smarter but still very simple.

We divided all the operations into queries and commands. Queries can access the model anytime (concurrently) and don’t need transactions (they cannot change state). Commands, on the other hand, can only access the model sequentially (in the simple implementation), when no other command or query is executing. As soon as a command was executed (and the state of the model changed) we would serialize the whole model to a file. If the command threw for any reason, we would deserialize the previously saved object graph and replace the potentially corrupted in-memory state. A minimal sketch of this scheme follows the list of benefits below. This worked quite well for a while. Before I continue, let’s look at the benefits:

  • finally we can write truly object oriented code (polymorphism, design patterns, etc.) – everything is in memory
  • finally we can utilize the power of data structures in our model (hash tables, queues, lists, trees, etc) – everything is in memory
  • it got so much faster (no I/O) that we found new bottlenecks (e.g. the performance problems with our initial Quantity implementation)
  • because we knew the simulations would last only a week or so, we could afford to just ignore a schema migration strategy. If I needed to add/rename/move a class/field/property/method I just did it (no mappings, no schema update scripts)
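A minimal sketch of the naive scheme described above (illustrative names; the real implementation differed in details):

public class NaiveEngine<TModel> where TModel : class
{
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();
    private readonly string snapshotPath;
    private TModel model;

    public NaiveEngine(TModel model, string snapshotPath)
    {
        this.model = model;
        this.snapshotPath = snapshotPath;
    }

    // queries read the in-memory model concurrently
    public TResult Query<TResult>(Func<TModel, TResult> query)
    {
        this.gate.EnterReadLock();
        try { return query(this.model); }
        finally { this.gate.ExitReadLock(); }
    }

    // commands run one at a time; a full snapshot is taken after each success
    public void Execute(Action<TModel> command)
    {
        this.gate.EnterWriteLock();
        try
        {
            try
            {
                command(this.model);
                SaveSnapshot();
            }
            catch
            {
                this.model = LoadSnapshot(); // discard the corrupted state
                throw;
            }
        }
        finally { this.gate.ExitWriteLock(); }
    }

    private void SaveSnapshot()
    {
        using (var stream = File.Create(this.snapshotPath))
            new BinaryFormatter().Serialize(stream, this.model);
    }

    private TModel LoadSnapshot()
    {
        using (var stream = File.OpenRead(this.snapshotPath))
            return (TModel)new BinaryFormatter().Deserialize(stream);
    }
}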

Our naive implementation worked relatively well for a while – until our model got bigger and serialization was no longer instant. That affected command execution times, which affected system responsiveness in general: queries waited for access to the system until a command was done, commands would pile up, and disaster was unavoidable.

Now I have to confess that when we decided to go for the “everything in memory” option, we knew that the naive implementation would not take us very far, but we did it anyway. First, because we wanted to work on the model and limit the initial investment in the infrastructure. Second, we already knew how to upgrade to something more scalable – object prevalence.

The basic idea is still the same. Keep the model in memory. Have the queries access the model concurrently, and only allow changes to the model through commands. The difference is that instead of taking a snapshot of the whole graph after each command, you only serialize the command itself to a “command log”. Later, if you need to restore the state of the system (after a power failure?), you just “replay” all the commands from the log file. You may still want to take full snapshots every now and then and use them as starting points for the system recovery (just replay the commands executed/logged since the last snapshot).
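A sketch of the command-log idea (illustrative names only, not any particular library’s API): commands are serializable objects, each executed command is appended to a log, and recovery replays the log against the last snapshot.

[Serializable]
public abstract class Command<TModel>
{
    public abstract void Execute(TModel model);
}

public class CommandLog<TModel>
{
    private readonly string logPath;

    public CommandLog(string logPath)
    {
        this.logPath = logPath;
    }

    // called after each successfully executed command
    public void Append(Command<TModel> command)
    {
        using (var stream = new FileStream(this.logPath, FileMode.Append))
            new BinaryFormatter().Serialize(stream, command);
    }

    // called on recovery, against a model restored from the last snapshot
    public void Replay(TModel model)
    {
        if (!File.Exists(this.logPath))
            return;

        var formatter = new BinaryFormatter();
        using (var stream = File.OpenRead(this.logPath))
        {
            while (stream.Position < stream.Length)
            {
                var command = (Command<TModel>)formatter.Deserialize(stream);
                command.Execute(model);
            }
        }
    }
}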

This was by no means our invention. The above is an implementation of the Event Sourcing pattern. The term object prevalence, and an implementation of the event sourcing pattern as a persistence mechanism, come from the people behind Prevayler for Java (their site seems to be down as I write this, but here is a webarchive version of the FAQ). I’m not sure if the project has been discontinued (the last commit to the git repository was in 2009). Unfortunately, the .NET port called Bamboo.Prevalence does not seem to be maintained anymore either. Initially we were reluctant to base our solution on a library not supported or maintained by anyone, but the idea behind prevalence is so simple that we decided that, if needed, we would be able to fix problems ourselves. We have based our code on a slightly customized version of Bamboo.Prevalence and have not had any problems related to it.

End of part 3

Using object prevalence was the best decision we made in the whole project. To be honest, I cannot imagine us finishing the project on time without all the freedom and flexibility of the in-memory model. I’m not sure I can recommend this approach to everyone, but the ideas behind it are becoming more and more popular in the form of CQRS. Also, not so long ago Martin Fowler published an article on The LMAX Architecture, where a similar approach worked extremely well in a retail financial trading platform (keeping everything in memory, a single update thread, extreme throughput).

Sunday, August 21, 2011

Quantity Pattern - Lessons Learned from Implementing Complex Domain Model – part 2

 

Analysis Patterns: Reusable Object Models is a book by Martin Fowler, first published in 1996. It is neither new nor an easy read. As much as I like Fowler’s style of writing, I struggled through some of the chapters. Do I recommend the book – yes. Why? Because each time I am involved in a project with above-average complexity, dealing with real-world business cases, I find the patterns described in the book helpful.

The project I mentioned in part 1 is all about real-world business cases, and it is serious business – the oil & gas business. In this post I will focus on the Quantity pattern, which played a key role in the success of the project.

In our domain we had to deal with values expressed using various units and their derivatives. The formulas used for calculations were pretty complex and involved unit arithmetic. For example, you want to be sure that the calculated daily oil production rate is expressed in mbbl/day (thousand oil barrels per day) or any unit that can be converted to OilVolume/Duration.

The Quantity pattern tells you to explicitly declare a unit for every dimensioned value instead of just assuming the unit (representing the value as a bare number). This part is easy:

public class Quantity
{
    public decimal Amount { get; private set; }
    public Unit Unit { get; private set; }
}


Next, there is parsing quantities and units from strings and converting them back to strings. This is not the hardest part, but there are things you have to watch for. There are prefix units ($) and suffix units (most of them). Some of them require a space after the amount, some look better concatenated with the amount. Examples could be: “$10.5m” and “42 km”.
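A rough sketch of those formatting rules (IsPrefix, RequiresSpaceAfterAmount and Symbol are hypothetical members used purely for illustration):

public static string Format(decimal amount, Unit unit)
{
    if (unit.IsPrefix)
    {
        // prefix units go in front of the amount, e.g. the "$" in "$10.5m"
        return unit.Symbol + amount;
    }

    // suffix units follow the amount, either separated by a space ("42 km")
    // or concatenated with it (the "m" in "$10.5m")
    return unit.RequiresSpaceAfterAmount
        ? amount + " " + unit.Symbol
        : amount + unit.Symbol;
}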



The hardest part was implementing arithmetic on Quantities with support for all operators, compound units, conversions, reductions, etc. But it was worth it. Now we can write code like this:



var oilVolume = Quantity.Parse("100 mmbbl");
var duration = new Quantity(1, Unit.Year);

var productionRate = (oilVolume / duration)
    .ConvertTo(Unit.ThousandOilBarrels / Unit.Day);

Console.WriteLine(productionRate.Round(2)); // Gives us "273.97 mbbl/d"


When we thought we were done with the implementation, we discovered that the performance of Quantity*Quantity was far worse than decimal*decimal. Profiling showed that operations on units (mostly reductions and conversions) caused the Unit.Equals method to be called so many times (especially when looking for conversions between compound units) that even though a single Unit.Equals execution took about 1 ms, the final result was not acceptable. We were crushed. Of course the first thought was to go back to using decimals, but we really did not want to give up all the Quantity goodness.



It took us a while to come up with a solution, which depended on making sure we only ever have a single instance of any unit. That allowed us to compare any two units (including compound units) for equality using Object.ReferenceEquals.



This was easy for base units – we just made them singletons, e.g.:



public class Unit
{
    public static readonly Unit OilBarrel = new BaseUnit("bbl", ... );

    // ...
}


All other units were the problem. There are many ways one can create an instance of a unit; some examples:



var a = Unit.Parse("mbbl/d");

var b = (Unit)BinarySerializer.Deserialize(BinarySerializer.Serialize(a));

var c = Unit.ThousandOilBarrels / Unit.Day;


In the end we covered all of them using lookups, a unit operation cache, a conversion cache, implementing IObjectReference, and such. The result was surprisingly good. We were able to achieve performance close to that of operations on pure decimals (after all the caches got populated). What made us really happy was the fact that we were able to solve the performance problem just by changing the implementation of the Unit and Quantity classes. The public interfaces used by all the code already written were unchanged.
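To give a rough idea of the interning part, here is a simplified, self-contained sketch (not the real Unit class; the Symbol key and the Intern method are illustrative): every way of obtaining a unit funnels through a lookup that returns one canonical instance, and IObjectReference redirects binary deserialization to that same instance, so ReferenceEquals keeps working after a round-trip.

[Serializable]
public class Unit : IObjectReference
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<string, Unit> Canonical =
        new Dictionary<string, Unit>();

    public Unit(string symbol)
    {
        this.Symbol = symbol;
    }

    public string Symbol { get; private set; }

    // every factory path (Parse, operators, conversions) would return Intern(...)
    public static Unit Intern(Unit unit)
    {
        lock (Gate)
        {
            Unit existing;
            if (Canonical.TryGetValue(unit.Symbol, out existing))
            {
                return existing;
            }
            Canonical.Add(unit.Symbol, unit);
            return unit;
        }
    }

    // called by the formatter after deserialization; the freshly built
    // instance is swapped for the canonical one
    public object GetRealObject(StreamingContext context)
    {
        return Intern(this);
    }
}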



The summary is that if you are going to work with a domain that deals with a lot of dimensioned values using a number of base units and their derivatives, implementing even the simplest form of the Quantity pattern will make your life much easier. Implementing a full-featured Quantity class will take time and, depending on your performance requirements, may or may not be worth it.



End of part 2



I strongly recommend reading Analysis Patterns. Even if you don’t remember the details of the patterns after the first pass, don’t worry – just keep the book around – you will read it again when the time comes. Other patterns from the book we used in the project were many of the Accounting Patterns.

Saturday, August 20, 2011

Lessons Learned from Implementing Complex Domain Model – part 1

It took us over a year to implement this application for a company in the oil & gas sector. The customer seems happy, we are happy – let’s call it a success. It was a Silverlight 4 front-end connecting to WCF services in the back-end. We encountered quite a few technological challenges, but the biggest challenge was the domain, which none of us had ever worked with.

We had to learn a lot about licences, bidding, drilling, platform development, oil production, pipelines, gas contracts, taxation regimes, accounting, corporate finance… I could continue for a while, but the bottom line is that it is complex stuff. Now, after a year, I can say that the model we built is not perfect and I would gladly rewrite some parts, but overall I think we did pretty well.

Whenever anyone says Domain Model, everyone thinks of DDD. I cannot say we followed DDD by the (blue) book, but let’s say we were inspired by the ideas.

Ubiquitous language and knowledge crunching

This part worked really well. The scope of the application and the complexity of the domain made it hard to just explain (or rather understand) how it all should work. To get us started, we would talk to the customer in a room with a whiteboard and draw UML-like diagrams trying to illustrate a story told by an expert. BTW - we were lucky - the customer was a real expert (an infinite source of domain knowledge). We initially wanted to redraw the UMLs using a computer tool, but we only ended up taking pictures. In the end we did not even use the pictures much. The diagrams were just very helpful for organizing the information as we received it. They allowed us to create high-level mental models of the domain and learn the vocabulary used by the experts.

An interesting bit is that we ourselves created a lot of concepts that the experts found useful when describing the functionality. In some cases they were just names for things the experts did not even care to name. In other cases we needed more precise names to avoid ambiguity. When creating new concepts, be very careful. If an expert feels the concept is artificial and the word does not describe anything they are familiar with, it is probably not what you were looking for.

Most of the concepts we put in the diagrams ended up as C# classes. We created many classes we had not anticipated when drawing the diagrams, but the high-level structure of the classes was pretty close to what we discussed with the customer. Of course we did not get it right the first time and had to change the structure many times as we discussed new features. The good thing was that when discussing changes, we all used words that meant the same thing to the expert and to us – the developers.

Now you may wonder if it is possible to teach the customer enough UML to be able to use it as a communication tool. My experience shows that in most cases it is. After all, you will need a smart person as an expert to be able to learn from them. If they are smart, they will understand that a rectangle with a name represents a concept. Then use a few examples like Pet, Dog, Cat or Vehicle, Car, Bus to illustrate generalization and specialization. Associations are also easy to explain (a Car “has” wheels, a Pet “has” an Owner, etc.). Just don’t try to explain composition vs. aggregation. A simple arrow is enough at the level of detail you want to capture at this stage.

End of part 1

In the next parts (if I ever write them) I want to tell you about:

  • why Analysis Patterns is a book worth reading,
  • why the customer was surprised they would not need an SQL Server licence,
  • why you need to be very careful when deciding to use the State design pattern,
  • why contracts are the best thing since sliced bread

Wednesday, July 13, 2011

EPiServer, OpenWaves, and PageDataAdapters

This is the first post after a long break. Considering my lack of discipline and time, I’m pretty sure this will not turn into regular blogging, but anyway.
Some context first. It’s been 4 years since I started working for Making Waves. One of the things we do is implement EPiServer CMS. If you have never heard of it, don’t read further – this may bore you.
Still with me? Good. From now on I’ll assume you know what EPiServer is and that you are a developer.

Open Waves

Making Waves does a lot of projects. As you can imagine, a lot of code written for one project can very often be reused in another. For some time, we have been trying to extract anything that seemed reusable into a library that we internally call MakingCommon. Recently, we began to open source pieces of the library under the name Open Waves. With time, we hope to move most of the internal frameworks to CodePlex and share them with others.

PageData Adapters

An interesting piece of functionality (only if you implement EPiServer-based solutions, and it is still waiting for its turn to be moved to Open Waves) is provided by PageDataAdapters. PageDataAdapters started as a mini project of mine 2 years ago but has been developed by a couple of other Wave Makers since then. The inspiration for it came from Castle Dictionary Adapters, but the feature set was influenced by patterns employed by model-oriented methods, OR mappers, and the AOP paradigm. The underlying assumption is that when implementing a CMS, 95% of the code is just reading data entered by editors, and this is where PageDataAdapters provides most of the functionality. As soon as you see the first example you will think to yourself: “This is the same thing as PageTypeBuilder…”, and in a way you will be right. But it may be a surprise to learn that initially we had not even planned to generate page types from the classes (but now we do). Now some usage examples (if you are familiar with what PageTypeBuilder does, you will have no problem understanding them).
[PageTypeDefinition]
public abstract class Article
{
    [Property(Required = true)]
    public abstract string Title { get; }

    [XhtmlProperty(DefaultValue = "")]
    public abstract string Body { get; }

    [Parent]
    public abstract ArticleCategory Category { get; }

    [PreviousSibling]
    public abstract Article PreviousArticle { get; }

    [NextSibling]
    public abstract Article NextArticle { get; }

    public abstract IEnumerable<Article> RelatedArticles { get; }
}
As you can see, to define a model for an article page we don’t have to use any EPiServer-related classes. It is pretty clean and self-explanatory. We can use model classes to define properties of the class. We can use attributes to customize how values of the properties will be resolved at runtime and how they will be generated in the page type. Here is another example showing how an article category can be modelled.
[PageTypeDefinition]
[AllowedChildrenPageTypes(typeof(Article))]
public abstract class ArticleCategory
{
    [Property(BuiltInProperties.PageName)]
    public abstract string Name { get; }

    public abstract Person ContactPerson { get; }

    [Children]
    public abstract IEnumerable<Article> Articles { get; }
}
Now that the model is defined, we can use it in the templates. Thanks to the richness of the model, templates can be much simpler and cleaner. All we have to do is derive the template from the generic TemplatePage<TModel> to get access to the Model property of type TModel.
public partial class ArticleTemplate : TemplatePage<Article>
{
    ...
}
<h1><%: Model.Title %></h1>
<div>
<%= Model.Body %>
</div>
Contact person: <%: Model.Category.ContactPerson.Name %>
I hope this very simple example illustrates how easy it is to traverse the page tree without the need to use any EPiServer infrastructure in the template code or markup.

An important aspect of the framework is that it is extensible in many ways and places. None of the attributes used in the examples is special or expected by the framework. All of them implement interfaces that extend the runtime behaviour or affect page type generation. For example, this is the source code for the ParentAttribute class.
[AttributeUsage(AttributeTargets.Property)]
public class ParentAttribute : Attribute, IPropertyValueResolver
{
    public TypedValue ResolveValue(PageData pageData, string propertyName)
    {
        return TypedValue.From(pageData.ParentLink);
    }
}
The IPropertyValueResolver interface is responsible for resolving the value of a property. In most cases resolution means getting the value from PageData, but in this case we’re returning a PageReference to the parent. Now, since what we return is a PageReference, how come Article.Category can be of type ArticleCategory? This is possible thanks to automatic conversions that the framework tries to apply by looking at the type of the value returned by a resolver and the type of the property (PageReference –> ArticleCategory in the example).
Built in conversions include among others:
  • PageReference –> PageData
  • PageReference –> Url
  • LinkItem –> PageReference
  • LinkItem –> PageData
  • int –> Enum
Creating custom property value resolvers and property value converters allows us, in most cases, to abstract away EPiServer internals and lets us focus on creating clean and easy-to-understand information models for the sites we develop.
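As an illustration of what a conversion can boil down to, here is a hypothetical sketch of the PageReference –> PageData step (not the actual converter interface from PageDataAdapters; it simply uses the standard EPiServer DataFactory):

public class PageReferenceToPageDataConverter
{
    public PageData Convert(PageReference reference)
    {
        return PageReference.IsNullOrEmpty(reference)
            ? null
            : DataFactory.Instance.GetPage(reference);
    }
}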

Future


The future of PageDataAdapters is not quite clear. We have noticed how PageTypeBuilder grew to be the de facto standard library used in EPiServer projects, and we know that parts of our framework duplicate the functionality of PageTypeBuilder. At the same time, we really like the abstraction on top of PageData provided by PageDataAdapters. Also, we have quite a few live projects using the library, so we cannot just kill it even if we wanted to :P

For now we’ve decided that we want to open source it with the rest of the reusable code we maintain internally. I’ll be very happy to hear any comments you may have about the above, and I do encourage you to check out Open Waves (even though we have barely started the migration to CodePlex).