BizTalk Fringe: Overriding BizTalk Dehydration mechanism

One of the most annoying things I find in orchestration design is the amount of planning one has to put into exception management.

Consider a BPM process spanning several weeks (implemented as a long-running orchestration): we may proceed for days (meanwhile completing several conversations with several systems) before noticing that some data (maybe even in the message that started the orchestration) is wrong and causes an exception.

What should we do if we have already contacted five external systems when the sixth one rejects the data?

If we're lucky, we can compensate (roll back) the actions already performed on the five systems and terminate the orchestration without leaving traces (possibly repeating the request later with correct data), but this is not always possible:

- Some systems may not offer a way to roll back (compensate) an action (they may rely on manual fix procedures for such rare events).
- Rolling back only some of our actions means we then have to redo only part of the process:
  - our orchestration should be able to start from any point of the process;
  - our orchestration should be able to skip particular process steps when necessary;
  - the activation (original) message should contain an itinerary stating which steps to execute and which to skip.

EAI patterns usually approach these problems by implementing the steps as atomic actions coordinated by a master Process Manager (Hohpe and Woolf, 2004).
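To make the itinerary idea concrete, here is a minimal sketch of such a Process Manager in C#. ProcessStep and ProcessManager are hypothetical names (not BizTalk or EIP APIs): each step is assumed to be an atomic action, and the itinerary is assumed to be simply the set of step names to run.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical step type: a name plus the atomic action it performs.
public sealed class ProcessStep
{
    public ProcessStep(string name, Action execute)
    {
        Name = name;
        Execute = execute;
    }

    public string Name { get; }
    public Action Execute { get; }
}

// Hypothetical process manager: walks the steps in order and runs only
// those named in the itinerary carried by the activation message.
public static class ProcessManager
{
    public static List<string> Run(IEnumerable<ProcessStep> steps, ICollection<string> itinerary)
    {
        var executed = new List<string>();
        foreach (ProcessStep step in steps)
        {
            if (!itinerary.Contains(step.Name))
            {
                continue; // not in the itinerary: skip this step
            }
            step.Execute();
            executed.Add(step.Name);
        }
        return executed;
    }
}
```

Restarting from the middle of the process then amounts to activating it again with an itinerary that omits the steps already completed.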

Even the main BizTalk BPM scenario separates the process into stages to guarantee the ability to modify an order while it is being processed.

In my experience this approach is often overkill when a simple on-the-fly fix to the data would be enough to resume the process and complete it successfully.

The most frustrating thing is that, in such a situation, the BizTalk engine suspends the orchestration as Suspended (resumable), allowing the administrator to resume it with its complete state and retry the faulty operation; but since we are unable to act on the orchestration state, the operation is doomed to fail again and again and again.

POCO Domain Model

Having recently come into contact with Domain-Driven Design (DDD), I started implementing my orchestrations more as a set of POCO entities representing my model and less as a set of multi-part messages, as I did before (more on this in a future post).

Having the orchestration state represented by CLR objects rather than by BizTalk messages (which are immutable) let me think of "updating the state" as a way to correct anomalies and, why not, to inspect the orchestration status without having to run the somewhat too technical Orchestration Debugger.

I was not interested in accessing the data during normal orchestration execution but only when:

- the orchestration has been dehydrated for a long time (what on earth is it doing, and what is its actual state?);
- the orchestration was suspended because of an error (why did it fail, and is it possible to fix its state?).

The most important thing to notice is that in both cases the orchestration is dehydrated, which means that its whole state (including my POCO domain model) is persisted in the BizTalk database. Unfortunately, the BizTalk database is not accessible to us.

Serialization

Remember that every class used in a BizTalk orchestration must be marked with the Serializable attribute (unless it is used only inside atomic scopes)?

Well, the BizTalk FAQ explains why this is necessary (as one can imagine, but it's reassuring to read an official statement :) ):

The XLANGs runtime may persist to the database (dehydrate) your orchestration, including all of its data, at any point (except in the atomic scope). When the orchestration dehydrates and rehydrates, user-defined variables are binary serialized and deserialized.

So BizTalk simply invokes the BinaryFormatter on our objects, relying on standard .NET serialization.

But default serialization can be overridden, and this is what will let us externalize our domain model state.

Here I present two ways to implement custom serialization; each has its pros and cons, as we will see shortly.

Preparing the Sample

For simplicity, let's imagine that the whole state of the orchestration is contained in the simple State object below.

using System;

[Serializable]
public class State
{
    internal int _integer;
    public int Integer
    {
        get
        {
            return _integer;
        }
    }

    internal String _text;
    public String Text
    {
        get { return _text; }
        set { _text = value; }
    }

    internal Guid _id;
    public Guid Id
    {
        get
        {
            return _id;
        }
    }
}

It has three properties: an integer, a string and a Guid (the Guid is set to the OrchestrationId when the state is initialized).
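Because Integer and Id expose only getters, something with access to the internal fields has to populate them. A minimal sketch, assuming a hypothetical StateFactory compiled into the same assembly as State (the class is repeated so the listing compiles on its own) and an orchestration Guid obtained elsewhere:

```csharp
using System;

// Same shape as the State class above: read-only Integer and Id, settable Text.
[Serializable]
public class State
{
    internal int _integer;
    public int Integer { get { return _integer; } }

    internal string _text;
    public string Text { get { return _text; } set { _text = value; } }

    internal Guid _id;
    public Guid Id { get { return _id; } }
}

// Hypothetical factory: must live in the same assembly because the fields are internal.
// Called once, with the activation message data and the OrchestrationId.
public static class StateFactory
{
    public static State Create(int integer, string text, Guid orchestrationId)
    {
        var state = new State();
        state._integer = integer;
        state._text = text;
        state._id = orchestrationId;
        return state;
    }
}
```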

Presenting the Sample Orchestration

The sample orchestration is very simple: it takes a message from a receive port (the message just contains an integer and a text value), initializes the State object with the message data and the OrchestrationId, and enters the RepeatableScope, which is repeated on error.

Inside the scope I placed a Decide shape that makes a simple check:

If Text in the State (and therefore in the original message) is empty, an exception is raised (simulating an error); otherwise the flow exits the scope, clearing the repeat flag so the loop is not repeated.

If instead an exception is raised (because Text in the State is empty), the catch block first raises the repeat flag (because the loop must be repeated) and then suspends the orchestration, which therefore persists the State.

When (and if) the orchestration exits the repeatable scope, a new message is created, populated with the State data and sent through a send port.
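The control flow of the RepeatableScope can be modelled in plain C#. This is only a simulation of the shapes described above, not BizTalk code: RepeatableScopeSketch is a hypothetical name, State is reduced to public fields, and the suspendAndResume delegate stands in for the Suspend shape plus whatever happens before an administrator resumes the instance.

```csharp
using System;

// Simplified stand-in for the article's State class.
public class State
{
    public int Integer;
    public string Text;
    public Guid Id;
}

// Hypothetical simulation of the RepeatableScope loop.
public static class RepeatableScopeSketch
{
    // Returns how many times the scope body was executed.
    public static int Run(State state, Action<State> suspendAndResume)
    {
        bool repeat = true;
        int attempts = 0;
        while (repeat)
        {
            attempts++;
            try
            {
                // Decide shape: an empty Text simulates a business error.
                if (string.IsNullOrEmpty(state.Text))
                {
                    throw new InvalidOperationException("Text is empty");
                }
                repeat = false; // clear the repeat flag: leave the loop
            }
            catch (InvalidOperationException)
            {
                repeat = true;           // raise the repeat flag
                suspendAndResume(state); // Suspend shape: the state is persisted here
            }
        }
        return attempts;
    }
}
```

If the delegate never changes the state, the loop never terminates: this is exactly the "fail again and again" behavior described earlier, and the reason we need a way to reach the persisted state.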

Introducing our Repository

Following DDD, we won't access the State object directly; we will use a Repository object instead.

The Repository is responsible for serializing and deserializing our State object (which represents our domain model), so it only needs a couple of members to set and retrieve the State:

interface IRepository
{
    void SetState(State state);

    State CurrentState
    { get; }
}
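For comparison, the simplest implementation just relies on default [Serializable] field serialization, so the State only ever lives inside the BizTalk dehydration stream. A minimal sketch; DefaultRepository is a hypothetical name and State is reduced to public fields:

```csharp
using System;

// Simplified stand-in for the article's State class.
[Serializable]
public class State
{
    public int Integer;
    public string Text;
    public Guid Id;
}

public interface IRepository
{
    void SetState(State state);
    State CurrentState { get; }
}

// Hypothetical baseline: no custom serialization, so _state is written
// together with the rest of the orchestration when it dehydrates.
[Serializable]
public class DefaultRepository : IRepository
{
    private State _state;

    public void SetState(State state)
    {
        if (state == null) throw new ArgumentNullException(nameof(state));
        _state = state;
    }

    public State CurrentState
    {
        get { return _state; }
    }
}
```

Both implementations below keep this interface and change only where the State ends up when the repository is serialized.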

Using ISerializable

The first IRepository implementation is based on ISerializable.

We have to implement GetObjectData, which the infrastructure calls when the Repository needs to be serialized (and which therefore has to serialize the contained State), and a special constructor, which the runtime invokes when it deserializes the Repository:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class ISerializableRepository : IRepository, ISerializable
{
    private State _state;

    public ISerializableRepository() { }

    public void SetState(State state) { _state = state; }

    public State CurrentState { get { return _state; } }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // Here I'm going to serialize to the file system, so I'll use the Guid as the file name; I could also serialize to a DB, using a connection string and the Guid as the lookup value.
        // The folder name is hardwired just for the sake of the sample: change it to an existing path on your file system or, even better, externalize it.
        String filename = String.Format(@"C:\Temp\SerializationStore\{0}.txt", _state.Id);
        // Serialize to the real SerializationInfo (the BizTalk DB: remember that BizTalk invoked the serialization).
        info.AddValue("SerializedStateFileName", filename);
        // Now that the file location has been saved in BizTalk, save the real data.
        BinaryFormatter bf = new BinaryFormatter();
        using (FileStream fout = new FileStream(filename, FileMode.Create, FileAccess.Write))
        {
            bf.Serialize(fout, _state);
        }
    }

    protected ISerializableRepository(SerializationInfo info, StreamingContext context)
    {
        // First of all, read the location of the file containing the persisted state.
        String filename = info.GetString("SerializedStateFileName");
        // Then open the existing file and deserialize the _state object.
        BinaryFormatter bf = new BinaryFormatter();
        using (FileStream fin = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            _state = (State)bf.Deserialize(fin);
        }
    }
}

In this code, serialization first stores in the BizTalk stream only the name of a file named after the OrchestrationId, and then uses a normal BinaryFormatter to write the State into that file. Deserialization works in reverse: it first reads the file name from the serialization stream, and then uses the BinaryFormatter to deserialize the State object from that file.
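The "pointer in BizTalk, data in a file" idea does not depend on BinaryFormatter, which is marked obsolete from .NET 5 onwards. As a sketch under that assumption, here is the external half written with BinaryWriter and BinaryReader; FileStateStore is a hypothetical helper, State is simplified to public fields, and the path returned by Save is the only value GetObjectData would add to SerializationInfo:

```csharp
using System;
using System.IO;

// Simplified stand-in for the article's State class.
public class State
{
    public int Integer;
    public string Text;
    public Guid Id;
}

// Hypothetical external store: one file per orchestration, named after its Id.
public static class FileStateStore
{
    public static string Save(State state, string folder)
    {
        string path = Path.Combine(folder, state.Id.ToString() + ".bin");
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(state.Integer);
            writer.Write(state.Text ?? string.Empty); // null is stored as ""
            writer.Write(state.Id.ToByteArray());     // always 16 bytes
        }
        return path; // this is what would go into SerializationInfo
    }

    public static State Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // Fields are read back in the same order they were written.
            return new State
            {
                Integer = reader.ReadInt32(),
                Text = reader.ReadString(),
                Id = new Guid(reader.ReadBytes(16))
            };
        }
    }
}
```

An explicit format like this lets a fixing tool read the file without referencing the exact State assembly, at the cost of keeping the field order in sync by hand.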

Let’s consider the following message:

<ns0:Root xmlns:ns0="http://TCPSoftware.CustomDehydration.Orchestrations.Schema">
  <Integer>10</Integer>
  <Text></Text>
</ns0:Root>

When such a message is published to the orchestration, the orchestration raises the exception and suspends and, as the following screenshot shows, a file named after the OrchestrationId appears in the file system.

If we now resume the suspended instance, as expected it executes the loop again and suspends once more with the same error.

However, using the BinaryFormatter we can recreate and fix the State object persisted in the file outside BizTalk Server, for example with the following simple console application, which simulates a state correction:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Must reference the assembly that defines State, so the BinaryFormatter resolves the same type.
class Program
{
    static void Main(string[] args)
    {
        // args[0] is the file to fix.
        if (args.Length < 1)
        {
            Console.WriteLine("Usage: Fixer <state file>");
            return;
        }
        Console.WriteLine("Opening DomainModel from file '{0}'", args[0]);
        BinaryFormatter bf = new BinaryFormatter();
        State state;
        using (FileStream fin = new FileStream(args[0], FileMode.Open, FileAccess.Read))
        {
            state = (State)bf.Deserialize(fin);
        }

        // State recreated, now fix it.
        if (String.IsNullOrEmpty(state.Text))
        {
            state.Text = "Fixed!";
        }

        // Save the fixed state over the original file.
        using (FileStream fout = new FileStream(args[0], FileMode.Create, FileAccess.Write))
        {
            bf.Serialize(fout, state);
        }
    }
}

After running the fixer we can resume the suspended instance, and this time the orchestration completes, publishing the following output message:

<ns0:Root xmlns:ns0="http://TCPSoftware.CustomDehydration.Orchestrations.Schema">
  <Integer>10</Integer>
  <Text>Fixed!</Text>
</ns0:Root>

Using OnSerializing / OnDeserialized Attributes

This approach differs from the previous one: instead of overriding the normal serialization mechanism, we work alongside it.

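using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Class name is illustrative.
[Serializable]
public class AttributeRepository : IRepository
{
    private State _state;              // always serialized inside BizTalk
    private bool _serializeExternally; // raised only while the orchestration is in error
    private String _filename;          // location of the external copy

    public void SetState(State state) { _state = state; }
    public State CurrentState { get { return _state; } }
    public void EnableExternalSerialization() { _serializeExternally = true; }
    public void DisableExternalSerialization() { _serializeExternally = false; }

    [OnSerializing()]
    public void OnSerializing(StreamingContext context)
    {
        // Check _serializeExternally and decide whether to also serialize externally.
        if (_serializeExternally)
        {
            // For simplicity, this sample uses a file to store the data.
            _filename = String.Format(@"C:\Temp\SerializationStore\{0}.txt", _state.Id);

            // _filename will now be serialized inside BizTalk by normal serialization. Time to serialize the state externally.
            BinaryFormatter bf = new BinaryFormatter();
            using (FileStream fout = new FileStream(_filename, FileMode.Create, FileAccess.Write))
            {
                bf.Serialize(fout, _state);
            }
        }
    }

    [OnDeserialized()]
    public void OnDeserialized(StreamingContext context)
    {
        // ONLY if _serializeExternally is true was external serialization done.
        if (_serializeExternally)
        {
            // Read from the external file, replacing the copy restored by BizTalk.
            BinaryFormatter bf = new BinaryFormatter();
            using (FileStream fin = new FileStream(_filename, FileMode.Open, FileAccess.Read))
            {
                _state = (State)bf.Deserialize(fin);
            }
        }
    }
}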

Before normal serialization runs, our method marked with the OnSerializing attribute is executed: it checks whether the _serializeExternally flag is raised and, if so, creates a file named after the OrchestrationId and serializes the State object into it with the BinaryFormatter.

When normal deserialization completes, our OnDeserialized method is invoked: it checks whether the _serializeExternally flag is raised (in other words, whether the data was also serialized externally) and, if so, opens the file and deserializes the State object from it, replacing the copy restored from BizTalk.

Unlike the previous method, this approach serializes externally ONLY when instructed to do so, so our orchestration needs a small change to use this kind of repository:

- The SetInErrorFlag shape, which previously contained only the code that sets the boolean retry flag, now also contains the following instruction, which raises the Repository's _serializeExternally flag: Repository.EnableExternalSerialization();
- Similarly, the ClearInErrorFlag shape now also contains: Repository.DisableExternalSerialization();
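The callback approach can be exercised end to end outside BizTalk. DataContractSerializer also honors [Serializable] types and the OnSerializing/OnDeserialized callbacks, so this sketch uses it both as a stand-in for BizTalk's dehydration stream and for the external file; SideCarRepository, StoreFolder (the temp folder) and the .xml file name are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Simplified stand-in for the article's State (public fields for brevity).
[Serializable]
public class State
{
    public int Integer;
    public string Text;
    public Guid Id;
}

// Hypothetical repository mirroring the OnSerializing/OnDeserialized approach.
[Serializable]
public class SideCarRepository
{
    // Assumed location of external copies; static, so it is never serialized.
    public static string StoreFolder = Path.GetTempPath();

    private State _state;              // always written to the host stream
    private bool _serializeExternally; // raised only while the instance is in error
    private string _filename;          // where the external copy lives

    public void SetState(State state) { _state = state; }
    public State CurrentState { get { return _state; } }
    public void EnableExternalSerialization() { _serializeExternally = true; }
    public void DisableExternalSerialization() { _serializeExternally = false; }

    [OnSerializing]
    public void OnSerializing(StreamingContext context)
    {
        if (!_serializeExternally) return;
        // Runs before the fields are written, so _filename ends up in the host stream.
        _filename = Path.Combine(StoreFolder, _state.Id + ".xml");
        using (FileStream fout = File.Create(_filename))
        {
            new DataContractSerializer(typeof(State)).WriteObject(fout, _state);
        }
    }

    [OnDeserialized]
    public void OnDeserialized(StreamingContext context)
    {
        if (!_serializeExternally) return;
        // Replace the copy restored from the host stream with the (possibly fixed) external one.
        using (FileStream fin = File.OpenRead(_filename))
        {
            _state = (State)new DataContractSerializer(typeof(State)).ReadObject(fin);
        }
    }
}
```

Serializing the repository with the flag raised writes the external file; editing that file and then deserializing the repository returns the edited State, which mirrors the fix-and-resume cycle shown earlier with the console fixer.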

The main differences between this approach and the previous one (ISerializable) are:

| ISerializable | OnXXX attributes |
|---|---|
| Data is always serialized externally. | Data is serialized externally only when explicitly requested. |
| Data is always serialized just once. | Data is always serialized inside BizTalk, so when it is also serialized externally it is serialized twice. |

There is no clear winner here. If you are wary of persisting your data outside BizTalk Server, you may prefer the attribute-based approach (paying the double serialization cost only when an exception is raised, which is more than acceptable). If you prefer not to expose the persistence mechanism in your domain code (such as the Enable/DisableExternalSerialization methods seen above) and you trust your code enough, you'll prefer the ISerializable approach.

Simplify Workflow with Atomic Shapes

If you look at the orchestration above, chances are you won't like it much, and I definitely agree.

The point is that too many infrastructure concepts leak into the business process design (the orchestration).

The Loop shape and the surrounding Catch and Suspend shapes are infrastructure noise, placed there just to enable the suspend-and-retry mechanism.

In fact, an atomic scope has almost the behavior we are looking for: an orchestration can't suspend in the middle of an atomic scope, so if an exception is raised inside one, the whole scope is retried "automatically" the next time the suspended instance is resumed.

Unfortunately, I said "almost" because there's a catch: the first time I tried an orchestration with an atomic scope (instead of the suspend-and-retry scope above) I was puzzled because, even though the error was raised, there was no file at all in the SerializationStore folder.

Thinking about it, the reason is obvious and was described in the previous post: there is no persistence point on entering an atomic scope, only on exiting it, so the orchestration started, entered the scope without persisting, and raised the exception inside the atomic scope.

This means our Repository was never persisted between its creation and the exception, so the instance was doomed to repeat the atomic scope forever without ever having a chance to persist its data :S

Luckily, I found a way to persist the orchestration programmatically just before it enters the atomic scope, getting all the advantages of external serialization while keeping infrastructure noise out of the orchestration. A quick visual comparison between the orchestration below and the one above should convince anyone…

You can download all the code from this article here; I hope you find it interesting.


Thursday, January 13, 2011
