Testing a BizTalk project is always tricky, mainly because of the confusion and overlapping terminology that, IMHO, surround it.
This confusion can be traced back to three main reasons:
- The most famous testing framework for BizTalk Server has been misleadingly named BizUnit.
If a developer approaches it expecting to find the BizTalk equivalent of what JUnit or NUnit are for Java or C#, he’ll quickly be disappointed, and the risk is that he will convince himself that “unfortunately, unit testing is something you can’t do on BizTalk Server”.
The crucial point here is that BizTalk does support unit testing (at least from BizTalk 2009 onwards), but BizUnit is not a unit testing framework: it is a functional (or integration) testing framework (more on this in the next point…).
- BizTalk is an integration platform… From a testing perspective, this means that “functional tests” and “integration tests” often overlap, because “the function of BizTalk is to integrate”.
Keeping these two levels as decoupled as possible simplifies test management.
- Test data quality is essential: BizTalk inputs and outputs are complex, structured messages, and these messages are generated by systems outside our control.
Therefore, you should push the external systems’ teams to share unambiguous contracts for the messages that will be exchanged between platforms (be it WSDL, XSD, a set of concrete message instances and so on…).
In other words, to use a lame example, knowing that a field expected by an external system contains a boolean is not enough: we could test it using the strings “true” or “false”, while the target system may expect the integers “0” or “1”. In this situation our tests won’t highlight the error, so we could have a report with all green lights while integration with the actual system is going to fail miserably.
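Here is a minimal sketch, in Python, of the kind of contract check that would catch the boolean mismatch. The element name `IsActive` and the “0”/“1” rule are hypothetical. They stand in for whatever the external team’s contract actually states. XSD validation alone would not catch this case, because the `xs:boolean` type accepts “true”, “false”, “1” and “0” alike. The contract has to spell out the representation, and the tests have to check it.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract rule agreed with the external team: the element is a
# boolean, but the target system only accepts it serialized as "0" or "1".
ALLOWED_VALUES = {"IsActive": {"0", "1"}}

def contract_violations(xml_text, allowed=ALLOWED_VALUES):
    """Return 'element=value' strings for every element whose text
    is not one of the values the contract allows."""
    root = ET.fromstring(xml_text)
    problems = []
    for element, values in allowed.items():
        for node in root.iter(element):
            if node.text not in values:
                problems.append(f"{element}={node.text!r}")
    return problems

# A message that is a perfectly valid xs:boolean but breaks the contract.
print(contract_violations("<Customer><IsActive>true</IsActive></Customer>"))
# ["IsActive='true'"]
```

A test suite that only asserts “the output is schema-valid” stays green on this message. A check driven by the agreed contract flags it.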
Let’s look at the test classification I usually adopt for BizTalk projects:
UNIT TEST
These tests are usually written in C# (using MSTest or NUnit), and their goal is to test single solution artifacts in isolation.
In another post we’ll analyze unit testing for each BTS artifact; for now it is enough to underline that BizTalk 2009 enables unit testing for maps, schemas and pipelines (even if, personally, I still prefer the Winterdom pipeline tester for testing pipelines).
Orchestrations can be tested using the BizMonade project (a really interesting approach to the problem, although I’m not sure it faithfully reflects real BizTalk behavior).
Using these frameworks, we can do true unit testing for BizTalk, and therefore get the same advantages (speed, simplicity, integration with IDEs and continuous integration servers) that made unit testing a requirement for every modern software project.
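BizTalk 2009’s own unit testing support is used from C# test projects. The Python sketch below only illustrates the shape of such a test. `map_order_to_invoice` is a hypothetical plain function standing in for a map (an Order to Invoice transformation), not a BizTalk API. The test feeds it a known input instance and asserts on the output. No server, deployment or external system is involved, which is what keeps unit tests fast.

```python
import unittest
import xml.etree.ElementTree as ET

def map_order_to_invoice(order_xml: str) -> str:
    """Hypothetical stand-in for a BizTalk map: Order -> Invoice."""
    order = ET.fromstring(order_xml)
    invoice = ET.Element("Invoice")
    ET.SubElement(invoice, "Number").text = order.findtext("Id")
    # The target system (hypothetically) wants booleans as "1"/"0".
    ET.SubElement(invoice, "Paid").text = "1" if order.findtext("Paid") == "true" else "0"
    return ET.tostring(invoice, encoding="unicode")

class MapOrderToInvoiceTests(unittest.TestCase):
    def test_copies_order_id_into_invoice_number(self):
        out = ET.fromstring(map_order_to_invoice("<Order><Id>42</Id><Paid>true</Paid></Order>"))
        self.assertEqual(out.findtext("Number"), "42")

    def test_converts_boolean_to_integer_flag(self):
        for source, expected in (("true", "1"), ("false", "0")):
            order = f"<Order><Id>1</Id><Paid>{source}</Paid></Order>"
            out = ET.fromstring(map_order_to_invoice(order))
            self.assertEqual(out.findtext("Paid"), expected)
```

Run it with `python -m unittest <module>`. Each test exercises one artifact with one input instance, so a failure points straight at the transformation rule that broke.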
FUNCTIONAL TEST
These tests aim to show that the whole BizTalk infrastructure under development works as expected.
All components are deployed to a BizTalk test environment and, using prepared input and output messages, the solution is tested with a black-box approach: input messages are fed into BizTalk Server, and the output messages (along with other infrastructure components such as the Event Viewer, application databases, …) are examined to validate BizTalk’s behavior.
These tests are more complex to implement than unit tests and require more setup time: you have to mock external systems, deploy the infrastructure to a real BizTalk environment, configure the probes used to check BizTalk’s behavior, and drive the whole test by managing the status of ports, orchestrations and receive locations.
As underlined before, this is the kind of test where BizUnit excels.
Functional tests can be automated (there’s no need for external systems) but, obviously, since they require deployment to a BizTalk infrastructure, we first need a way to deploy that infrastructure automatically.
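The harness below is a Python sketch of the black-box approach, not BizUnit, just the same idea in a few lines. It assumes the flow under test uses a FILE receive location and a FILE send port. The folder paths are hypothetical: they would be whatever your deployed bindings point to. The harness drops an input message into the receive folder and waits for a new file to appear in the send folder. The caller can then compare that file with the expected output message.

```python
import shutil
import time
from pathlib import Path

def run_flow(input_file: Path, receive_dir: Path, send_dir: Path,
             timeout_s: float = 60.0, poll_s: float = 0.5) -> str:
    """Drop input_file into the receive folder and return the content of the
    first new file that appears in the send folder; raise TimeoutError if none."""
    already_there = set(send_dir.iterdir())
    shutil.copy(input_file, receive_dir / input_file.name)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        new_files = sorted(set(send_dir.iterdir()) - already_there)
        if new_files:
            # Note: this does not wait for the writer to finish; a real harness
            # should make sure the output file is complete before reading it.
            return new_files[0].read_text(encoding="utf-8")
        time.sleep(poll_s)
    raise TimeoutError(f"no output in {send_dir} after {timeout_s} s")
```

A test then asserts `run_flow(input, receive, send) == expected.read_text(encoding="utf-8")`. Comparing XML as raw text is brittle (whitespace, attribute order), so normalizing both sides before comparing is usually safer.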
INTEGRATION TEST
From BizTalk’s point of view, these tests are no different from functional tests: you have to deploy the infrastructure to a BizTalk environment, feed BizTalk with predefined input messages, and check BizTalk’s outputs against predefined output messages, thereby checking preconditions and postconditions for each flow under test.
What changes is outside BizTalk: in functional tests external systems are mocked, while in integration tests BizTalk interacts with the actual external systems (albeit in a test environment).
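In a functional test, the mocked external system can be as simple as a local HTTP stub. The stub records what BizTalk sends and returns a canned reply. In an integration test, the same send port would point at the real system instead. This Python sketch assumes a plain HTTP send port. The `<Ack>` reply body is a hypothetical placeholder for whatever the real system answers.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubExternalSystem(BaseHTTPRequestHandler):
    """Fake external system: records each POSTed message and replies with a canned body."""
    received = []                      # bodies of all messages BizTalk sent us
    canned_reply = b"<Ack>OK</Ack>"    # hypothetical reply expected by the flow

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        type(self).received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(self.canned_reply)))
        self.end_headers()
        self.wfile.write(self.canned_reply)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr

def start_stub(port: int = 0) -> HTTPServer:
    """Start the stub in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StubExternalSystem)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

During the functional run, the send port’s address points at the stub. Afterwards, the test asserts on `StubExternalSystem.received` and calls `server.shutdown()`. Swapping the stub for the real endpoint is exactly what turns the same scenario into an integration test.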
Typically these tests are scheduled in the project plan, and their execution follows or overlaps the end of development. I also consider UAT an integration test because, from the BizTalk point of view, in both cases we have to interact with an external system, be it another application (as in a canonical integration test) or a human operator (as in UAT).
Obviously these tests cannot be automated, because they involve other systems, and therefore every test execution must be coordinated with them.
REGRESSION TEST
These tests are simply unit and/or functional tests, but they are given a different name to underline when they’re executed in the project life cycle.
Sometimes a bug fix (or a change request) comes up after the system has gone into production.
When this happens, the test suite obviously did not contain a test for that specific behavior (because the bug was not caught earlier, in the case of a bug fix, or because the new behavior was not under test, in the case of a change request).
So new tests must be added to the suite to cover the new requirements or constraints, but the modified infrastructure doesn’t just have to pass the new tests: it must be tested again against the whole test suite.
This process is called regression testing.
PERFORMANCE TEST
Technically, performance tests are simply a subset of functional tests: what changes is the volume of data handled.
Generally there are two kinds of performance test one should always consider in BizTalk solutions:
- Testing the BizTalk solution against a few huge messages.
- Testing the BizTalk solution against a high frequency of small messages.
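Both profiles need input messages of a controlled size. Here is a minimal Python sketch for producing them. The Order layout, sizes and counts are all hypothetical. The actual submission rate against BizTalk would still be driven by a load tool such as LoadGen, whose configuration is not shown here.

```python
from pathlib import Path

# Hypothetical message layout: an Order padded with repeated line items.
HEADER, FOOTER = "<Order>", "</Order>"
LINE_ITEM = "<Line><Sku>ABC-123</Sku><Qty>1</Qty></Line>"

# The two load profiles: (number of messages, target size of each in bytes).
# Counts and sizes are illustrative, not recommendations.
PROFILES = {
    "few_huge": (3, 500 * 1024 * 1024),
    "many_small": (50_000, 2 * 1024),
}

def write_message(path: Path, target_bytes: int) -> int:
    """Stream an Order of at least target_bytes to disk; return its actual size."""
    body_bytes = target_bytes - len(HEADER) - len(FOOTER)
    items = max(0, -(-body_bytes // len(LINE_ITEM)))  # ceiling division
    with open(path, "w", encoding="ascii") as f:
        f.write(HEADER)
        for _ in range(items):  # item by item, so the whole message is never built in memory
            f.write(LINE_ITEM)
        f.write(FOOTER)
    return len(HEADER) + items * len(LINE_ITEM) + len(FOOTER)

def generate(profile: str, out_dir: Path) -> None:
    """Write every message of the chosen profile into out_dir."""
    count, size = PROFILES[profile]
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        write_message(out_dir / f"{i:06d}.xml", size)
```

Keeping the two profiles as data makes it easy to rerun exactly the same load when comparing two versions of the solution.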
Performance tests should be run in a highly instrumented environment, which means:
- Closing all unnecessary processes on the target machine (or, if that’s not possible, noting the processes active at test time).
- Enabling all the main performance counters for BizTalk (Throttling Status, DB Queue Length, …) and/or for the test machine (CPU, memory usage, …).
The main tool used during performance tests is LoadGen 2007, which lets you simulate the required load against the target BizTalk environment.
Let me underline a couple of things about performance tests:
DON’T USE performance test results to make quantitative predictions about BizTalk performance in production environments.
The only situation in which I would trust such a prediction is when the test environment is EQUAL to the production environment (and I’m not even sure of that…).
Today’s computers are extremely complex beasts: many hardware manufacturers, many drivers, several OS indirection layers, multi-core CPUs, virtual environments…
Assuming that “since the test environment took x to process a message, production will process the same message in x/2, because production has roughly twice the RAM and CPUs” is foolish.
You could say “it will perform better in production”, but again, who knows whether in production a network card will conflict with some obscure driver and make the whole environment underperform?
Of course we can’t simply dodge the question “how well will the system perform in production?”, but we must be clear that there are tons of variables beyond our control, so it’s better not to overreach with predictions.
Instead, USE performance tests to make qualitative predictions, where by qualitative I mean something like: “when you fed that 1 GB XML message into BizTalk, did it blow up? Or did it process it correctly after a while?”.
If our infrastructure was able to process and sustain the performance test loads without crashing or malfunctioning, we can reasonably assume that the production environment will be able to process these loads too.
Also USE performance tests to draw differential conclusions, where by differential I mean comparing every optimization before committing it to the main project branch:
Let’s say you have an idea to speed up BizTalk processing by changing one component.
With a performance test infrastructure in place, you can test the infrastructure’s behavior with and without the optimized component, comparing EVERY performance counter and not just the one you’re trying to optimize.
This is because every optimization is, in practice, a trade-off between one resource and another.
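The comparison itself can be sketched in a few lines of Python. The sketch assumes the counters from each run have already been exported and averaged into a name-to-value dictionary. The counter names and numbers in the test below are hypothetical. The function reports every counter that moved by more than a tolerance, so a gain on one counter cannot hide a loss on another.

```python
def compare_runs(baseline: dict, candidate: dict, tolerance: float = 0.05) -> dict:
    """Return {counter: relative change} for every counter whose value moved by
    more than `tolerance` (0.05 = 5%) between the two runs. A counter missing
    from either run is reported with the value None."""
    report = {}
    for name in sorted(baseline.keys() | candidate.keys()):
        before, after = baseline.get(name), candidate.get(name)
        if before is None or after is None:
            report[name] = None  # not measured in both runs: flag it
            continue
        if before == 0:
            # No relative change is defined from zero; treat any move as unbounded.
            change = 0.0 if after == 0 else float("inf")
        else:
            change = (after - before) / before
        if abs(change) > tolerance:
            report[name] = change
    return report
```

Suppose processing time falls from 200 ms to 20 ms, memory rises from 400 MB to 520 MB, and CPU goes from 50% to 51%. The report then shows a 90% drop in processing time next to a 30% rise in memory, while the 2% CPU change is filtered out as noise.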
Forgive me for the following silly example, but I think it makes the point:
Let’s say you keep configuration data in an external database. One obvious optimization is to put a caching layer between your application and the database: you no longer have to contact the database, establish a connection and execute a query every time you need configuration data; with caching, you just retrieve the cached data from local memory.
Well, even in this example we gained something (a faster application, less network traffic, etc.), but we had to pay for it (memory usage increased, because cached data must be kept in local RAM).
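The trade-off can be made concrete with a tiny read-through cache, sketched here in Python. `load_from_db` is a hypothetical stand-in for the real database query. The two counters the class keeps show both sides of the deal: database round trips saved, paid for with entries held in local memory.

```python
class CachedConfig:
    """Read-through cache in front of a configuration store (hypothetical)."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable: key -> value
        self._cache = {}
        self.db_calls = 0  # what we save: round trips to the database

    @property
    def cached_entries(self) -> int:
        return len(self._cache)  # what we pay: entries kept in local memory

    def get(self, key):
        if key not in self._cache:
            self.db_calls += 1
            self._cache[key] = self._load_from_db(key)
        return self._cache[key]
```

After 100 reads of the same key, `db_calls` stays at 1. Meanwhile, `cached_entries` grows with every distinct key requested, which is the memory side of the exchange that a full set of counters would expose.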
Of course, in the cache example the exchange was profitable: we paid 1 and got 100. But we have to keep the trade-off in mind when optimizing, and having the full set of performance counters that a performance test requires, ready to be evaluated, helps us identify whether an optimization’s trade-off is worthwhile.