
Orchestration best practices
Orchestration is the technology provided by BizTalk to allow us to graphically model business processes and complex messaging scenarios. Orchestration is a powerful tool that allows us to build expressive solutions that can help us overcome coupling inherent in the systems we are connecting. It is also the tool we generally use for service composition.
Orchestration is a close relative of sequential workflow in Workflow Foundation (WF). In fact, the same team that built WF built BizTalk's orchestration engine; orchestration even predates WF. The two have many similarities and some distinct differences. The biggest difference is that in orchestration, as in all of BizTalk, messages are immutable, meaning that once assigned, their values cannot be changed. The two share the concept of dehydration, the saving of state so that the workflow can be removed from memory. This is vital to the scalability and reliability of both, but in orchestration we do not directly control when dehydration occurs. Orchestration also has a concept of persistence related to dehydration.
To create an orchestration, we sequentially model the steps of our business process in the orchestration designer within Visual Studio. The toolbox provided for creating orchestrations resembles a Visio flow chart palette, but it also includes more advanced constructs than most flow charts cover. These include delays, exception handling, compensation, branching, and role (party) links.
When we design orchestrations, they are actually stored as XML, much like WF, but are then compiled into a language of their own, X# (X Sharp), which looks very much like C#, but is a distinct language. The orchestration engine in BizTalk is called XLANGs; hence the X# language. In addition to a simple graphical flow, orchestration provides us with durability in our business processes thanks to the previously mentioned dehydration and persistence.
Developers and analysts tend to favor orchestration because it gives them a clear graphical representation of the process they are modeling. As a result, orchestration tends to be overused, or used suboptimally, in BizTalk solutions. The following recommendations relate to best practices in the use of orchestration within BizTalk solutions.
Avoid overuse of orchestration
Orchestrations are a great tool in the BizTalk toolkit, but they come at a considerable cost. Many of these costs manifest themselves as round trips to the message box, which means crossing a process boundary and writing to and reading from a database: the message box. This is an expensive operation and one that should only be done when it is necessary. Early in their experience with BizTalk, developers often latch onto orchestration because of its apparent simplicity and graphical design. Please try to resist this urge. You can use orchestration freely on your development machine just to get comfortable with it, but for your first project, make your life easier and avoid orchestration for simple operations.
When I train developers on BizTalk, we wait until the fourth or fifth lesson to cover orchestration at all, which this book does as well. I strongly believe it is important to cover the fundamentals of messaging, mapping, and testing before one begins to create orchestrations.
The following figure shows a situation where orchestration was not the right solution for the problem:

All this orchestration does is receive a message, map it, and forward it to a send port. This can all be accomplished using a messaging-only approach. The following steps explain how it would work:
- On the send port, where you want to send this message, go to the Filters page and add a filter similar to the following: BTS.ReceivePortName == <<Name of Receive Port>>.
- While still on the send port, change to the Outbound maps page and assign the map to use for transforming this message.
If your schemas and map already exist, these two steps can be completely accomplished within the BizTalk Administration console. The two solutions are functionally equivalent, so let's take a look at the runtime steps involved for each; in particular how they interact with the message box. Often in BizTalk, trips through the message box are called message box hops. These are expensive operations and should be minimized.
In the orchestration solution, the steps executed by the runtime are as follows:
- Receive message via adapter and write to message box.
- Retrieve work item from message box and start orchestration.
- Write new mapped message to message box.
- Retrieve work item from message box for send port.
Each of the previous four steps requires an interaction with the message box. If we now look at the messaging-only approach, the steps are as follows:
- Receive message via adapter and write to message box.
- Retrieve work item from message box for send port.
Clearly, the second approach has half the trips to the message box. Realistically, the performance of the second solution will be roughly twice that of the first. When you deal with high-throughput or low-latency solutions, these types of savings are critical. Even when you don't have strict performance requirements, remember that BizTalk is a shared platform: just because one solution does not have high performance requirements does not mean that another, running in the same group, will not. The aggregate demand of many poorly implemented low-volume applications can adversely impact a BizTalk environment.
Always use multipart messages in orchestrations
Although most messages used in BizTalk solutions are not multipart, in the sense that they only have one body, using multipart messages in an orchestration provides the benefit of isolating schema changes from impacting orchestrations. In an orchestration, we must define messages at the orchestration level and these messages can be either schemas, .NET classes, multipart messages, or web message types (which makes no sense to me, as this represents tight coupling considering that, by their nature, they are external). The BizTalk IDE is trying to be our friend here by giving us choices, but you should really only use multipart messages. This is because the use of multipart message types gives us another level of indirection and also gives us a reusable type that we can leverage in all of our orchestrations in the solution rather than just in one.
Assigning a message directly to a schema message type is almost like inlining a type. It effectively creates a statement in the orchestration declaring a variable and its type: OrderSchema MyMessage. This means that anywhere the message is used, the orchestration is essentially being coded against this inlined type.
This is significant because changing a schema without using multipart messages will require disconnecting all the port, send, and receive shapes within the Orchestration Designer. This is a tedious and error-prone process, and it is not always easy to track down all of the affected shapes. There is also no automated way to do it ahead of time. If you think you'll never change your message types, imagine what happens the first time you have to version your assemblies (for a side-by-side deployment). If you guessed that you need to disconnect all the port, send, and receive shapes in your orchestration, you guessed right!
The solution is to use a multipart message type in the orchestration and then create a body part for the new multipart message type that is bound to the schema we want. We then assign the message to use this new multipart message type. This effectively translates to a declaration similar to IMySchema MyMessage. This is not a totally accurate translation, because the orchestration really doesn't care what lies behind the interface, just that it is there. In a way it means we are free to change the implementation, just not the actual interface. This pattern probably has more in common with COM, or even the concept of a pointer to a pointer, or a reference to a pointer.
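To make the analogy concrete, here is a plain C# illustration of the same idea; the type names are made up, and orchestration messages are not actually declared in C#, so treat this purely as an analogy:

using System;

// Hypothetical types used only to illustrate the level of indirection;
// real orchestration messages are not declared this way.
public interface IOrderMessage { string Number { get; } }

public class OrderSchema_V1 : IOrderMessage { public string Number { get; set; } }
public class OrderSchema_V2 : IOrderMessage { public string Number { get; set; } }

public static class Demo
{
    public static void Main()
    {
        // Code written against the interface keeps compiling when the
        // concrete type behind it changes, which is the kind of benefit
        // the multipart message type provides inside an orchestration.
        IOrderMessage message = new OrderSchema_V2 { Number = "1234" };
        Console.WriteLine(message.Number);
    }
}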
Multipart message types can be reused between orchestrations, so that we do not have to create as many message types, but be aware that this does introduce a small degree of coupling. Sometimes it is a good idea to put all multipart messages or shared types into an orchestration that contains no logic at all and is only used for such definitions.
Finally, this technique allows you to stub out your ideas more quickly in the development phase because you can always just convert everything into an XmlDocument. This directly contradicts my other advice in a later section, but we all do it at some time. Just don't let it get into production that way!
Avoid large orchestrations
This is the same as the OOP coding suggestion to avoid excessively long methods. As you build orchestrations, it becomes very easy to keep adding functionality to them by dragging more shapes onto the canvas. This can make the orchestration difficult to follow and lock all of the functionality deep inside of it. The Call Orchestration shape is a simple way to compartmentalize your orchestrations. This shape works just like a method or function call: it is a synchronous direct invocation. You can pass parameters into it, receive them out, or both. Call Orchestration even supports reference parameters. Any orchestration that does not have an activating receive shape as its first shape is considered callable. One benefit of Call Orchestration is that, just like a function or method invocation, there is not a lot of overhead with the call. This is different from Start Orchestration, which asynchronously creates a new orchestration under a new context and does so via the message box. Start Orchestration is actually a specialized direct send.
Encapsulating reusable logic inside called orchestrations is a great way to leverage reuse and is something you should address the same way you do in traditional code. Generally, the second time that you need logic that already exists, simply refactor the original logic into a new orchestration and execute it using Call Orchestration. If you know at the outset that your orchestration will be big, and some are by nature, try to compose it from multiple orchestrations. This is clearly simple advice, but it took me a long time to take it to heart and realize the benefits of doing so.
Minimize trips to the message box (persistence points)
One of the great benefits of orchestration is that it provides us with robust durability automatically. This is very different from traditional programming paradigms. Orchestrations are process agile, meaning that they are able to move to a different process, or even a different server, as needed; an example would be an outage. This is possible because at any point where an orchestration makes contact with the outside world, it saves its state in the message box. This is called a persistence point and it happens automatically. During this operation, all messages and variables used within the orchestration, including any artifacts that contribute to its state, are saved into the message box. Send shapes are the most common persistence point. The engine does this because once the orchestration sends out a message, it does not know whether the response will be immediate or not. Remember, the message box acts as a buffer here, like always, decoupling the orchestration from the actual send and receive. This allows the orchestration to be unloaded from memory and to restart on any available host when it continues. Atomic scopes and send operations result in persistence points, but others happen automatically for other reasons. Most, however, are driven by a trip through the message box.
An easy way to determine whether your orchestrations are resulting in too many persistence points is to use Performance Monitor to examine the runtime characteristics of your solution. I always recommend doing this. For orchestration in particular, the Persistence points and Persistence points/sec counters in the XLANG/s Orchestrations performance object category are particularly useful.
A simple tip to reduce persistence points is to bunch multiple sends together in a single atomic scope. This will result in one persistence point instead of one for each send. This can be confusing at first, but it works fine. Orchestration is not like C# or Java: a send shape does not need to be immediately followed by its corresponding receive shape, because the response is held in the message box anyway. If you have to call three web services that don't require the response from one to call the next, you can place all three send shapes in a single atomic scope and they will all go to the message box together, in one persistence point. Even if the responses arrive out of order, the orchestration will only retrieve them in the order in which you have laid out the receive shapes.
It is a good practice to design with persistence point counts in mind. Because persistence has a significant cost and can happen unexpectedly, there really is no replacement for stress testing of a solution to find where these points will emerge.
Avoid using atomic scopes to call .NET methods
Orchestration allows us to create .NET class variables and use their methods to assist our business process. This can be a great feature, but any class that is used as a variable in an orchestration must be marked with the Serializable attribute. This allows the orchestration engine to store the class with the running orchestration in the message box (the durability we talked about before). This is fine if you're creating your own classes and libraries, but not if you're using existing ones. The workaround most developers use is the atomic scope, which allows us to instantiate any .NET class and call its methods. The atomic scope was designed to handle Atomicity, Consistency, Isolation, and Durability (ACID) compliant operations that must either all succeed or all fail as a group. This is a classic database transaction style. It is designed to carry an orchestration from one stable state to another. This is why you cannot both send and receive from an atomic scope: by design the message box is not a lockable resource. To accomplish this atomicity, the orchestration engine persists the entire orchestration state to the message box before the atomic scope begins; it subsequently persists the orchestration again when the atomic scope completes.
Do not use an atomic scope simply to call a method of a .NET class that is not Serializable. If you absolutely must call a non-Serializable class, and can only do it in an atomic scope, try to combine this with other operations to make the most of the trip to the message box, such as a send shape, as shown in the following figure:

This call at least reuses one of the two persistence points to perform a send that was needed anyway. A better way to solve this problem is to create a wrapper class with a static method that instantiates the required objects, uses them, and returns the desired result. Keep in mind that the orchestration engine is smart enough to inspect any classes that you create to make sure they do not have non-Serializable members.
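A minimal sketch of such a wrapper is shown below; the class, method, and URL are hypothetical. Because the method is static, the orchestration can call it directly from an Expression shape without ever declaring a variable of a non-serializable type:

using System;
using System.Net;

public static class LookupServiceWrapper
{
    // WebClient is not serializable, but it exists only for the duration
    // of this call and never becomes part of the orchestration's state.
    public static string GetOrderStatus(string orderNumber)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(
                "http://example.org/orders/" + orderNumber + "/status");
        }
    }
}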
Don't use XmlDocument for a message type… ever
As we saw earlier, the orchestration engine will allow us to represent any message as an XmlDocument. Just because this feature is available does not mean that we should use it. XmlDocument within XLANGs (the orchestration engine) is actually a special class that wraps the XmlDocument, which oddly enough is not serializable. This class is dangerous for several reasons. First, it allows us to create messages and variables that will fail if any of their methods or properties are used; this is the case when non-XML messages are represented by this message type. These errors will also only happen at runtime. This is a bigger problem because as new developers have to work with a solution, it will not be clear which messages they can modify and how. If you have edge cases using non-XML content in an XmlDocument and your solution testing doesn't provide 100 percent coverage, you could end up with a fatal runtime error that you only find in production.
Worse still, for messages that actually are XML, XmlDocument loads the contents into the DOM (Document Object Model) for processing. The first and most obviously dangerous side effect of this is that, because the DOM allows random access to the document, it loads the entire document into memory. To make matters worse, in order to make this access fast, the memory requirement is often an order of magnitude greater than for the pure XML data. That means a 100K message will likely occupy 1MB of memory. If you have several of them and have moderate throughput, you will face memory pressure and the throughput of your solution will rapidly decline.
Another and more subtle issue often caused by using an XmlDocument is that the class allows for modification of the document through its members. Recall from the earlier pages that messages in BizTalk are immutable; that is, they cannot be changed. This is at the center of the durability and distributed architecture of BizTalk. An XmlDocument will allow you to make local changes to a message that are not reflected in the message box. Under certain circumstances, such as when persistence points occur and orchestrations rehydrate on another server, your local changes may not appear anymore. This is precisely because messages truly are immutable. Troubleshooting issues such as this one is time consuming and difficult because they can be hard to reproduce.
If you need to pass non-XML messages through an orchestration, you should really use XLANGMessage to do this. This class makes the intention clear that the message is not XML and should not be treated as such. It also has a smaller memory footprint. Like XmlDocument, XLANGMessage can be assigned to any message, with the validation happening at runtime. This class can be found in the Microsoft.XLANGs.BaseTypes namespace of the Microsoft.XLANGs.BaseTypes.dll assembly.
Finally, if you must access the content of an XML message within an orchestration (normally done through a helper class), you should retrieve the body part as XmlReader rather than as XmlDocument. The XmlReader class is stream-based, like the rest of the BizTalk infrastructure; this preserves the flat memory footprint that is sought after in BizTalk.
Alternatively, you can also create .NET classes that match your schema using xsd.exe, which will generate classes conforming to the message schema. This technique will allow you to work with messages in .NET, which can be more useful in helper methods. Both of these techniques are accomplished using the RetrieveAs method of the XLANGMessage class, as shown in the following two methods:
public MyMessage SomethingInteresting(XLANGMessage message)
{
    MyMessage myMessage = message[0].RetrieveAs(typeof(MyMessage)) as MyMessage;
    return myMessage;
}

public XmlReader SomethingElse(XLANGMessage message)
{
    XmlReader reader = message[0].RetrieveAs(typeof(XmlReader)) as XmlReader;
    return reader;
}
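For reference, a class such as MyMessage can be generated from an existing schema with xsd.exe; the schema file name and namespace below are placeholders:

xsd.exe OrderSchema.xsd /classes /namespace:MyCompany.Messages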
Again, keep in mind that less is more. If you need to do extremely complex operations in many helper methods, you might want to reconsider your solution. All .NET data structures are memory resident, so using the XmlSerializer (which is what the previous example ultimately does) loads the entire message into memory. That said, it takes considerably less memory space than an XmlDocument.
Avoid loading messages into classes via the XmlSerializer
The XmlSerializer allows us to load messages into classes automatically. This class dates back to the very first versions of the .NET Framework and is a great technique that most developers are unaware of. Working in C# or Visual Basic .NET, it is much easier to work with classes and data structures than with the XML classes of the System.Xml namespace. This is the technique presented previously. This is a convenient feature and there certainly are some situations where it can be much easier to accomplish a given development task with .NET classes rather than with XML messages or maps. It is important to note, however, that the entire architecture of BizTalk Server is built around the concept of stream-based components. This means that message sizes, even for very large messages, will not adversely affect the operation of the platform. This gives BizTalk a very flat memory footprint and helps to make the product as scalable as it is. The XmlSerializer does not follow this same stream-based approach. On the contrary, classes serialized with the XmlSerializer are completely loaded into memory at one time and can have negative effects on the memory footprint of your solutions as a result. Generally speaking, if the messages are small, or the nodes that you send into a method for processing are small, then the impact may not be too severe. Keep in mind that even small messages, when there are a lot of them, can add up to a large memory footprint.
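To be clear about what this looks like in code, the following is a minimal sketch of the kind of helper described above; the MyMessage type is hypothetical, and note that Deserialize materializes the entire message as an object graph in memory:

using System.Xml;
using System.Xml.Serialization;

// Hypothetical message class, of the sort xsd.exe generates from a schema.
public class MyMessage
{
    public string Number { get; set; }
}

public static class MessageLoader
{
    // Deserializes the whole message into a MyMessage instance; convenient,
    // but the complete object graph is held in memory at once.
    public static MyMessage Load(XmlReader reader)
    {
        var serializer = new XmlSerializer(typeof(MyMessage));
        return (MyMessage)serializer.Deserialize(reader);
    }
}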
Use direct bound ports and Content Based Routing
Many developers new to BizTalk don't grasp the power and flexibility of Content Based Routing (CBR) in BizTalk Server. CBR allows for the implementation of logical decisions that new developers often turn to orchestration to solve. In the following figure, we see an example of this. A decide shape is used to route a message based upon content within the message. This could be a total order amount, the status of the client, or anything else in the message. In this case, it is based on the vendor.

The previous orchestration clearly shows what routing decision is being made and is easy to understand, but it is not an optimal solution for many scenarios. For one, the solution is effectively hardwired: changing any of the routing or adding new vendors will require recompiling and redeploying the solution. A good alternative is to use CBR, which in an orchestration takes the form of a direct bound port. To accomplish this, follow these three steps:
- Create a property schema in the solution and promote the property you wish to use for routing. This can be done via the Add New Item… dialog in Visual Studio.
- Configure the send port in the orchestration to use direct binding. This will cause BizTalk to submit the message directly to the message box; at which point it will be matched against any subscriptions and sent to all subscribers. This is shown in the following screenshot:
- After deploying the new property schema, configure each send port you want to include with a filter (subscription) that matches the desired operation. Usually, this will be a message type, an operation, or a receive port name, plus a value for the promoted property. A filter that makes this solution functionally equivalent to one of the branches in the orchestration in the previous figure is shown after this list.
Although this may not be as graphically appealing as the orchestration approach, it does the same thing and is easier to change in the future. With this approach, new vendors can be added using only the Administration console. Many developers who become comfortable with CBR often forget that it can be used throughout BizTalk, not just on send ports.
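For illustration, a send port filter along the following lines would match a single vendor branch; the receive port name, property schema namespace, and vendor value are all hypothetical, and the two conditions are combined with And on the Filters page:

BTS.ReceivePortName == rpOrders
MyCompany.Orders.PropertySchema.VendorName == Vendor A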
Leverage filters in orchestrations
The most common way to receive messages in an orchestration is to use the Specify Later option when configuring a port. This is then followed by binding the orchestration to a receive port in the BizTalk Administration console. This pattern does carry some significant benefits. It is very easy for administrators to understand the flow of messages and also to know the impact of downtime or changes. The drawback is that an orchestration is then coupled to a single port. Although this port can have multiple locations, it is still just a single port. In simple scenarios, this is not a bad approach but it makes reuse much more difficult as an orchestration is effectively now coupled to a specific receive port, which must have external messages flow into it. Because this results in a specific path of message flow, it also tends to lead to large orchestrations. A simple way around this is to use a direct bound receive port.
When configuring a receive port in the Port Configuration Wizard, simply change the binding type to direct and use the default option of Routing between ports will be defined by filter expressions on incoming messages. After configuring the port, you then use the Filter property of the receive shape to define the expression that will match incoming subscriptions. This filter functions just like the other filters that we have covered up to this point. An example orchestration filter is shown in the following screenshot:

Importantly, inside of an orchestration the filter values (the right-hand side of the expressions) must be enclosed in quotes. The orchestration will not compile otherwise.
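For example, an activating receive filter inside an orchestration might look like the following; the receive port name is hypothetical, and note the quotes around the value:

BTS.ReceivePortName == "rpOrderIntake"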
What this now allows us to do is chain orchestrations together, so that they can feed into each other in arbitrarily long or winding chains. This is a pretty advanced concept, so it may take a while to really leverage it. It is an extreme approach to loose coupling and does not provide the crutch that a single large graphical representation, or even Call Orchestration, would offer us. One of the greatest advantages of this approach is that it actually decouples our orchestrations from each other, allowing them to change at different rates. Like all design decisions, simpler tends to be better, so don't go out of your way just to use this. Experiment with it and eventually you will find the right use for this technique.
There is another type of binding, Specify Now, that we should never use, as it performs the port creation from within the orchestration and results in very tight coupling that cannot easily be changed without recompiling.
Use distinguished fields instead of XPath
The X# language provides us with a few tools specifically designed to make working with messages easier. One of these is the xpath function, which can be used to set or retrieve values from an XML message. The syntax is as follows:
xpath(Message/Part, <<XPath expression string>>)
The XPath expression is a fully qualified expression including namespaces. The easiest way to get this expression is to navigate to the node you're trying to read or write in the schema editor and look in the Properties window at the Instance XPath property; simply copy it from there. Other tools like XML NotePad and DanSharp XmlViewer also provide easy access to XPath expressions.
This can be really useful, but it carries a few side effects. First, it normally requires long function calls that can't easily be seen on the screen without scrolling, such as the following:
xpath(MyMessage, "/*[local-name()='MyMessage' and namespace-uri()='http://nova/billing/schemas/internal/2011-05']/*[local-name()='Number' and namespace-uri()='']") = "1234";
That's a pretty big assignment statement, and this is only a simple example; real-world examples will be much larger. A much easier way to do this is to use distinguished fields in your schema. A distinguished field is a schema annotation that is used by BizTalk to allow for XPath shorthand. In orchestration, this can make our statements much simpler.
To distinguish a field, simply right-click the node inside the schema editor and under the Promote option select Show Promotions…. This is shown in the following screenshot:

From here you have a list of all the nodes in the schema on the left (with the node you right-clicked already selected) and all the distinguished fields on the right. Simply click the Add >> button and then click OK. Be sure to save the schema and, if it is in a separate project, to recompile it, as shown in the following screenshot:

A visual cue is given to let us know that this node has been either distinguished or promoted: the little gold and blue icon now attached to the node. The resulting assignment statement is much easier to read and its intention is much clearer:
MyMessage.Number = "1234";
There is also another benefit that is not always apparent at first. The XPath statement shown before has the namespace in it, and if we need to change our namespace, we must remember to go and change all of these XPath strings in the orchestrations. Changing namespaces is common when versioning solutions (creating a 2.0 version, and so on), and the xpath function is evaluated only at runtime, so as long as the second parameter is a string, it will compile, even if it is the wrong string. Because the distinguished field is annotated in the schema with which we're working, it will be automatically updated when the schema's namespace is updated.
There is, however, one drawback to using distinguished fields: they must exist in the message at runtime in order to avoid an error when evaluating them. If it's an important piece of data, it should exist anyway, but you can also explicitly map in placeholder or null values if needed; zero would be a good example for a decimal field.
Avoid unnecessary looping on collections
I frequently see scenarios where developers receive a batch into an orchestration and then loop over the records to process them. I see this particularly with the SQL or WCF-SQL adapters or, for that matter, with any database adapter. Often the result looks similar to the following figure:

This is a bad use of orchestration for a variety of reasons. For one, although not clearly shown, it uses xpath and the DOM to create each individual message; we've already covered why that is not a good practice. Worse still, each submessage then does a send, which results in a persistence point. Larger messages will not work well in this sort of arrangement. If you need to break apart a message in an orchestration, calling a pipeline with an XML disassembler from within the orchestration is a much better approach than the xpath/DOM assignment approach; a rough sketch follows this paragraph. Don't let that friendly loop lull you into a sense of familiarity. The previous loop is nothing like what you would write in a procedural or imperative language: it incurs a lot of overhead and there is no way to keep a single connection to the destination system open. The loop, like most orchestration shapes, is meant as a control structure, not a programming structure.
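The following is a rough sketch of that pipeline-call alternative, shown as one annotated listing although in practice each fragment lives in its own shape; the pipeline, project, and message names are hypothetical, and the API used is the XLANGPipelineManager class from the Microsoft.XLANGs.Pipeline namespace:

// Expression shape: execute a receive pipeline (with an XML disassembler)
// against the batch message. The pipelineOutput variable is of type
// Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages; because that type
// is not serializable, this normally sits inside an atomic scope.
pipelineOutput = Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(
    typeof(MyProject.Pipelines.DebatchPipeline), msgBatch);

// Loop shape condition: true while there are more debatched messages.
pipelineOutput.MoveNext()

// Message Assignment shape (inside a Construct Message shape): retrieve the
// current debatched message into msgSingle.
msgSingle = null;
pipelineOutput.GetCurrent(msgSingle);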
The problems with the looping approach actually only get worse from here. There is no transactional integrity between the individual requests. If one request fails, all the messages before it are already complete and, in the case of database transactions, committed. It will also stop any subsequent messages in the batch from processing. You could add compensation or exception logic to address this, but it really just starts piling more bad approaches on top of an approach that is not elegant to begin with.
There are two better ways to approach this issue, and which one you should use will depend on what you're doing. They are as follows:
- To debatch (disassemble) the message in the receive location, which also allows us better control over failures. The orchestration would then subscribe to the debatched message and the map on the receive port would map at this debatched level as well. This approach will result in many orchestrations running completely independently. The modeling path is simple and this works well in many cases. This will scale, even for very large messages, though you may end up with throttling due to large loads. BizTalk is designed to handle large loads.
- To simply map the multiple lines together in a single request. The classic SQL adapter even allows you to combine separate operations to different stored procedures or updategrams in a single request. The order of the nodes within the XML determines the order in which they execute. This can be a very useful technique as all operations complete in a single Distributed Transaction Coordinator (DTC) context. That is to say, they either all succeed or all rollback as one. There is much less deadlocking in the case of database connections because all the requests execute serially in this single DTC transaction.
This technique is very useful, but very large messages will strain the adapters. The SQL and WCF-SQL adapters tend to decline in performance above 50,000 transactions in a single message; that said, this amounts to a lot of transactions. Because, as we already discussed, ETL is normally better done with other tools such as SSIS, this approach becomes even more useful in its proper context.