Wednesday, February 1, 2012

Tune Audit Trail in SOA 11G to Avoid Memory and Transaction Problems



Before 11.1.1.3, BPEL audit trails were always saved to the database in the same JTA transaction as the main transaction. This causes three main problems:
  1. Most common: The latency of saving the audit trail to the database is included in the overall latency of the main business transaction.
  2. Often seen: When the main transaction is rolled back for whatever reason, the audit trail is not saved, because it is written in that same transaction. As a result, no trace of what happened can be found on the BPEL Console (or the EM Console in 11G), which makes debugging harder.
  3. Happens with large while loops: When a BPEL process instance has a large number of activities (typically from a large while loop), the amount of audit trail data held in memory grows so large that the BPEL service engine either encounters an OutOfMemoryError or, thanks to the maxRequestDepth property, commits the transaction early to flush the audit data from memory to the database and avoid running out of memory. In doing so, the BPEL service engine introduces extra transaction boundaries into the main transaction, which sometimes causes undesirable behavior.
Since SOA 11.1.1.3 (i.e. SOA 11G PatchSet 2), management of audit trail memory has been greatly enhanced. Not only is the memory footprint reduced, but when and how the audit trail is stored has also been improved. The above three problems now have solutions.

The audit trail enhancement comes in the form of the following configuration properties:

auditStorePolicy

Use this flag to choose the audit store strategy:
  • syncSingleWrite - stores audit data synchronously when the engine saves the cube instance, as part of the cube instance transaction.
This is the default value, and the behavior is the same as in 10.1.3.x.
  • syncMultipleWrite - stores audit data synchronously using a separate local transaction.
"Synchronously" here means that the BPEL service engine saves the audit trail in the same thread as the main transaction, but in a separate local transaction.

Because both run on the same thread, the latency of saving the audit trail is still included in the overall latency of the main transaction.

However, because they are separate transactions, you can configure the BPEL engine (using AuditFlushByteThreshold and AuditFlushEventThreshold) to flush the audit trail from memory to the database periodically, regardless of how long the main transaction takes. Moreover, having two separate transactions means that a rollback of the main transaction will NOT affect the audit trail transaction. That is, you will still see the audit trail even if the main transaction rolls back.
  • async - stores the audit data asynchronously using an in-memory queue and a pool of audit threads.
This is almost the same as "syncMultipleWrite", except that the save happens not just in a separate transaction but also in a separate thread.

The advantage is that the audit trail latency is NOT directly included in the latency of the main transaction (although the two still share computing resources and the database, so the latency is still indirectly related).

The disadvantage is that, because audit trails are saved asynchronously, the audit trail may be out of sync with the main transaction (as the name 'async' implies).
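
To make the rollback difference between syncSingleWrite and syncMultipleWrite concrete, here is a toy Python model. It is only a sketch of the transaction boundaries described above, not engine code: ToyDatabase and run_instance are hypothetical names, every audit event is committed immediately under syncMultipleWrite (the real engine batches them according to the flush thresholds), and the threading aspect of async is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Stand-in for the SOA database: only committed rows are visible."""
    committed: list = field(default_factory=list)


def run_instance(policy, events, fail, db):
    """Run one toy process instance and return True if the main transaction commits.

    policy: 'syncSingleWrite' or 'syncMultipleWrite' (async is not modeled).
    events: audit events produced by the instance.
    fail:   if True, the main transaction rolls back at the end.
    """
    main_tx = []  # uncommitted work that belongs to the main transaction
    for event in events:
        if policy == "syncSingleWrite":
            # Audit data rides along in the main transaction.
            main_tx.append(("audit", event))
        elif policy == "syncMultipleWrite":
            # Audit data is committed in its own local transaction.
            db.committed.append(("audit", event))
        else:
            raise ValueError(f"unsupported policy in this sketch: {policy}")
    main_tx.append(("instance", "final state"))
    if fail:
        return False  # rollback: everything in main_tx is discarded
    db.committed.extend(main_tx)
    return True


if __name__ == "__main__":
    for policy in ("syncSingleWrite", "syncMultipleWrite"):
        db = ToyDatabase()
        run_instance(policy, ["receive", "assign", "invoke"], fail=True, db=db)
        print(policy, "rows visible after rollback:", len(db.committed))
```

In this model, a rolled-back instance leaves no rows at all under syncSingleWrite, while under syncMultipleWrite the audit rows remain visible, which is the behavior that helps with debugging.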

AuditFlushByteThreshold and AuditFlushEventThreshold

When auditStorePolicy=syncMultipleWrite or auditStorePolicy=async, you use these two flags to control how often the engine should flush the audit events. These two properties do NOT apply to auditStorePolicy=syncSingleWrite.

auditFlushByteThreshold works as follows: after adding an event to the current batch, the engine checks whether the byte size of the current batch is greater than this value. If so, it flushes the current batch. The default value is 2048000 bytes, i.e. roughly 2 MB.

Similarly, auditFlushEventThreshold limits the number of events in the current batch: when this limit is reached, the engine triggers the store call. The default value is 300 events.

Both values need to be tuned based on the application and requirements.
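
The flush check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the engine's implementation: the AuditBatch class and its store callback are hypothetical, event size is measured as UTF-8 bytes, and the comparisons (strictly greater for bytes, "reached" for the event count) follow the wording of the property descriptions.

```python
class AuditBatch:
    """Toy batch of audit events that flushes when either threshold trips."""

    def __init__(self, store, byte_threshold=2048000, event_threshold=300):
        self.store = store                  # callable that persists a list of events
        self.byte_threshold = byte_threshold
        self.event_threshold = event_threshold
        self.events = []
        self.byte_size = 0

    def add(self, event):
        """Add one event, then check both thresholds as described in the text."""
        self.events.append(event)
        self.byte_size += len(event.encode("utf-8"))
        if (self.byte_size > self.byte_threshold
                or len(self.events) >= self.event_threshold):
            self.flush()

    def flush(self):
        """Hand the current batch to the store and start a new, empty batch."""
        if self.events:
            self.store(list(self.events))
            self.events.clear()
            self.byte_size = 0


if __name__ == "__main__":
    flushed = []
    batch = AuditBatch(flushed.append, event_threshold=3)
    for i in range(7):
        batch.add(f"event-{i}")
    print("flushes:", len(flushed), "pending events:", len(batch.events))
```

Lower thresholds mean smaller batches held in memory but more frequent database writes; higher thresholds trade memory for fewer writes, which is why both values depend on the workload.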

AuditDetailThreshold

This is the maximum size (in bytes) that an audit trail details string can reach before it is stored separately from the audit trail. The default value is 50000 bytes. This is the same property as in 10.1.3.x. Please refer to the SOA Management MBean for a detailed explanation.
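
As a rough illustration of this rule, the following Python sketch decides where a details string would go. The function name place_audit_detail and the returned labels are hypothetical, and measuring size as UTF-8 bytes is an assumption; only the "larger than the threshold means stored separately" rule comes from the description above.

```python
def place_audit_detail(detail, threshold=50000):
    """Return 'inline' if the details string stays with the audit trail,
    or 'separate' if it exceeds the threshold and is stored on its own."""
    size = len(detail.encode("utf-8"))  # assumed size measure: UTF-8 bytes
    return "separate" if size > threshold else "inline"


if __name__ == "__main__":
    print(place_audit_detail("<payload>small</payload>"))
    print(place_audit_detail("x" * 60000))
```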

AuditLevel

This is the same property as in 10.1.3.x.

How to Configure

All of the above properties can be configured via the SOA EM Console. The path is EM -> SOA Infrastructure -> SOA Administration -> BPEL Properties -> More BPEL Configuration Properties.