Wednesday, October 24, 2012

Retrieve Performance Data from SOA Infrastructure Database



My earlier blog posting shows how to enable, retrieve and interpret BPEL engine performance statistics to aid performance troubleshooting. The strength of BPEL engine statistics at EM is its break down per request. But there are some limitations with the BPEL performance statistics mentioned in that blog posting:

  • The statistics were stored in memory instead of being persisted. To avoid memory overflow, the data are stored to a buffer with limited size. When the statistic entries exceed the limitation, old data will be flushed out to give ways to new statistics. Therefore it can only keep the last X number of entries of data. The statistics 5 hour ago may not be there anymore.
  • The BPEL engine performance statistics only includes latencies. It does not provide throughputs.

Fortunately, Oracle SOA Suite runs with the SOA Infrastructure database and a lot of performance data are naturally persisted there. It is at a more coarse grain than the in-memory BPEL Statistics, but it does have its own strengths as it is persisted.

Here I would like offer examples of some basic SQL queries you can run against the infrastructure database of Oracle SOA Suite 11G to acquire the performance statistics for a given period of time. You can run it immediately after you modify the date range to match your actual system.

1. Asynchronous/one-way messages incoming rates

The following query will show number of messages sent to one-way/async BPEL processes during a given time period, organized by process names and states

select composite_name composite, state, count(*) Count from dlv_message
       where receive_date >= to_timestamp('2012-10-24 21:00:00','YYYY-MM-DD HH24:MI:SS')
         and receive_date <= to_timestamp('2012-10-24 21:59:59','YYYY-MM-DD HH24:MI:SS')
      group by composite_name, state
      order by Count;

2. Throughput of BPEL process instances

The following query shows the number of synchronous and asynchronous process instances created during a given time period. It list instances of all states, including the unfinished and faulted ones. The results will include all composites cross all SOA partitions

select state, count(*) Count, composite_name composite, component_name,componenttype from cube_instance
       where creation_date >= to_timestamp('2012-10-24 21:00:00','YYYY-MM-DD HH24:MI:SS')
          and creation_date <= to_timestamp('2012-10-24 21:59:59','YYYY-MM-DD HH24:MI:SS')
        group by composite_name, component_name, componenttype 
          order by count(*) desc;


3. Throughput and latencies of BPEL process instances

This query is augmented on the previous one, providing more comprehensive information. It gives not only throughput but also the maximum, minimum and average elapse time BPEL process instances.

select composite_name Composite, component_name Process, componenttype, state,
       count(*) Count,
      trunc(Max(extract(day    from (modify_date-creation_date))*24*60*60 + 
                extract(hour   from (modify_date-creation_date))*60*60 + 
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) MaxTime, 
      trunc(Min(extract(day    from (modify_date-creation_date))*24*60*60 + 
                extract(hour   from (modify_date-creation_date))*60*60 +  
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) MinTime, 
      trunc(AVG(extract(day    from (modify_date-creation_date))*24*60*60 +  
                extract(hour   from (modify_date-creation_date))*60*60 +  
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) AvgTime       
       from cube_instance
       where creation_date >= to_timestamp('2012-10-24 21:00:00','YYYY-MM-DD HH24:MI:SS')
          and creation_date <= to_timestamp('2012-10-24 21:59:59','YYYY-MM-DD HH24:MI:SS')
        group by composite_name, component_name, componenttype, state
          order by count(*) desc; 

4. Combine all together

Now let's combine all of these 3 queries together, and parameterize the start and end time stamps to make the script a bit more robust. The following script will prompt for the start and end time before querying against the database:



accept startTime prompt 'Enter start time (YYYY-MM-DD HH24:MI:SS)'
accept endTime   prompt 'Enter end time (YYYY-MM-DD HH24:MI:SS)'

Prompt "==== Rejected Messages ====";
REM 2012-10-24 21:00:00
REM 2012-10-24 21:59:59
select count(*), composite_dn from rejected_message
       where created_time >= to_timestamp('&&StartTime','YYYY-MM-DD HH24:MI:SS')
         and created_time <= to_timestamp('&&EndTime','YYYY-MM-DD HH24:MI:SS')
    group by composite_dn;
    
Prompt "   ";
Prompt "==== Throughput of one-way/asynchronous messages ====";
select state, count(*) Count, composite_name composite from dlv_message
       where receive_date >= to_timestamp('&StartTime','YYYY-MM-DD HH24:MI:SS')
         and receive_date <= to_timestamp('&EndTime','YYYY-MM-DD HH24:MI:SS')
      group by composite_name, state
      order by Count;

Prompt "   ";
Prompt "==== Throughput and latency of BPEL process instances ===="
select state,
       count(*) Count,
      trunc(Max(extract(day    from (modify_date-creation_date))*24*60*60 + 
                extract(hour   from (modify_date-creation_date))*60*60 + 
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) MaxTime, 
      trunc(Min(extract(day    from (modify_date-creation_date))*24*60*60 + 
                extract(hour   from (modify_date-creation_date))*60*60 +  
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) MinTime, 
      trunc(AVG(extract(day    from (modify_date-creation_date))*24*60*60 +  
                extract(hour   from (modify_date-creation_date))*60*60 +  
                extract(minute from (modify_date-creation_date))*60 + 
                extract(second from (modify_date-creation_date))),4) AvgTime,
       composite_name Composite, component_name Process, componenttype
       from cube_instance
       where creation_date >= to_timestamp('&StartTime','YYYY-MM-DD HH24:MI:SS')
          and creation_date <= to_timestamp('&EndTime','YYYY-MM-DD HH24:MI:SS')
        group by composite_name, component_name, componenttype, state
          order by count(*) desc;