This is an old, Oracle SOA and OC4J 10G topic. In fact this is not even a SOA topic per se. Questions of RMI load balancing arise when you developed custom web applications accessing human tasks running off a remote SOA 10G cluster. Having returned from a customer who faced challenges with OC4J RMI load balancing, I felt there is still some confusions in the field how OC4J RMI load balancing work. Hence I decide to dust off an old tech note that I wrote a few years back and share it with the general public.
Here is the tech note:
A typical use case in Oracle SOA is that you are building web based, custom human tasks UI that will interact with the task services housed in a remote BPEL 10G cluster. Or, in a more generic way, you are just building a web based application in Java that needs to interact with the EJBs in a remote OC4J cluster. In either case, you are talking to an OC4J cluster as RMI client. Then immediately you must ask yourself the following questions:
1. How do I make sure that the web application, as an RMI client, even distribute its load against all the nodes in the remote OC4J cluster?
2. How do I make sure that the web application, as an RMI client, is resilient to the node failures in the remote OC4J cluster, so that in the unlikely case when one of the remote OC4J nodes fail, my web application will continue to function?
That is the topic of how to achieve load balancing with OC4J RMI client.
You need to configure and code RMI client load balancing and high availability in following places:
1. Provider URL can be specified with a comma separated list of URLs. This takes care of the high availability at the initial lookup
2. Choose a proper value for the oracle.j2ee.rmi.loadBalance property, which, along side with the PROVIDER_URL property, is one of the JNDI properties passed to the JNDI lookup.(http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI). Doing so will assure the RMI client will evening distribute its requests to all the destination OC4J servers.
3. The RMI client must catch the exception and retry when one of the destination server happens to go down and fail the JNDI lookup.
About the PROVIDER_URL
The JNDI property java.name.provider.url's job is, when the client looks up for a new context at the very first time in the client session, to provide a list of RMI context
The value of the JNDI property java.name.provider.url goes by the format of a single URL, or a comma separate list of URLs.
- A single URL. For example: opmn:ormi://host1:6003:oc4j_instance1/appName1
- A comma separated list of multiple URLs. For examples: opmn:ormi://host1:6003:oc4j_instanc1/appName, opmn:ormi://host2:6003:oc4j_instance1/appName, opmn:ormi://host3:6003:oc4j_instance1/appName
When the client looks up for a new Context the very first time in the client session, it sends a query against the OPMN referenced by the provider URL. The OPMN host and port specifies the destination of such query, and the OC4J instance name and appName are actually the “where clause” of the query.
When the PROVIDER URL reference a single OPMN server
Let's consider the case when the provider url only reference a single OPMN server of the destination cluster. In this case, that single OPMN server receives the query and returns a list of the qualified Contexts from all OC4Js within the cluster, even though there is a single OPMN server in the provider URL. A context represent a particular starting point at a particular server for subsequent object lookup.
For example, if the URL is opmn:ormi://host1:6003:oc4j_instance1/appName, then, OPMN will return the following contexts:
- appName on oc4j_instance1 on host1
- appName on oc4j_instance1 on host2,
- appName on oc4j_instance1 on host3,
Please note that
- One OPMN will be sufficient to find the list of all contexts from the entire cluster that satisfy the JNDI lookup query. You can do an experiment by shutting down appName on host1, and observe that OPMN on host1 will still be able to return you appname on host2 and appName on host3.
When the JNDI propery java.naming.provider.url references a comma separated list of multiple URLs, the lookup will return the exact same things as with the single OPMN server: a list of qualified Contexts from the cluster.
The purpose of having multiple OPMN servers is to provide high availability in the initial context creation, such that if OPMN at host1 is unavailable, client will try the lookup via OPMN on host2, and so on. After the initial lookup returns and cache a list of contexts, the JNDI URL(s) are no longer used in the same client session. That explains why removing the 3rd URL from the list of JNDI URLs will not stop the client from getting the EJB on the 3rd server.
About the oracle.j2ee.rmi.loadBalance Property
After the client acquires the list of contexts, it will cache it at the client side as “list of available RMI contexts”. This list includes all the servers in the destination cluster. This list will stay in the cache until the client session (JVM) ends. The RMI load balancing against the destination cluster is happening at the client side, as the client is switching between the members of the list.
Whether and how often the client will fresh the Context from the list of Context is based on the value of the oracle.j2ee.rmi.loadBalance. The documentation at http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI list all the available values for the oracle.j2ee.rmi.loadBalance.
||If specified, the client interacts with the OC4J process that was initially chosen at the first lookup for the entire conversation.|
||Used for a Web client (servlet or JSP) that will access EJBs in a clustered OC4J environment.
If specified, a new
||Used for a standalone client that will access EJBs in a clustered OC4J environment.|
If specified, a new
Please note the regardless of the setting of oracle.j2ee.rmi.loadBalance property, the “refresh” only occurs at the client. The client can only choose from the "list of available context" that was returned and cached from the very first lookup. That is, the client will merely get a new Context object from the “list of available RMI contexts” from the cache at the client side. The client will NOT go to the OPMN server again to get the list. That also implies that if you are adding a node to the server cluster AFTER the client’s initial lookup, the client would not know it because neither the server nor the client will initiate a refresh of the “list of available servers” to reflect the new node.
About High Availability (i.e. Resilience Against Node Failure of Remote OC4J Cluster)
What we have discussed above is about load balancing. Let's also discuss high availability.
This is how the High Availability works in RMI: when the client use the context but get an exception such as socket is closed, it knows that the server referenced by that Context is problematic and will try to get another unused Context from the “list of available contexts”. Again, this list is the list that was returned and cached at the very first lookup in the entire client session.