Tuesday 20 December 2011

Weblogic Server JMS reconnection pain

For many months we have had trouble with JMS clients to Weblogic Server (v9.2) repeatedly throwing the following error when sending messages to the JMS server after it had gone down:
org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 203 completed: No
at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2259)
at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(ORBUtilSystemException.java:2281)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(SocketOrChannelConnectionImpl.java:957)
at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(BufferManagerWriteStream.java:86)
at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(BufferManagerWriteStream.java:104)
at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(CDROutputObject.java:144)
at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(CorbaMessageMediatorImpl.java:247)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(CorbaClientRequestDispatcherImpl.java:355)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(CorbaClientRequestDispatcherImpl.java:336)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(CorbaClientDelegateImpl.java:129)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(CorbaClientDelegateImpl.java:213)
at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:112)
at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667)
at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:288)
at javax.naming.InitialContext.init(InitialContext.java:223)
at javax.naming.InitialContext.(InitialContext.java:197)

My initial attempt was to see if it was an issue with the CORBA code - this resulted in finding an unrelated race condition in the JDK CORBA implementation that created a similar exception. A bug report to Sun can be found here: Bug ID: 7056731 - Race condition in CORBA code causes re-use of ABORTed connections. Unfortunately this wasn't on the right track - it seems that the default implementation of the RMI/CORBA/IIOP connection to Weblogic Server as part of the 'wljmsclient.jar' wasn't able to recover broken connections (!) - since some kind of singleton seemed to be used so we were forced to restart the application each time the Weblogic Server instance went down. This is a Bad Thing.

After some research I found that the full BEA/Oracle t3 protocol would be able to gracefully recover closed connections: Overview of Stand-alone Clients. To use the proper t3 protocol we had to generate 'wlfullclient.jar' which was ~55mb (!... again); unfortunately this didn't seem to be a feature of Weblogic Server 9.2, so I had to improvise. Furthermore, since the 'wlfullclient.jar' bundled up many weblogic-unrelated classes we ran into JAR hell.

My solution was to take two jar files from the WLS server/lib directory: weblogic.jar, webserviceclient+ssl.jar and in order to strip out unnecessary classes I used my test Java JMS test code with the the java parameter: '-verbose:class' to see what classes were used when sending a JMS message. The test code is shown in a previous post. After some experimentation I was able to strip out what classes I found to be unnecessary... note that I was quite conservative in what classes I stripped out (especially the weblogic and com/bea packages) as there might be strange corner cases in practice that invoke classes not seen as part of 'normal' execution.

Once this was done I created my own JAR file 'wlfullclient-stripped.jar' that was made up of WLS v9.2 classes. I also took the MANIFEST.MF from 'wlclient.jar' just in case something relied on this for the client. A list of all classes that I ended up using from these packages can be found here.

Once this JAR file was tested I needed to upload it to our Maven Repository for easy use within our projects:
mvn deploy:deploy-file -DrepositoryId=repo -DgroupId=com.bea.weblogic -DartifactId=wlfullclient -Dversion=9.2.2.0 -Dpackaging=jar -Dfile=wlfullclient-stripped-9.2.jar -Durl=http://repo.com/repository/external -Dclassifier=stripped
Since I stripped out the javax JMS/JTA classes out of the JAR these had to be added to our Maven repository as artifacts (these are not part of the 'standard' Maven repos):
mvn deploy:deploy-file -DrepositoryId=repo -DgroupId=javax.transaction -DartifactId=jta -Dversion=1.1 -Dpackaging=jar -Dfile=jta-1.1.jar -Durl=http://repo.com/repository/external
mvn deploy:deploy-file -DrepositoryId=repo -DgroupId=javax.jms -DartifactId=jms -Dversion=1.1 -Dpackaging=jar -Dfile=jms-1.1.jar -Durl=http://repo.com/repository/external

The resulting wlfullclient-stripped.jar is ~22mb... certainly not small but, without further testing as to the exact set of classes used as part of sending a JMS message, is unavoidable. To use the JMS client in a Maven repo the following dependencies are added:
<dependency>
      <groupId>javax.jms</groupId>
      <artifactId>jms</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>javax.transaction</groupId>
      <artifactId>jta</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>com.bea.weblogic</groupId>
      <artifactId>wlfullclient</artifactId>
      <version>9.2.2.0</version>
      <classifier>stripped</classifier>
    </dependency>

Running it is still not without problems. The following error was encountered when trying to send a message to the WLS JMS Server:
javax.naming.CommunicationException [Root exception is java.rmi.UnmarshalException: failed to unmarshal class weblogic.security.acl.internal.AuthenticatedUser; nested exception is: 
 java.io.StreamCorruptedException: invalid type code: 31]
 at weblogic.jndi.internal.ExceptionTranslator.toNamingException(ExceptionTranslator.java:85)
 at weblogic.jndi.internal.ExceptionTranslator.toNamingException(ExceptionTranslator.java:41)
 at weblogic.jndi.WLInitialContextFactoryDelegate.toNamingException(WLInitialContextFactoryDelegate.java:773)
 at weblogic.jndi.WLInitialContextFactoryDelegate.pushSubject(WLInitialContextFactoryDelegate.java:673)
 at weblogic.jndi.WLInitialContextFactoryDelegate.newContext(WLInitialContextFactoryDelegate.java:466)
 at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:373)
 at weblogic.jndi.Environment.getContext(Environment.java:307)
 at weblogic.jndi.Environment.getContext(Environment.java:277)
 at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:117)
 at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667)
 at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:288)
 at javax.naming.InitialContext.init(InitialContext.java:223)
 at javax.naming.InitialContext.(InitialContext.java:197)
...

This was resolved (after some searching) by adding the following system parameter: -Dsun.lang.ClassLoader.allowArraySyntax=true. Note that you will have to add it to MAVEN_OPTS if using the exec plugin. One final issue was that if using t3s (i.e. SSL) then the BEA WLS license is required; or the following error would appear:
java.lang.RuntimeException: Exception occurred while reading the license file.
        at weblogic.security.utils.SSLSetup.getLicenseLevel(SSLSetup.java:143)
        at weblogic.security.utils.SSLSetup.getSSLDelegateInstance(SSLSetup.java:196)
        at weblogic.security.utils.SSLContextWrapper.(SSLContextWrapper.java:27)
        at weblogic.security.utils.SSLContextWrapper.getInstance(SSLContextWrapper.java:22)
        at weblogic.security.utils.SSLSetup.getSSLContext(SSLSetup.java:380)
        at weblogic.security.SSL.SSLClientInfo.getSSLSocketFactory(SSLClientInfo.java:56)
        at weblogic.socket.ChannelSSLSocketFactory.getSocketFactory(ChannelSSLSocketFactory.java:125)
        at weblogic.socket.ChannelSSLSocketFactory.createSocket(ChannelSSLSocketFactory.java:75)
        at weblogic.socket.AbstractMuxableSocket.createSocket(AbstractMuxableSocket.java:207)
        at weblogic.rjvm.t3.MuxableSocketT3.newSocketWithRetry(MuxableSocketT3.java:252)
        at weblogic.rjvm.t3.MuxableSocketT3.connect(MuxableSocketT3.java:421)
        at weblogic.rjvm.t3.MuxableSocketT3S.createConnection(MuxableSocketT3S.java:79)
        at weblogic.rjvm.ConnectionManager.createConnection(ConnectionManager.java:1749)
        at weblogic.rjvm.ConnectionManager.findOrCreateConnection(ConnectionManager.java:1410)
        at weblogic.rjvm.ConnectionManager.bootstrap(ConnectionManager.java:448)
        at weblogic.rjvm.ConnectionManager.bootstrap(ConnectionManager.java:326)
        at weblogic.rjvm.RJVMManager.findOrCreateRemoteInternal(RJVMManager.java:261)
        at weblogic.rjvm.RJVMManager.findOrCreate(RJVMManager.java:204)
        at weblogic.rjvm.RJVMFinder.findOrCreateRemoteServer(RJVMFinder.java:226)
        at weblogic.rjvm.RJVMFinder.findOrCreate(RJVMFinder.java:189)
        at weblogic.rjvm.ServerURL.findOrCreateRJVM(ServerURL.java:154)
        at weblogic.jndi.WLInitialContextFactoryDelegate$1.run(WLInitialContextFactoryDelegate.java:342)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
        at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:147)
        at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:337)
        at weblogic.jndi.Environment.getContext(Environment.java:307)
        at weblogic.jndi.Environment.getContext(Environment.java:277)
        at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:117)
        at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
        at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
        at javax.naming.InitialContext.init(Unknown Source)
        at javax.naming.InitialContext.(Unknown Source)
...
This could be resolved by grabbing the BEA license for WLS from the Oracle site. It then needed to be placed somewhere that the code could find it, I found that setting -Dbea.home=/etc/some/dir and placing license.bea under /etc/some/dir (i.e. /etc/some/dir/license.bea) worked as expected.

One final unresolved issue is that sometimes the Weblogic code would throw the following exception the first time it tried to retrieve the JNDI InitialContext:
javax.naming.CommunicationException [Root exception is java.net.ConnectException: t3s://wli.dev.auckland.ac.nz:8001: Destination unreachable; nested exception is: 
 java.io.IOException: Stream closed.; No available router to destination]
 at weblogic.jndi.internal.ExceptionTranslator.toNamingException(ExceptionTranslator.java:49)
 at weblogic.jndi.WLInitialContextFactoryDelegate.toNamingException(WLInitialContextFactoryDelegate.java:773)
 at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:363)
 at weblogic.jndi.Environment.getContext(Environment.java:307)
 at weblogic.jndi.Environment.getContext(Environment.java:277)
 at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:117)
 at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:667)
 at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:288)
 at javax.naming.InitialContext.init(InitialContext.java:223)
 at javax.naming.InitialContext.(InitialContext.java:197)
Strangely enough this would only happen once, further requests for the InitialContext were successful... I haven't had time to investigate it further because it's not a show stopper but I suspect it's either because of a clustering or timeout issue... Eitherway, it's something to be aware of.

No comments:

Post a Comment