Recently we encountered several JVM crashes on our test server, which runs in the Amazon cloud. The server runs CentOS 5.5 x86_64 with Oracle JRE 1.6.0_22, and hosts FuseSource ServiceMix 4.2.0-fuse-02-00 and PostgreSQL 8.4. The system had been running for more than a year without any problems, but last week ServiceMix started crashing a few times a day without writing anything to the log. I upgraded to Oracle JDK 1.6.0_27, but the problem persisted. I then found that the ulimit for "core file size" for the user that runs ServiceMix was 0:

[code]

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 122944
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 122944
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[/code]

 

I changed that to unlimited:

[code]

ulimit -c unlimited

[/code]

 

and a core dump file was produced whenever ServiceMix (or rather the JVM) crashed.
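
Keep in mind that ulimit -c only applies to the current shell and the processes started from it, so the limit has to be in effect in the shell that launches ServiceMix. As a rough sketch, a small helper like the one below can be used to double-check what the launching shell will actually hand to the JVM. The function name describe_core_limit is made up for this example and is not part of ServiceMix or any standard tool.

```shell
# Describe a core file size limit as reported by `ulimit -c`.
# describe_core_limit is a hypothetical helper name used only in this sketch.
describe_core_limit() {
  case "$1" in
    0)         echo "core dumps disabled" ;;
    unlimited) echo "core dumps enabled, no size limit" ;;
    *)         echo "core dumps enabled, limited to $1 blocks" ;;
  esac
}

# Report the limit in effect for the current shell.
describe_core_limit "$(ulimit -c)"
```

Running this in the same shell (or start script) that launches ServiceMix shows whether a crash would leave a core file behind.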

I ended up with two core dump files, one from JRE 1.6.0_22 and the other from JDK 1.6.0_27.

With the help of gdb, I was able to find the problem:

[code]

gdb /usr/java/jre1.6.0_22/bin/java /usr/local/apache-servicemix-4.2.0-fuse-02-00/core.3595
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/java/jre1.6.0_22/bin/java...(no debugging symbols found)...done.
[New Thread 4249]
[New Thread 13326]
[New Thread 13302]
....

[New Thread 3595]
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/java/jdk1.6.0_27/bin/../jre/lib/amd64/jli/libjli.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/bin/../jre/lib/amd64/jli/libjli.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/server/libjvm.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/server/libjvm.so
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libverify.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libverify.so
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libjava.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libjava.so
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libzip.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libzip.so
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libnet.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libnet.so
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libmanagement.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libmanagement.so
Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/libnio.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/libnio.so
Reading symbols from /usr/java/jdk1.6.0_27/jre/lib/amd64/librmi.so...(no debugging symbols found)...done.
Loaded symbols for /usr/java/jdk1.6.0_27/jre/lib/amd64/librmi.so
Core was generated by `/usr/java/latest/bin/java -server -Xms1024M -Xmx4096M -XX:MaxPermSize=256m -XX:'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaaf640360 in Java_java_net_SocketInputStream_socketRead0 () from /usr/java/jdk1.6.0_27/jre/lib/amd64/libnet.so
[/code]
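
Most of the gdb output above is symbol loading noise; the interesting part is the last line, frame #0. When there are several core files to go through, it may be easier to run gdb non-interactively and keep only that frame. The following is a minimal sketch: top_frame is a name I made up for this example, and the commented gdb call assumes the standard -batch and -ex options and the paths from this post.

```shell
# Print the innermost stack frame (#0) from gdb output read on stdin.
# top_frame is a hypothetical helper name used only in this sketch.
top_frame() {
  grep -m 1 '^#0'
}

# Possible usage against a core file (paths taken from this post, adjust as needed):
# gdb -batch -ex bt /usr/java/jre1.6.0_22/bin/java \
#     /usr/local/apache-servicemix-4.2.0-fuse-02-00/core.3595 2>/dev/null | top_frame
```

If every core file reports the same frame #0, the crashes very likely share one cause.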

 

The other core dump file also showed that the JVM crashed in .../amd64/libnet.so

A Google search led me to JVM bug 7059899:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7059899

Following the workaround mentioned there, I changed DEFAULT_JAVA_OPTS in [SMX_HOME]/bin/servicemix to:

[code]

DEFAULT_JAVA_OPTS="-Xms$JAVA_MIN_MEM -Xmx$JAVA_MAX_MEM -XX:MaxPermSize=256m -XX:PermSize=256m -XX:StackShadowPages=20"
[/code]
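
Since the start script may also pick up options from elsewhere, it can be worth confirming that the running JVM really received the new flag. Here is a small sketch of such a check. The function name has_stack_shadow_pages and the JAVA_PID variable are assumptions made for this example, not something ServiceMix provides.

```shell
# Succeed if the given JVM command line sets -XX:StackShadowPages.
# has_stack_shadow_pages is a hypothetical helper name used only in this sketch.
has_stack_shadow_pages() {
  case "$1" in
    *-XX:StackShadowPages=*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage: JAVA_PID stands for the process ID of the ServiceMix JVM.
# has_stack_shadow_pages "$(ps -o args= -p "$JAVA_PID")" && echo "flag is active"
```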

ServiceMix has been running fine since the restart.