Pipu pinged me today about opatch hanging. The opatch log showed this:
[Apr 11, 2015 5:24:13 PM] Start fuser command /sbin/fuser $ORACLE_HOME/bin/oracle at Sat Apr 11 17:24:13 EDT 2015
I had faced this issue once before but could not recall the solution, so I started fresh.
As oracle user:
/sbin/fuser $ORACLE_HOME/bin/oracle hung
As root user
/sbin/fuser $ORACLE_HOME/bin/oracle hung
As root user
lsof hung.
Google searches about it brought up a lot of hits about NFS issues. So I did df -h.
df -h also hung.
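Because df -h touches every mounted filesystem, it blocks as soon as one NFS server stops answering. A bounded probe of each NFS mount point can narrow things down without tying up the terminal. The following is a rough sketch, not what I ran that day: list_nfs_mounts and probe_mount are hypothetical helper names, it assumes GNU coreutils timeout and stat are installed, and whether a stuck stat can actually be killed depends on the kernel and the mount options.

```shell
# Print the mount points of all NFS filesystems in a mounts table.
# $1: mounts file to read (assumption: defaults to /proc/mounts)
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Probe one mount point with a time limit.
# $1: mount point, $2: limit in seconds (assumption: defaults to 5)
probe_mount() {
    if timeout -s KILL "${2:-5}" stat -t "$1" >/dev/null 2>&1; then
        echo "OK $1"
    else
        # stat failed or did not return within the limit
        echo "PROBLEM $1"
    fi
}
```

To check every NFS mount in turn: list_nfs_mounts | while read -r mp; do probe_mount "$mp"; done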
So I checked /var/log/messages and found many messages like these:
Apr 11 19:44:42 erpserver kernel: nfs: server share.justanexample.com not responding, still trying
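When the log is full of these lines, it helps to reduce them to the list of servers involved. A minimal sketch, assuming the message format is exactly the one shown above; nfs_unresponsive_servers is a hypothetical helper name.

```shell
# Print each NFS server named in "not responding" kernel messages, once.
# $1: log file to scan (assumption: defaults to /var/log/messages)
nfs_unresponsive_servers() {
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' \
        "${1:-/var/log/messages}" | sort -u
}
```

Reading /var/log/messages usually requires root.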
That server has a mount called /R12.2stage that has the installation files for R12.2.
So I tried unmounting it:
umount /R12.2stage
Device Busy
umount -f /R12.2stage
Device Busy
umount -l /R12.2stage
df -h didn't hang any more.
Next I did strace /sbin/fuser $ORACLE_HOME/bin/oracle and it stopped here:
open("/proc/12854/fdinfo/3", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b99de014000
read(7, "pos:\t0\nflags:\t04002\n", 1024) = 20
close(7) = 0
munmap(0x2b99de014000, 4096) = 0
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
stat("/proc/12857/", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/12857/stat", O_RDONLY) = 4
read(4, "12857 (bash) S 12853 12857 12857"..., 4096) = 243
close(4) = 0
readlink("/proc/12857/cwd", "11.2.0.4/examples (deleted)"..., 4096) = 27
rt_sigaction(SIGALRM, {0x411020, [ALRM], SA_RESTORER|SA_RESTART, 0x327bc30030}, {SIG_DFL, [ALRM], SA_RESTORER|SA_RESTART, 0x327bc30030}, 8) = 0
alarm(15) = 0
write(5, "@\20A\0\0\0\0\0", 8) = 8
write(5, "\20\0\0\0", 4) = 4
write(5, "/proc/12857/cwd\0", 16) = 16
write(5, "\220\0\0\0", 4) = 4
read(6,
It stopped here. So I did Ctrl+C
#
# ps -ef |grep 12857
oracle 12857 12853 0 Apr10 pts/2 00:00:00 -bash
root 21688 2797 0 19:42 pts/8 00:00:00 grep 12857
Killed this process
# kill -9 12857
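The trace shows fuser getting stuck while dealing with /proc/12857/cwd, and the readlink just before it returned a path without a leading slash ("11.2.0.4/examples (deleted)"). Assuming that pattern holds for every process still sitting on the lazily unmounted share, a small loop could list such processes up front instead of finding them one strace run at a time. This is only a sketch: procs_with_detached_cwd is a hypothetical helper name, and it relies on readlink reading just the link text, so it should not have to contact the NFS server.

```shell
# List processes whose working directory no longer resolves to an
# absolute path, as seen for PID 12857 in the strace output.
# $1: proc root to scan (assumption: defaults to /proc)
procs_with_detached_cwd() {
    proc_root="${1:-/proc}"
    for link in "$proc_root"/[0-9]*/cwd; do
        # Skip entries that vanished or that we are not allowed to read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            /*) ;;  # normal absolute path, nothing to report
            *)
                pid=${link%/cwd}   # strip the trailing /cwd
                pid=${pid##*/}     # keep only the PID component
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Run it as root so that other users' /proc entries are readable, review the list, and only then decide which processes to kill.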
I ran strace /sbin/fuser $ORACLE_HOME/bin/oracle again, and this time it stopped at a different process, another bash process. I killed that one as well.
I executed it a third time: strace /sbin/fuser $ORACLE_HOME/bin/oracle
This time it completed.
Ran it without strace
/sbin/fuser $ORACLE_HOME/bin/oracle
It came out in 1 second.
Then I did the same process for lsof
strace lsof
and killed the processes where it was getting stuck. Eventually lsof also worked.
Pipu retried opatch and it worked fine.
A stale NFS mount was the root cause of this issue. It was stale because the source server was down for Unix security patching over the weekend.