Oracle RAC, ASM, Exadata, Cloud, and More..: Troubleshooting 11.2 Clusterware Node Evictions (Note 1050693.1)

Starting with 11.2.0.2, a node eviction may not actually reboot the machine. This is called a rebootless restart.

To identify which process initiated a reboot, review the following files:

Clusterware alert log in <GRID_HOME>/log/<nodename>/alert<nodename>.log
The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
IPD/OS or OS Watcher data. IPD/OS is the old name for the Cluster Health Monitor. The names can be used interchangeably, although Oracle now calls the tool Cluster Health Monitor.
'opatch lsinventory -detail' output for the GRID home
Messages file(s), e.g. /var/log/messages on Linux
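
As a rough aid for gathering these files, the sketch below builds the locations listed above for one node and reports which categories have no path present on disk. The function names, the example Grid home, and the node name are hypothetical; only the directory layout comes from the list above, and the alert log is represented by its containing directory.

```python
from pathlib import Path


def eviction_log_locations(grid_home: str, nodename: str) -> dict[str, list[Path]]:
    """Map each log category from the list above to the path(s) to check."""
    node_log = Path(grid_home) / "log" / nodename
    return {
        # The clusterware alert log lives directly under <GRID_HOME>/log/<nodename>.
        "clusterware alert log": [node_log],
        "cssdagent": [node_log / "agent" / "ohasd" / "oracssdagent_root"],
        "cssdmonitor": [node_log / "agent" / "ohasd" / "oracssdmonitor_root"],
        "ocssd": [node_log / "cssd"],
        # lastgasp is in one of two places, depending on the platform.
        "lastgasp": [Path("/etc/oracle/lastgasp"), Path("/var/opt/oracle/lastgasp")],
    }


def report_missing(locations: dict[str, list[Path]]) -> list[str]:
    """Return the categories for which none of the candidate paths exist."""
    return [
        name
        for name, paths in locations.items()
        if not any(p.exists() for p in paths)
    ]


if __name__ == "__main__":
    # Hypothetical Grid home and node name; substitute your own values.
    locations = eviction_log_locations("/u01/app/11.2.0/grid", "racnode1")
    for name in report_missing(locations):
        print(f"not found: {name}")
```

The opatch inventory, OS Watcher data, and OS messages file are left out because their locations depend on the installation and platform.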

Common Causes of eviction:

OCSSD Eviction:
1) Network failure or latency between nodes. It takes 30 consecutive missed check-ins to cause a node eviction.
2) Problems writing to or reading from the voting disk.
3) A member kill escalation. For example, the LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it can escalate to a node eviction.
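
To make the "30 consecutive missed check-ins" rule concrete, here is a toy model, not the actual CSS implementation. It assumes a hypothetical list of booleans, one per check-in interval, where `False` means the check-in was missed. The point is that a single successful check-in resets the count.

```python
from typing import Optional


def first_eviction_index(checkins: list[bool], threshold: int = 30) -> Optional[int]:
    """Return the index at which `threshold` consecutive misses is reached.

    `checkins` holds one entry per interval: True = check-in received,
    False = check-in missed. Returns None if the threshold is never reached.
    """
    consecutive = 0
    for i, received in enumerate(checkins):
        # Any successful check-in resets the run of misses.
        consecutive = 0 if received else consecutive + 1
        if consecutive >= threshold:
            return i
    return None


if __name__ == "__main__":
    # 29 misses, one success, then 30 misses: only the second run qualifies.
    history = [True] * 5 + [False] * 29 + [True] + [False] * 30
    print(first_eviction_index(history))
```

In this model, 29 misses followed by one success do not trigger an eviction, which is why intermittent network latency can go unnoticed until one outage lasts long enough.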

CSSDAGENT or CSSDMONITOR Eviction:
1) An OS scheduler problem, for example because the OS is locked up or the server is under excessive load (such as CPU utilization as high as 100%).
2) The CSS process is hung.
3) An Oracle bug.
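
When checking OS Watcher or Cluster Health Monitor data for the load condition described above, it helps to look for sustained saturation instead of single spikes. The sketch below assumes the CPU utilization samples have already been extracted into a plain list of percentages. The 95% threshold and the minimum run length of 3 samples are illustrative assumptions, not Oracle-defined values.

```python
def saturated_runs(
    cpu_pct: list[float],
    threshold: float = 95.0,  # assumed cutoff for "saturated"
    min_samples: int = 3,     # assumed minimum run length worth reporting
) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of runs where CPU stayed >= threshold."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, pct in enumerate(cpu_pct):
        if pct >= threshold:
            if start is None:
                start = i  # a new saturated run begins here
        else:
            if start is not None and i - start >= min_samples:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last sample.
    if start is not None and len(cpu_pct) - start >= min_samples:
        runs.append((start, len(cpu_pct) - 1))
    return runs


if __name__ == "__main__":
    samples = [10, 99, 100, 100, 50, 100, 100]
    print(saturated_runs(samples))
```

Runs that end shortly before the eviction time recorded in the clusterware alert log are the ones worth correlating with the cssdagent and cssdmonitor logs.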

3 comments:

Ashish Shukla, October 10, 2012 at 6:05 AM
Gr8.. Thanks a lot for sharing, Really....
admin, June 5, 2013 at 11:55 PM
Very good article.

I have also listed top 4 Reasons for node reboot or node eviction at http://www.dbas-oracle.com/2013/06/Top-4-Reasons-Node-Reboot-Node-Eviction-in-Real-Application-Cluster-RAC-Environment.html
Anonymous, March 11, 2014 at 2:33 AM
It’s really helpful. I really like this page so much, so better to keep on posting! Thanks.
