To identify which process initiated a reboot, review the following files:
- Clusterware alert log in <GRID_HOME>/log/<nodename>/alert<nodename>.log
- The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
- The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
- The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
- The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
- IPD/OS or OS Watcher data. IPD/OS is the old name for Cluster Health Monitor; the names can be used interchangeably, although Oracle now calls the tool Cluster Health Monitor
- 'opatch lsinventory -detail' output for the GRID home
- OS messages file /var/log/messages
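
As a rough aid for collecting these files, the following Python sketch builds the locations from the list above and reports which ones exist on the current host. The function names, the example GRID home, and the node name are hypothetical; the alert log file name (`alert<nodename>.log`) and `/var/log/messages` are assumptions that may differ by version and platform.

```python
from pathlib import Path


def reboot_log_candidates(grid_home, nodename):
    """Return the log locations from the list above for one node."""
    node_log = Path(grid_home) / "log" / nodename
    return {
        # Assumed file name: alert<nodename>.log
        "clusterware alert log": [node_log / f"alert{nodename}.log"],
        "cssdagent": [node_log / "agent" / "ohasd" / "oracssdagent_root"],
        "cssdmonitor": [node_log / "agent" / "ohasd" / "oracssdmonitor_root"],
        "ocssd": [node_log / "cssd"],
        "lastgasp": [Path("/etc/oracle/lastgasp"), Path("/var/opt/oracle/lastgasp")],
        # Assumed Linux location of the OS messages file
        "OS messages": [Path("/var/log/messages")],
    }


def existing_logs(candidates):
    """Keep only the locations that are present on this host."""
    return {name: [p for p in paths if p.exists()] for name, paths in candidates.items()}


if __name__ == "__main__":
    # Hypothetical GRID home and node name; substitute your own values.
    found = existing_logs(reboot_log_candidates("/u01/app/11.2.0/grid", "racnode1"))
    for name, paths in found.items():
        print(f"{name}: {', '.join(map(str, paths)) or 'not found'}")
```

The script only checks for presence; it does not read or interpret the logs, so it is safe to run on a node before gathering files for analysis.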
OCSSD Eviction:
1) Network failure or latency between nodes. It takes 30 consecutive missed check-ins to cause a node eviction.
2) Problems writing to or reading from the voting disk.
3) A member kill escalation. For example, the LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this request times out, it can escalate to a node eviction.
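
To make the "30 consecutive missed check-ins" rule concrete, here is a toy model in Python. It is only an illustration of the counting logic, not how CSS is implemented; the function names are hypothetical and the threshold of 30 is taken from the text above.

```python
def longest_missed_run(checkins):
    """Length of the longest run of consecutive missed check-ins.

    `checkins` is an iterable of booleans: True for a received check-in,
    False for a missed one.
    """
    longest = current = 0
    for ok in checkins:
        # A successful check-in resets the counter; a miss extends the run.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def would_evict(checkins, threshold=30):
    """True if the misses are consecutive and reach the threshold."""
    return longest_missed_run(checkins) >= threshold
```

In this model, 29 misses followed by a single successful check-in and then 29 more misses does not trigger an eviction, because the counter resets; only an unbroken run of 30 does.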
CSSDAGENT or CSSDMONITOR Eviction:
1) An OS scheduler problem, for example because the OS is locked up or the server is under excessive load (such as CPU utilization as high as 100%).
2) The CSS process is hung.
3) An Oracle bug.