Dump the contents of the OCR and count the number of lines. If you dump the OCR as root, you will see more information than as the grid or oracle user, because information in the OCR is organized by keys that are associated with privileges. See the examples below:
As oracle or grid user:
ocrdump -stdout | wc -l
489
As root:
ocrdump -stdout | wc -l
2993
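Because the keys are privilege-scoped, you can also dump a single key subtree with the -keyname option; a small sketch (the key name SYSTEM.css is only an example):
ocrdump -stdout -keyname SYSTEM.css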
To dump the first 50 lines of the OCR content in XML format:
ocrdump -stdout -xml | head -50
To dump the current OCR to a file:
ocrdump -xml /apps/oracle/dump_current_ocr.xml
Dump the contents of an OCR backup in XML format, then compare it with the current OCR to detect any changes. First, list the available backups:
ocrconfig -showbackup
si01.an.com 2010/11/17 11:40:39 /dba/backup00.ocr
si01.an.com 2010/11/17 07:40:37 /dba/backup01.ocr
si01.an.com 2010/11/17 03:40:35 /dba/backup02.ocr
si01.an.com 2010/11/16 03:40:27 /dba/day.ocr
Then dump the chosen backup:
ocrdump -xml -backupfile /dba/day.ocr previous_day.ocr
Compare:
diff dump_current_ocr.xml previous_day.ocr
3,4c3,5
< <TIMESTAMP>11/17/2011 13:57:05</TIMESTAMP>
< <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml /apps/oracle/dump_current_ocr.xml </COMMAND>
---
> <TIMESTAMP>11/17/2011 14:00:46</TIMESTAMP>
> <DEVICE>/dba/day.ocr</DEVICE>
> <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml -backupfile /dba/day.ocr previous_day.ocr </COMMAND>
879c880
< <VALUE><![CDATA[83]]></VALUE>
---
> <VALUE><![CDATA[75]]></VALUE>
Troubleshooting 11.2 Clusterware Node Evictions (Note 1050693.1)
Starting with 11.2.0.2, a node eviction may not actually reboot the machine; instead, Oracle Clusterware attempts to restart the stack on the node. This is called a rebootless restart.
OCSSD Eviction: 1) Network failure or latency between nodes; it takes 30 consecutive missed checkins to cause a node eviction. 2) Problems writing to or reading from the voting disk. 3) A member kill escalation; for example, the LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism, and if this times out, it can escalate to a node eviction.
CSSDAGENT or CSSDMONITOR Eviction: 1) An OS scheduler problem, for example because the OS is locked up or the server is under excessive load (CPU utilization close to 100%). 2) The CSS process is hung. 3) An Oracle bug.
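For the missed-checkin case, the ocssd log normally shows heartbeat warnings leading up to the eviction. A minimal sketch for scanning them, assuming GRID_HOME is set and the default 11.2 log location:
grep -i "clssnmPollingThread" $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log | tail -20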
To identify which process initiated a reboot, you need to review the important files below (a collection sketch follows this list):
- Clusterware alert log in <GRID_HOME>/log/<nodename>/alert<nodename>.log
- The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
- The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
- The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
- The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
- IPD/OS or OS Watcher data. IPD/OS is an old name for the Cluster Health Monitor; the names are used interchangeably, although Oracle now calls the tool Cluster Health Monitor
- 'opatch lsinventory -detail' output for the GRID home
- System message files in /var/log/messages
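As a convenience, most of the files above can be gathered into one archive before they are recycled. A minimal sketch, assuming the grid home from the diff output above and Linux default paths (adjust for your platform):
GRID_HOME=/apps/oracle/product/11.2.0.2/grid
NODE=$(hostname -s)
tar -czf /tmp/eviction_logs_$(date +%Y%m%d).tar.gz \
    $GRID_HOME/log/$NODE/alert$NODE.log \
    $GRID_HOME/log/$NODE/agent/ohasd/oracssdagent_root \
    $GRID_HOME/log/$NODE/agent/ohasd/oracssdmonitor_root \
    $GRID_HOME/log/$NODE/cssd \
    /etc/oracle/lastgasp \
    /var/log/messages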
Enable Trace / Debug
Below are several ways to enable tracing and debugging for Oracle RAC resources.
1) SRVM_TRACE: enables tracing for cluvfy, netca, and srvctl.
export SRVM_TRACE=TRUE
srvctl config database -d db11g1
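The trace is written to standard output, so as a minimal sketch (the trace file name is arbitrary) you can capture it to a file and switch tracing back off when done:
srvctl config database -d db11g1 > /tmp/srvctl_config.trc 2>&1
unset SRVM_TRACE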
2) Enable debugging for a clusterware resource; you can dynamically set the log level (1 to 5): crsctl set log res "<resource name>=<level>"
crsctl set log res "ora.registry.acfs=1"
Set Resource ora.registry.acfs Log Level: 1
After you have collected all the traces, disable tracing: crsctl set log res "<resource name>=0"
crsctl set log res "ora.registry.acfs=0"
Set Resource ora.registry.acfs Log Level: 0
3) You can enable dynamic debugging for CRS, CSS, EVM, and other clusterware subcomponents. Use the crsctl lsmodules {crs | css | evm} command to list a module's components.
crsctl lsmodules crs
List CRSD Debug Module: AGENT
List CRSD Debug Module: AGFW
List CRSD Debug Module: CLSFRAME
List CRSD Debug Module: CLSVER
List CRSD Debug Module: CLUCLS
List CRSD Debug Module: COMMCRS
List CRSD Debug Module: COMMNS
List CRSD Debug Module: CRSAPP
List CRSD Debug Module: CRSCCL
List CRSD Debug Module: CRSCEVT
List CRSD Debug Module: CRSCOMM
List CRSD Debug Module: CRSD
List CRSD Debug Module: CRSEVT
List CRSD Debug Module: CRSMAIN
List CRSD Debug Module: CRSOCR
List CRSD Debug Module: CRSPE
List CRSD Debug Module: CRSPLACE
List CRSD Debug Module: CRSRES
List CRSD Debug Module: CRSRPT
List CRSD Debug Module: CRSRTI
List CRSD Debug Module: CRSSE
List CRSD Debug Module: CRSSEC
List CRSD Debug Module: CRSTIMER
List CRSD Debug Module: CRSUI
List CRSD Debug Module: CSSCLNT
List CRSD Debug Module: OCRAPI
List CRSD Debug Module: OCRASM
List CRSD Debug Module: OCRCAC
List CRSD Debug Module: OCRCLI
List CRSD Debug Module: OCRMAS
List CRSD Debug Module: OCRMSG
List CRSD Debug Module: OCROSD
List CRSD Debug Module: OCRRAW
List CRSD Debug Module: OCRSRV
List CRSD Debug Module: OCRUTL
List CRSD Debug Module: SuiteTes
List CRSD Debug Module: UiServer
As root, set log levels for several modules at once:
crsctl set log crs "CRSEVT=1","CRSAPP=1","OCRASM=2"
Set CRSD Module: CRSAPP Log Level: 1
Set CRSD Module: CRSEVT Log Level: 1
Set CRSD Module: OCRASM Log Level: 2
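Once the capture is complete, the same command can lower the module levels again (shown here with 0; note the pre-change levels first, since module defaults are not necessarily 0):
crsctl set log crs "CRSEVT=0","CRSAPP=0","OCRASM=0"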
Clusterware logs
Oracle Clusterware stores its log files in the following locations:
- Oracle Clusterware alert log: GRID_HOME/log/<hostname>/alert<hostname>.log
- CRS logs (Cluster Ready Services): GRID_HOME/log/<hostname>/crsd/. The crsd.log file is archived every 10 MB
- CSS logs (Cluster Synchronization Services): GRID_HOME/log/<hostname>/cssd/. The cssd.log file is archived every 20 MB
- EVM (Event Manager) logs: GRID_HOME/log/<hostname>/evmd
- SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) logs: GRID_HOME/log/<hostname>/client and ORACLE_HOME/log/<hostname>/client
- diagcollection.pl: GRID_HOME/bin/ (see the example after this list)
- ASM: GRID_BASE/diag/asm/+asm/+ASMn
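To gather all of the above in one pass, diagcollection.pl can be run as root. A minimal sketch, assuming the grid home shown in the diff output earlier:
cd /tmp
/apps/oracle/product/11.2.0.2/grid/bin/diagcollection.pl --collect --crshome /apps/oracle/product/11.2.0.2/grid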