Working with OCRDUMP

Dump the contents of the OCR and count the number of lines.  If you dump the OCR as root, you will see more information than as the grid/oracle user, because information in the OCR is organized by keys that are associated with privileges.  See the examples below:

As oracle or grid user:
ocrdump -stdout | wc -l
489

As root:
ocrdump -stdout | wc -l
2993

To dump the first 50 lines of the OCR content in XML format:
              ocrdump -stdout -xml | head -50

To dump the current OCR to a file:
                ocrdump -xml /apps/oracle/dump_current_ocr.xml

Dump the contents of an OCR backup in XML format, then compare it with the current OCR to detect any changes.

              ocrconfig -showbackup
si01.an.com     2010/11/17 11:40:39     /dba/backup00.ocr
si01.an.com     2010/11/17 07:40:37     /dba/backup01.ocr
si01.an.com     2010/11/17 03:40:35     /dba/backup02.ocr
si01.an.com     2010/11/16 03:40:27     /dba/day.ocr

           ocrdump -xml -backupfile /dba/day.ocr previous_day.ocr

 Compare: 
          diff dump_current_ocr.xml previous_day.ocr
3,4c3,5
< <TIMESTAMP>11/17/2011 13:57:05</TIMESTAMP>
< <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml /apps/oracle/dump_current_ocr.xml </COMMAND>
---
> <TIMESTAMP>11/17/2011 14:00:46</TIMESTAMP>
> <DEVICE>/dba/day.ocr</DEVICE>
> <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml -backupfile /dba/day.ocr previous_day.ocr </COMMAND>
879c880
< <VALUE><![CDATA[83]]></VALUE>
---
> <VALUE><![CDATA[75]]></VALUE>
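
The dump-and-diff steps above can be combined into a quick one-shot check (a sketch: run as root so the full key space is dumped; the /tmp paths and the /dba/day.ocr backup name are just the examples from the listing above):

          # As root: dump the live OCR and a chosen backup, then diff them
          ocrdump -xml /tmp/current_ocr.xml
          ocrdump -xml -backupfile /dba/day.ocr /tmp/previous_day.xml
          diff /tmp/current_ocr.xml /tmp/previous_day.xml

Expect harmless differences in the TIMESTAMP/COMMAND/DEVICE header lines, as in the diff output above; any other differences reflect real OCR changes.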

Troubleshooting 11.2 Clusterware Node Evictions (Note 1050693.1)

Starting with 11.2.0.2, a node eviction may not actually reboot the machine.  This is called a rebootless restart.

To identify which process initiated a reboot, review the important files below (a log-collection sketch follows the list):

  • Clusterware alert log in <GRID_HOME>/log/<nodename>/alert<nodename>.log
  • The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
  • The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
  • The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
  • The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
  • IPD/OS or OS Watcher data.  IPD/OS is an old name for the Cluster Health Monitor; the names can be used interchangeably, although Oracle now calls the tool Cluster Health Monitor
  • 'opatch lsinventory -detail' output for the GRID home
  • OS message files in /var/log/messages
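
A minimal collection sketch for these files (the Grid home path is an example and the grep patterns are assumptions; adjust both for your environment):

          GRID_HOME=/apps/oracle/product/11.2.0.2/grid   # example path
          NODE=$(hostname -s)                            # assumes the log directory matches the short hostname
          grep -i "reboot\|evict\|fatal" $GRID_HOME/log/$NODE/alert$NODE.log | tail -20
          tail -50 $GRID_HOME/log/$NODE/cssd/ocssd.log
          cat /etc/oracle/lastgasp/* 2>/dev/null         # or /var/opt/oracle/lastgasp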
Common Causes of eviction:

OCSSD Eviction:  1) Network failure or latency issues between nodes; it takes 30 consecutive missed checkins to cause a node eviction.  2) Problems writing to or reading from the voting disk.  3) A member kill escalation: for example, the LMON process may request that CSS remove an instance from the cluster via the instance eviction mechanism; if this times out, it can escalate to a node eviction.

CSSDAGENT or CSSDMONITOR Eviction:  1) An OS scheduler problem, for example because the OS is locked up or the server is under excessive load (CPU utilization as high as 100%).  2) The CSS process is hung.  3) An Oracle bug.
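For the missed-checkin case, the ocssd log usually records the missed network heartbeats leading up to the eviction.  A quick way to look for that evidence (reusing GRID_HOME and NODE from the sketch above; the exact message text varies by version, so the pattern is an assumption):

          grep -i "missed checkin\|eviction" $GRID_HOME/log/$NODE/cssd/ocssd.log | tail -20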

Enable Trace / Debug

Below are several ways to enable tracing and debugging for Oracle RAC resources.

1)  SRVM_TRACE:  Enable Tracing for cluvfy, netca, and srvctl
      export SRVM_TRACE=TRUE

      srvctl config database -d db11g1
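
With SRVM_TRACE set, the verbose trace is written to standard output, so it is convenient to capture it to a file (a usage sketch; db11g1 is the example database name from above):

      export SRVM_TRACE=TRUE
      srvctl config database -d db11g1 > /tmp/srvctl_trace.log 2>&1
      unset SRVM_TRACE    # turn tracing off when finished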

You can dynamically enable resource and module logging (levels 1 to 5).

2)  Enable debugging for a clusterware resource:  crsctl set log res "<resource name>=<level>"
     crsctl set log res "ora.registry.acfs=1"
     Set Resource ora.registry.acfs Log Level: 1

After you have collected all the traces, disable tracing:  crsctl set log res "<resource name>=0"
crsctl set log res "ora.registry.acfs=0"
Set Resource ora.registry.acfs Log Level: 0

3)  You can enable dynamic debugging for CRS, CSS, EVM, and other clusterware subcomponents.  Use the crsctl lsmodules {crs | css | evm} command to list each module's components.

crsctl lsmodules crs
List CRSD Debug Module: AGENT
List CRSD Debug Module: AGFW
List CRSD Debug Module: CLSFRAME
List CRSD Debug Module: CLSVER
List CRSD Debug Module: CLUCLS
List CRSD Debug Module: COMMCRS
List CRSD Debug Module: COMMNS
List CRSD Debug Module: CRSAPP
List CRSD Debug Module: CRSCCL
List CRSD Debug Module: CRSCEVT
List CRSD Debug Module: CRSCOMM
List CRSD Debug Module: CRSD
List CRSD Debug Module: CRSEVT
List CRSD Debug Module: CRSMAIN
List CRSD Debug Module: CRSOCR
List CRSD Debug Module: CRSPE
List CRSD Debug Module: CRSPLACE
List CRSD Debug Module: CRSRES
List CRSD Debug Module: CRSRPT
List CRSD Debug Module: CRSRTI
List CRSD Debug Module: CRSSE
List CRSD Debug Module: CRSSEC
List CRSD Debug Module: CRSTIMER
List CRSD Debug Module: CRSUI
List CRSD Debug Module: CSSCLNT
List CRSD Debug Module: OCRAPI
List CRSD Debug Module: OCRASM
List CRSD Debug Module: OCRCAC
List CRSD Debug Module: OCRCLI
List CRSD Debug Module: OCRMAS
List CRSD Debug Module: OCRMSG
List CRSD Debug Module: OCROSD
List CRSD Debug Module: OCRRAW
List CRSD Debug Module: OCRSRV
List CRSD Debug Module: OCRUTL
List CRSD Debug Module: SuiteTes
List CRSD Debug Module: UiServer


As root, crsctl set log crs "CRSEVT=1","CRSAPP=1","OCRASM=2"
Set CRSD Module: CRSAPP  Log Level: 1
Set CRSD Module: CRSEVT  Log Level: 1
Set CRSD Module: OCRASM  Log Level: 2
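
To confirm which levels are currently in effect, crsctl has a matching get log command (a sketch using the same module names as above):

     crsctl get log crs "CRSEVT"
     crsctl get log crs "OCRASM"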

Clusterware logs

Oracle Clusterware stores its log files in the following locations:
  • Oracle Clusterware alert log:  GRID_HOME/log/<hostname>/alert<hostname>.log
  • CRS logs (Cluster Ready Services):  GRID_HOME/log/<hostname>/crsd/.  The crsd.log file is archived every 10 MB
  • CSS logs (Cluster Synchronization Services):  GRID_HOME/log/<hostname>/cssd/.  The ocssd.log is archived every 20 MB
  • EVM logs (Event Manager):  GRID_HOME/log/<hostname>/evmd
  • SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) logs:  GRID_HOME/log/<hostname>/client and ORACLE_HOME/log/<hostname>/client
  • diagcollection.pl (diagnostic collection script):  GRID_HOME/bin/ (an invocation sketch follows this list)
  • ASM:  GRID_BASE/diag/asm/+asm/+ASMn
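
For example, the diagnostic collection script listed above can gather these clusterware logs in one pass.  A minimal invocation sketch (run as root from a scratch directory; the --collect mode and the --crshome pointer are as I recall them for 11.2, so verify the options by running the script without arguments):

     cd /tmp
     export GRID_HOME=/apps/oracle/product/11.2.0.2/grid   # example path
     $GRID_HOME/bin/diagcollection.pl --collect --crshome=$GRID_HOME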