Troubleshooting Clusterware

1)      Make sure your nodes have exactly the same system time. The best practice is to synchronize the nodes using Network Time Protocol (NTP), modifying the NTP initialization file to add the -x (slewing) flag:
vi /etc/sysconfig/ntpd
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
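Before restarting ntpd, it is worth confirming the -x flag is actually in the OPTIONS line. This is a minimal sketch: a temp file stands in for /etc/sysconfig/ntpd so the snippet is self-contained, but on a real node you would grep the file itself.

```shell
# Sketch: confirm the ntpd init file enables slewed clock adjustment (-x).
# A temp file stands in for /etc/sysconfig/ntpd so this runs anywhere.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
EOF

if grep -q '^OPTIONS=.*-x' "$cfg"; then
  slew_status="enabled"
else
  slew_status="disabled"
fi
echo "ntpd slewing: $slew_status"
rm -f "$cfg"
```

After confirming, restart ntpd (service ntpd restart) so the -x option takes effect; slewing avoids the abrupt clock steps that can contribute to false node evictions.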

2)      Run the diagnostics collection script (as root, from $GRID_HOME/bin):
./diagcollection.pl -collect

3)      Run the Cluster Verification Utility (cluvfy) to verify the Oracle Grid Infrastructure and RAC installation, configuration, and operation.

lltcind01.fnf.com{+ASM1}/apps/oracle/product/11.2.0/grid/bin> cluvfy comp -list

USAGE:
cluvfy comp  <component-name> <component-specific options>  [-verbose]

Valid components are:
        nodereach : checks reachability between nodes
        nodecon   : checks node connectivity
        cfs       : checks CFS integrity
        ssa       : checks shared storage accessibility
        space     : checks space availability
        sys       : checks minimum system requirements
        clu       : checks cluster integrity
        clumgr    : checks cluster manager integrity
        ocr       : checks OCR integrity
        olr       : checks OLR integrity
        ha        : checks HA integrity
        crs       : checks CRS integrity
        nodeapp   : checks node applications existence
        admprv    : checks administrative privileges
        peer      : compares properties with peers
        software  : checks software distribution
        asm       : checks ASM integrity
        acfs      : checks ACFS integrity
        gpnp      : checks GPnP integrity
        gns       : checks GNS integrity
        scan      : checks SCAN configuration
        ohasd     : checks OHASD integrity
        clocksync : checks Clock Synchronization
        vdisk     : checks Voting Disk Udev settings


Example:

/u01/app/oracle/product/11.2.0/grid/bin> cluvfy comp crs -n all -verbose

Verifying CRS integrity

Checking CRS integrity...
The Oracle clusterware is healthy on node "lltcind02"
The Oracle clusterware is healthy on node "lltcind01"

CRS integrity check passed

Verification of CRS integrity was successful.


/u01/app/oracle/product/11.2.0/grid/bin> cluvfy stage -list


USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options>  [-verbose]

Valid stage options and stage names are:
        -post hwos    :  post-check for hardware and operating system
        -pre  cfs     :  pre-check for CFS setup
        -post cfs     :  post-check for CFS setup
        -pre  crsinst :  pre-check for CRS installation
        -post crsinst :  post-check for CRS installation
        -pre  hacfg   :  pre-check for HA configuration
        -post hacfg   :  post-check for HA configuration
        -pre  dbinst  :  pre-check for database installation
        -pre  acfscfg  :  pre-check for ACFS Configuration.
        -post acfscfg  :  post-check for ACFS Configuration.
        -pre  dbcfg   :  pre-check for database configuration
        -pre  nodeadd :  pre-check for node addition.
        -post nodeadd :  post-check for node addition.
        -post nodedel :  post-check for node deletion.

4)      Enable resource debugging to turn tracing on/off
sudo -u root crsctl set log res "ora.lltcind01.vip:1"
[sudo] password for oracle:
Set Resource ora.lltcind01.vip Log Level: 1

sudo -u root crsctl set log res "ora.lltcind01.vip:0"
Set Resource ora.lltcind01.vip Log Level: 0

5)      Set the environment variable SRVM_TRACE=TRUE to enable tracing for srvctl, cluvfy, netca, dbca, and dbua
srvctl config database -d TEST -a
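The variable only needs to be exported in the shell that launches the tool. A minimal sketch follows; the srvctl call itself is shown as a comment, since it requires a configured Grid environment to actually run.

```shell
# Enable Java-layer tracing for srvctl, cluvfy, netca, dbca, and dbua
# by exporting SRVM_TRACE before invoking the tool.
export SRVM_TRACE=TRUE

# With tracing on, the tool emits verbose trace output, e.g.:
#   srvctl config database -d TEST -a
# (commented out here: it requires a configured Grid environment)

echo "SRVM_TRACE=$SRVM_TRACE"
```

Unset the variable (unset SRVM_TRACE) when finished, or every subsequent tool invocation in that shell will produce trace output.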
6)      Check the following log files when a node eviction occurs:
$ORACLE_GRID/log/host_name/cssd/ocssd.log.  Look for "Begin Dump" or "End Dump" just before the reboot.

$ORACLE_GRID/log/host_name/client/oclskd.log
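Scanning ocssd.log for the dump markers can be scripted. This sketch runs against a fabricated two-line sample log so it is self-contained; on a real node you would point it at $ORACLE_GRID/log/host_name/cssd/ocssd.log instead.

```shell
# Sketch: find "Begin Dump" / "End Dump" markers in ocssd.log.
# The here-doc below is a fabricated sample; on a real node, set
# log=$ORACLE_GRID/log/<host_name>/cssd/ocssd.log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
[    CSSD]clssscExit: CSSD aborting
[    CSSD]###### Begin Dump
[    CSSD]###### End Dump
EOF

dump_lines=$(grep -c 'Begin Dump\|End Dump' "$log")
echo "dump marker lines found: $dump_lines"
rm -f "$log"
```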
7)      Set the diagwait value
crsctl set css diagwait 13 -force

8)      Avoid false reboots
crsctl get css misscount (to determine the current setting. misscount must be greater than (timeout + margin) and greater than diagwait. The default of 30 seconds is recommended; do not change the value of misscount or disktimeout unless Oracle Support recommends doing so.)
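The relationship between misscount and diagwait can be sanity-checked with shell arithmetic. The values below are illustrative defaults (misscount 30, diagwait 13 as set in step 7); on a live cluster you would read them with crsctl get css misscount and crsctl get css diagwait.

```shell
# Sketch: check that misscount is greater than diagwait.  The values
# are illustrative; on a real cluster, read them with:
#   crsctl get css misscount
#   crsctl get css diagwait
misscount=30
diagwait=13

if [ "$misscount" -gt "$diagwait" ]; then
  check_result="ok: misscount ($misscount) > diagwait ($diagwait)"
else
  check_result="WARNING: misscount ($misscount) <= diagwait ($diagwait)"
fi
echo "$check_result"
```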
9)      Run ocrdump (as root) to dump the contents of the OCR (Oracle Cluster Registry) to a text file for review; with no arguments it writes OCRDUMPFILE in the current directory.
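Once the dump exists, it is plain text and can be searched for specific keys. The excerpt in the here-doc below is a fabricated, abbreviated approximation of ocrdump-style output (bracketed key names); on a real node you would grep the generated OCRDUMPFILE itself.

```shell
# Sketch: search an OCR dump for a key.  The here-doc is a fabricated,
# abbreviated excerpt in the ocrdump style; on a real node, run
# ocrdump as root and grep the generated OCRDUMPFILE instead.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[SYSTEM]
[SYSTEM.version]
ORATEXT : 3
[SYSTEM.css]
[SYSTEM.css.misscount]
EOF

key='SYSTEM.css.misscount'
if grep -q "^\[$key\]" "$dump"; then
  key_found="yes"
else
  key_found="no"
fi
echo "key $key present: $key_found"
rm -f "$dump"
```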
