Viewing the ASM Header

Each ASM disk's header contains a lot of metadata.  You can view it using the kfed utility.  Below is an example with corresponding annotations.


Although kfed can also be used to write a modified header back to the disk, you should be very careful with this tool and use it only under Oracle Support's supervision.
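As a sketch of the read/modify/merge workflow (the device path matches the example below; never merge a header back on a live system without Oracle Support's guidance):

+ASM2 > kfed read /dev/oracleasm/disks/ORADATA_DD501_DISK10 text=hdr.txt
(edit hdr.txt as directed by Oracle Support)
+ASM2 > kfed merge /dev/oracleasm/disks/ORADATA_DD501_DISK10 text=hdr.txt

In recent versions (11.1.0.7 and later), kfed repair <device> can also restore a corrupt disk header from the backup copy that ASM keeps in the disk's second allocation unit.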
  
+ASM2 > ls -ltr /dev/oracleasm/disks*
total 0
brw-rw---- 1 oracle dba 120,  17 Feb 14 12:37 OCR_VOTE_DISK01
brw-rw---- 1 oracle dba 120,  33 Feb 14 12:37 OCR_VOTE_DISK02


brw-rw---- 1 oracle dba 120, 241 Feb 14 12:37 ORADATA_DD501_DISK10
brw-rw---- 1 oracle dba 120, 257 Feb 14 12:37 ORADATA_DD501_DISK11
brw-rw---- 1 oracle dba 120, 273 Feb 14 12:37 ORADATA_DD501_DISK12
brw-rw---- 1 oracle dba 120, 289 Feb 14 12:37 ORADATA_DD501_DISK17
+ASM2 > kfed read /dev/oracleasm/disks/ORADATA_DD501_DISK10
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483653 ; 0x008: TYPE=0x8 NUMB=0x5
kfbh.check:                  3567150782 ; 0x00c: 0xd49e66be
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:ORCLDISKORADATA_DD501_DISK10 ; 0x000: length=28 =>The ASMLib provider string
kfdhdb.driver.reserved[0]:   1145131599 ; 0x008: 0x4441524f
kfdhdb.driver.reserved[1]:   1598116929 ; 0x00c: 0x5f415441
kfdhdb.driver.reserved[2]:    808797252 ; 0x010: 0x30354444
kfdhdb.driver.reserved[3]:   1229217585 ; 0x014: 0x49445f31
kfdhdb.driver.reserved[4]:    808536915 ; 0x018: 0x30314b53
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        5 ; 0x024: 0x0005
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL => External redundancy diskgroup
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER=> Member of a diskgroup
kfdhdb.dskname:    ORADATA_DD501_DISK10 ; 0x028: length=20 =>The ASM disk name
kfdhdb.grpname:            DG_TRX_DD501 ; 0x048: length=12 =>The disk group name
kfdhdb.fgname:     ORADATA_DD501_DISK10 ; 0x068: length=20 =>The failure group name
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:   32939703 ; 0x0a8: HOUR=0x17 DAYS=0x15 MNTH=0x7 YEAR=0x7da => creation timestamp
kfdhdb.crestmp.lo:  3277276160 ; 0x0ac: USEC=0x0 MSEC=0x1d1 SECS=0x35 MINS=0x30
kfdhdb.mntstmp.hi:   32950963 ; 0x0b0: HOUR=0x13 DAYS=0x15 MNTH=0x2 YEAR=0x7db =>Timestamp when the disk was mounted
kfdhdb.mntstmp.lo:           1680539648 ; 0x0b4: USEC=0x0 MSEC=0x2c0 SECS=0x2 MINS=0x19
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200=>512 byte sector size
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000=> Block size
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000=> 1M allocation units
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  131069 ; 0x0c4: 0x0001fffd =>Disk size (OS)
kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000 => Compatible.RDBMS
kfdhdb.grpstmp.hi:             32939703 ; 0x0e4: HOUR=0x17 DAYS=0x15 MNTH=0x7 YEAR=0x7da
kfdhdb.grpstmp.lo:           2505329664 ; 0x0e8: USEC=0x0 MSEC=0x113 SECS=0x15 MINS=0x25
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000
kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

Administering ASM

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE
SANADMIN                       FALSE FALSE TRUE
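The SANADMIN row above is an account holding only the SYSASM privilege.  As a minimal sketch (user name and password are placeholders), such an account is created and granted SYSASM from the ASM instance:

SQL> create user sanadmin identified by some_password;
SQL> grant sysasm to sanadmin;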


To list the clients of this ASM instance:

col instance_name format a23

select db_name, status, instance_name from v$asm_client;

DB_NAME  STATUS       INSTANCE_NAME
-------- ------------ -----------------------
+ASM     CONNECTED    +ASM2
+ASM     CONNECTED    +ASM2
asmvol   CONNECTED    +ASM2


To check ASM Cluster File System (ACFS) volumes:

col volume_device format a32
col mountpath format a23
col usage format a9
set linesize 132

select volume_name, state, usage, volume_device, mountpath
from v$asm_volume;


VOLUME_NAME                    STATE    USAGE     VOLUME_DEVICE                    MOUNTPATH
------------------------------ -------- --------- -------------------------------- -----------------------
DATALOGS                       ENABLED  ACFS      /dev/asm/datalogs-94             /data/oracle/logfiles


+ASM2 > acfsutil info fs
/data/oracle/logfiles
    ACFS Version: 11.2.0.1.0.0
    flags:        MountPoint,Available
    mount time:   Mon Feb 21 12:37:06 2011
    volumes:      1
    total size:   10737418240
    total free:   10581868544
    primary volume: /dev/asm/datalogs-94
        label:                
        flags:                 Primary,Available,ADVM
        on-disk version:       39.0
        allocation unit:       4096
        major, minor:          252, 48129
        size:                  10737418240
        free:                  10581868544
        ADVM diskgroup         DG_DBA_DF501
        ADVM resize increment: 268435456
        ADVM redundancy:       unprotected
        ADVM stripe columns:   4
        ADVM stripe width:     131072
    number of snapshots:  0
    snapshot space usage: 0
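For reference, a volume and file system like the one shown above can be created along these lines (a sketch only; the disk group, size, volume name, and mount point are taken from the example, and the -94 suffix on the device name is assigned by ADVM, so yours will differ):

+ASM2 > asmcmd volcreate -G DG_DBA_DF501 -s 10G datalogs
(as root)
mkfs -t acfs /dev/asm/datalogs-94
/sbin/acfsutil registry -a /dev/asm/datalogs-94 /data/oracle/logfiles
mount -t acfs /dev/asm/datalogs-94 /data/oracle/logfiles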
   
   
   

SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
GRID_OCRVOTE                   MOUNTED
OCR_VOTE                       MOUNTED
DG_DBA_DD501                   MOUNTED
DG_TRX_DD501                   MOUNTED
DG_DBA_DF501                   MOUNTED

You can use srvctl to mount or dismount an ASM diskgroup.
+ASM2 > srvctl start diskgroup -g DG_DBA_DD501 -n lltcind02
+ASM2 > srvctl stop diskgroup -g DG_TRX_DD501 -n lltcind01

To check the integrity of an ASM diskgroup:
SQL> alter diskgroup OCR_VOTE check;

Diskgroup altered.

If there is a problem, it will show up in the ASM alert log.  If you specify the REPAIR keyword, ASM will try to address any reported problems.
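A minimal sketch of the same check with the repair behavior requested explicitly:

SQL> alter diskgroup OCR_VOTE check repair;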

adrci> set homepath diag/asm/+asm/+ASM2
adrci> show alert -tail -f

NOTE: starting check of diskgroup OCR_VOTE
kfdp_checkDsk(): 19
kfdp_checkDsk(): 20
kfdp_checkDsk(): 21
2011-02-21 19:33:09.339000 -06:00
SUCCESS: check of diskgroup OCR_VOTE found no errors
SUCCESS: alter diskgroup OCR_VOTE check

To view the spfile stored in ASM using asmcmd:
+ASM2 > asmcmd ls -l +GRID_OCRVOTE/racpoc/asmparameterfile/spfileasm.ora
Type              Redund  Striped  Time             Sys  Name
                                                    N    spfileasm.ora => +GRID_OCRVOTE/racpoc/ASMPARAMETERFILE/REGISTRY.253.726177105
To check permissions:
ASMCMD> ls --permission
User  Group  Permission  Name
                         ASMPARAMETERFILE/
                         OCRFILE/
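The blank User and Group columns show that ASM File Access Control is not enabled here.  As a hedged sketch (disk group name assumed; requires COMPATIBLE.ASM of 11.2 or higher), it is enabled through disk group attributes:

SQL> alter diskgroup DG_TRX_DD501 set attribute 'access_control.enabled' = 'true';
SQL> alter diskgroup DG_TRX_DD501 set attribute 'access_control.umask' = '026';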
  

To be continued...


Compressed vs. Non-compressed RMAN Backup

Recently I performed several tests to compare RMAN compressed vs. non-compressed backups and see the disk-space savings versus the extra backup time.  In my test cases, I backed up two tablespaces.  One tablespace, ENDORSE_008, contains text data, and ENDORSE_IMG contains images/PDF documents with the CLOB data type.  I found the following interesting things:

ENDORSE_008 (data/text tablespace):  Based on the test case below, if your database does not have a lot of BLOB/CLOB data, it may be a good idea to back up the database using a compressed backup set.  It saves a lot of disk space, as the compression ratio is high (at least 8:1), compared with the modest extra time it takes over a non-compressed backup.




Tablespace: ENDORSE_008    Non-compressed    Compressed
Size                       4 GB              4 GB
Backup size                4 GB              500 MB (about 8:1 disk-space ratio)
Duration                   25 seconds        36 seconds




RMAN> run {
2> allocate channel d1 type disk;
3> allocate channel d2 type disk;
4> allocate channel d3 type disk;
5> allocate channel d4 type disk;
6> backup tablespace ENDORSE_008 format '/u01/oracle/admin/ENDORSE/bkups/ENDORSE_ro_endorse_xml_q42008_%s_%p';
7> release channel d1;
8> release channel d2;
9> release channel d3;
10> release channel d4;
11> }
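The compressed run used for the comparison differs only in the backup clause; a sketch (the format string here is illustrative):

RMAN> run {
2> allocate channel d1 type disk;
3> allocate channel d2 type disk;
4> allocate channel d3 type disk;
5> allocate channel d4 type disk;
6> backup as compressed backupset tablespace ENDORSE_008 format '/u01/oracle/admin/ENDORSE/bkups/ENDORSE_comp_%s_%p';
7> release channel d1;
8> release channel d2;
9> release channel d3;
10> release channel d4;
11> }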

Some notes about 11gR2 Grid Infrastructure

In 11gR2, crsctl has been extended to include "cluster-aware commands".

TRXQAC2 > crsctl check cluster -all
**************************************************************
lltcind01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
lltcind02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
You need to be root to stop or start the cluster, or you need to sudo it:
sudo -u root crsctl stop cluster -all
sudo -u root crsctl start cluster -all


To view the current status of all resources:
 crsctl status res -t


The crsctl status resource command doesn't list the daemons of the High Availability Services stack.  You need to include -init (initially undocumented) to see them.


crsctl stat res -t -init


--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       lltcind02                Started            
ora.crsd
      1        ONLINE  ONLINE       lltcind02                                   
ora.cssd
      1        ONLINE  ONLINE       lltcind02                                   
ora.cssdmonitor
      1        ONLINE  ONLINE       lltcind02                                   
ora.ctssd
      1        ONLINE  ONLINE       lltcind02                OBSERVER           
ora.diskmon
      1        ONLINE  ONLINE       lltcind02                                   
ora.drivers.acfs
      1        ONLINE  ONLINE       lltcind02                                   
ora.evmd
      1        ONLINE  ONLINE       lltcind02                                   
ora.gipcd
      1        ONLINE  ONLINE       lltcind02                                   
ora.gpnpd
      1        ONLINE  ONLINE       lltcind02                                   
ora.mdnsd
      1        ONLINE  ONLINE       lltcind02           


To give the oracle user the ability to start up a resource:

+ASM2 > sudo -u root crsctl setperm resource ora.trxqac.qac11g_rac.svc -o oracle
 

To check resource permission:

+ASM2 > crsctl getperm resource ora.trxqac.qac11g_rac.svc
Name: ora.trxqac.qac11g_rac.svc
owner:oracle:rwx,pgrp:oinstall:rwx,other::r--

      

To be continued...

Oracle Local Registry - OLR

The OLR is a new feature in the Oracle 11gR2 Grid Infrastructure.  The information stored in the OLR is used by the Oracle High Availability Services daemon (OHASD).  It includes data about the clusterware configuration, version information, and GPnP wallets.  Compared to the OCR, the OLR has fewer keys, and the majority of OLR keys deal with the OHASD process, whereas OCR keys deal with the CRSD process.
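A quick way to inspect the OLR (a sketch; file locations vary by installation):

ocrcheck -local
ocrdump -local /tmp/olr.dump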

Troubleshooting Clusterware

1)      Make sure your nodes have exactly the same system time.  The best recommendation is to sync the nodes using Network Time Protocol (NTP), modifying the NTP initialization file with the -x (slewing) flag and restarting the daemon as shown below:
vi /etc/sysconfig/ntpd
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
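After saving the change, restart the NTP daemon so the -x flag takes effect (RHEL-style command; adjust for your platform):
service ntpd restart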

2)      Run the diagnostics collection script:
./diagcollection.pl -collect

3)      Run the Cluster Verification Utility (cluvfy) to verify the Oracle Grid installation and the RAC installation, configuration, and operation.

lltcind01.fnf.com{+ASM1}/apps/oracle/product/11.2.0/grid/bin> cluvfy comp -list

USAGE:
cluvfy comp  <component-name> <component-specific options>  [-verbose]

Valid components are:
        nodereach : checks reachability between nodes
        nodecon   : checks node connectivity
        cfs       : checks CFS integrity
        ssa       : checks shared storage accessibility
        space     : checks space availability
        sys       : checks minimum system requirements
        clu       : checks cluster integrity
        clumgr    : checks cluster manager integrity
        ocr       : checks OCR integrity
        olr       : checks OLR integrity
        ha        : checks HA integrity
        crs       : checks CRS integrity
        nodeapp   : checks node applications existence
        admprv    : checks administrative privileges
        peer      : compares properties with peers
        software  : checks software distribution
        asm       : checks ASM integrity
        acfs       : checks ACFS integrity
        gpnp      : checks GPnP integrity
        gns       : checks GNS integrity
        scan      : checks SCAN configuration
        ohasd     : checks OHASD integrity
        clocksync      : checks Clock Synchronization
        vdisk      : check Voting Disk Udev settings


Example:

/u01/app/oracle/product/11.2.0/grid/bin> cluvfy comp crs -n all -verbose

Verifying CRS integrity

Checking CRS integrity...
The Oracle clusterware is healthy on node "lltcind02"
The Oracle clusterware is healthy on node "lltcind01"

CRS integrity check passed

Verification of CRS integrity was successful.


/u01/app/oracle/product/11.2.0/grid/bin> cluvfy stage -list


USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options>  [-verbose]

Valid stage options and stage names are:
        -post hwos    :  post-check for hardware and operating system
        -pre  cfs     :  pre-check for CFS setup
        -post cfs     :  post-check for CFS setup
        -pre  crsinst :  pre-check for CRS installation
        -post crsinst :  post-check for CRS installation
        -pre  hacfg   :  pre-check for HA configuration
        -post hacfg   :  post-check for HA configuration
        -pre  dbinst  :  pre-check for database installation
        -pre  acfscfg  :  pre-check for ACFS Configuration.
        -post acfscfg  :  post-check for ACFS Configuration.
        -pre  dbcfg   :  pre-check for database configuration
        -pre  nodeadd :  pre-check for node addition.
        -post nodeadd :  post-check for node addition.
        -post nodedel :  post-check for node deletion.

4)      Enable resource debugging to turn tracing on/off
sudo -u root crsctl set log res "ora.lltcind01.vip:1"
[sudo] password for oracle:
Set Resource ora.lltcind01.vip Log Level: 1

sudo -u root crsctl set log res "ora.lltcind01.vip:0"
Set Resource ora.lltcind01.vip Log Level: 0

5)      Use SRVM_TRACE=TRUE for srvctl, cluvfy, netca, dbca, dbua
srvctl config database -d TEST -a
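As a sketch, the variable can be set for a single command and then unset (trace destination varies by tool):
export SRVM_TRACE=TRUE
srvctl config database -d TEST -a
unset SRVM_TRACE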
6)      Check the following log files when a node eviction occurs
$ORACLE_GRID/log/host_name/cssd/ocssd.log.  You need to look for “Begin Dump” or “End Dump” just before the reboot.

$ORACLE_GRID/log/host_name/client/oclskd.log
7)       Set diagwait value
crsctl set css diagwait 13 -force

8)      Avoid false reboots
crsctl get css misscount (to determine the current setting).  The misscount value must be greater than (timeout + margin) and greater than diagwait.  The default of 30 seconds is recommended; do not change misscount or disktimeout unless Oracle Support recommends doing so.
9)      Run ocrdump to dump the contents of the OCR for inspection
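With no arguments, ocrdump writes a file named OCRDUMPFILE in the current directory; a target file name can also be given (sketch):
ocrdump /tmp/ocr.dump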

Add a node to a cluster

To add a node to an Oracle Clusterware environment, you can clone an image of an existing Oracle Clusterware installation, use Grid Control, or invoke addNode.sh.  In general, you install the OS and set up the network and storage exactly like the existing nodes.  The summary tasks are:

  • Check system and network requirements
  • Install the required OS packages
  • Set kernel parameters
  • Create groups, users, and required directories
  • Set up SSH, enable user equivalency, and set the installation owner's shell limits
  • Run the Cluster Verification Utility
Perform the post hardware and operating system check:
cluvfy stage -post hwos -n dba0103

Perform a detailed properties comparison of the current node against the new node:

cluvfy comp peer -refnode dba0102 -n dba0103 -orainv orainstall -osdba asmdba -verbose

To add a node:
./addNode.sh -silent "CLUSTER_NEW_NODES=dba0103" "CLUSTER_NEW_VIRTUAL_HOSTNAMES=dba0103-vip"

To check the integrity of the cluster:
cluvfy stage -post nodeadd -n dba0103 -verbose

 

Redhat - quick reference commands

/boot/grub/grub.conf (boot prompt)
/var/log/dmesg (kernel messages)
/etc/inittab (id:5:initdefault)

/etc/rc.d/rc.sysinit (system initialization)
ls -ltr /etc/rc.d/init.d (scripts initialized from /etc/inittab)
lsmod | grep oracle (kernel module for oracle)


/proc is a virtual filesystem that provides detailed information about the kernel, hardware, and running processes.  Some interesting /proc entries:

/proc/PID
/proc/cpuinfo
/proc/partitions
/proc/meminfo
/proc/vmstat
/proc/swaps
/proc/mounts
/proc/net
/proc/sys/kernel/hostname
/proc/sys/vm/swappiness (indicates how aggressively memory will be swapped out to the swap devices)

/etc/sysctl.conf ; sysctl -a (to view kernel parameters), -w (to set a parameter), -p (to reload settings from /etc/sysctl.conf)
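A short sketch (the swappiness value is only an example):

sysctl vm.swappiness          # view a single parameter
sysctl -w vm.swappiness=10    # set it for the running kernel
sysctl -p                     # reload settings from /etc/sysctl.conf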

netstat -tanp

to be continued...

Administering Oracle 11gR2 Clusterware

Command                                           Description
pgrep -l d.bin                                    Verify that the Oracle Clusterware daemons are running
crsctl check crs                                  Check CRS
crsctl stat res -t                                Verify resource status
sudo -u root crsctl stop crs                      Stop Oracle Clusterware
sudo -u root crsctl start crs                     Start Oracle Clusterware
crsctl query css votedisk                         Determine the location of the voting disks
ocrcheck                                          Determine the location of the OCR
ocrcheck -local                                   Determine the location of the OLR
cat /etc/oracle/ocr.loc                           Examine the contents of ocr.loc
crsctl stat res ora.eons -t                       Verify the status of a single resource
ocrconfig -showbackup                             List the automatic backups of the OCR
ocrconfig -showbackup manual                      List the manual backups of the OCR
ocrconfig -manualbackup                           Take a manual backup of the OCR
ocrconfig -export /data/bkup/ocr.backup           Perform a logical backup of the OCR
ocrconfig -local -export /data/bkup/olr.backup    Perform a logical backup of the OLR
crsctl status server -f                           Check the status of servers



To be continued...

Grid, GNS, SCAN

Oracle Grid Infrastructure
ASM and Oracle Clusterware are installed into a single home directory called the Oracle Grid Infrastructure home

Interconnect NIC Guidelines

Failure to correctly configure the network interface cards and switches used for the interconnect results in severe performance degradation and possible node eviction.  Consider the following guidelines:

·         Configure the interconnect NIC on the fastest PCI bus
·         Ensure that NIC names and slots are identical on all nodes
·         Define flow control:  receive=on, transmit=off
·         For a better high-availability (HA) design, implement a redundant switch strategy
·         Define full bit rate supported by NIC
·         Define full duplex autonegotiation
·         Ensure compatible switch settings:
If 802.3ad is used on NIC, it must be used and supported on the switch
The Maximum Transmission Unit(MTU) should be the same between NIC and the switch
·         UDP socket buffers:  Default settings are adequate.  Increase the allocated buffers when the MTU size has been increased, the netstat command reports errors, or the ifconfig command reports dropped packets or overflow
·         Jumbo frames are not an IEEE standard and need to be tested carefully

Clusterware Startup
·         Oracle Clusterware is started by the OS init daemon
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
·         The Oracle Clusterware installation modifies the /etc/inittab file to restart ohasd in case of a crash (see the check below)
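To confirm the inittab entry and that the daemon is running, a minimal sketch:
grep ohasd /etc/inittab
ps -ef | grep ohasd | grep -v grep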

GNS – Grid Naming Service
GNS assumes that there is a DHCP server running on the public network with enough addresses to assign to the VIPs and the single client access name (SCAN) VIPs.  With GNS, only one static IP address is required for the cluster: the GNS VIP.

cat /etc/named.conf
cluster01.example.com      NS cluster01-gns.example.com    # delegate the cluster sub-domain to GNS
cluster01-gns.example.com  192.0.3.154                     # cluster GNS address

A request to resolve any name in the cluster01.example.com sub-domain would be forwarded to GNS at 192.0.3.154.
Each node in the cluster runs a multicast DNS (mDNS) process.

SCAN – Single Client Access Name
The single client access name (SCAN) is the address used by clients connecting to the cluster.
The SCAN is a domain name registered to three IP addresses, either in the DNS or the GNS
The SCAN provides a stable, highly available name for clients to use, independent of the nodes that make up the cluster.
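To verify the SCAN configuration, a hedged sketch (the SCAN name is assumed to follow the GNS example above):

srvctl config scan
srvctl config scan_listener
nslookup cluster01-scan.cluster01.example.com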