Prerequisite Steps for Adding Cluster Nodes

Below are the steps to prepare a node to be added to the cluster:

1.  Server build-out and physical connections:  Build out the servers, configure the network and interconnect, present the shared disks, etc.

2.  Install OS:  Clone an OS image that matches the other nodes in your cluster, including patches, updates, drivers, etc.

3.  Create Oracle users and groups:  Make sure the user and group names and IDs match across all nodes.

4.  Set up SSH user equivalence (passwordless SSH) between all nodes; see the sketch below.
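
A minimal sketch of setting up passwordless SSH for the oracle user, assuming hypothetical node names RAC08 and RAC09 (the OUI and cluvfy can also set this up for you):

ssh-keygen -t rsa                   # on each node, as oracle; accept defaults and an empty passphrase
ssh-copy-id oracle@RAC08            # repeat for every node in the cluster, including the local node
ssh-copy-id oracle@RAC09
ssh RAC08 date; ssh RAC09 date      # verify that no password or host-key prompt remains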

Use cluvfy to validate the hardware and OS installation.

cluvfy stage -post hwos -n all -verbose
cluvfy comp peer -refnode RAC01 -n RAC08,RAC09 -orainv oinstall -osdba asmdba -verbose
cluvfy stage -pre nodeadd -n RAC08,RAC09 -fixup

TAF - Transparent Application Failover

TAF is a feature of OCI (Oracle Call Interface).  It allows applications to reconnect to a surviving instance automatically when their connection fails.  TAF can resume in-flight SELECT operations, but all active transactions are rolled back.  TAF supports two failover methods:

1.  BASIC:  Reconnection is established at failover time.

TAF without FAN (fast application notification)
srvctl add service -d DBALAB -s DBSRV -r dbsrv01,dbsrv02 -P BASIC

TAF with FAN:
srvctl add service -d DBALAB -s DBSRV -r dbsrv01,dbsrv02 -P BASIC
srvctl start service -d DBALAB -s DBSRV
srvctl modify service -d DBALAB -s DBSRV -q TRUE -P BASIC -e SELECT -z 180 -w 5 -j LONG

2.  PRECONNECT:  In addition to the initial connection, a shadow connection is created in advance to the backup instance.
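
TAF (BASIC/SELECT) can also be configured purely on the client side in tnsnames.ora.  A minimal sketch, using a hypothetical SCAN name rac-scan:

DBSRV =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DBSRV)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))
    )
  )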

Note about Services and Load Balancing Advisory

There are two types of load balancing that you can implement:

1.  Client-side:  Balance the connections across the listeners
2.  Server-side:  The listener directs the connection requests to the best instance based on the load balance advisory (LBA).

You can use the LONG or SHORT connection load balancing goals.  As the names suggest, LONG is suitable for long-lived connections such as connection pools and SQL*Forms, while SHORT is suitable for short-lived connections, such as those from an online order-entry type of workload.

ex:  srvctl modify service -d labbdr -s labbpoc -q TRUE -P BASIC -e SELECT -z 180 -w 5 -j LONG
(-e:  failover type, -z:  number of failover retries, -w:  delay in seconds between retries, -j:  connection load balancing goal)

Load Balancing Advisory (LBA):  An advisory that helps direct work across instances; it is available to clients such as JDBC and ODP.NET connection pools.  There are two service-level goals:

1.  SERVICE_TIME:  Work is directed based on elapsed time, for workloads where the rate of demand changes, e.g., open orders and internet shopping.

ex:    srvctl modify service -d DBAPOC -s DBA_RAC -B SERVICE_TIME -j SHORT

2.  THROUGHPUT:  Work is directed according to throughput; suitable for workloads like batch processing.
ex:    srvctl modify service -d DBAPOC -s dbatplt_rac -B THROUGHPUT -j LONG

Monitor service
select machine, failover_method, failover_type, failed_over, service_name, count(*)
  from v$session
 group by machine, failover_method, failover_type, failed_over, service_name;
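
In a RAC environment it can also be useful to see the same breakdown per instance; a variant of the query above against gv$session:

select inst_id, service_name, failover_type, failover_method, failed_over, count(*)
  from gv$session
 group by inst_id, service_name, failover_type, failover_method, failed_over
 order by inst_id, service_name;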

Create Server Pool with crsctl or srvctl

srvctl add srvpool -h
 Adds a server pool to the Oracle Clusterware.
 
  Usage: srvctl add srvpool -g <pool_name> [-l <min>] [-u <max>] [-i <importance>] [-n "<server_list>"] [-f]
      -g <pool_name>           Server pool name
      -l <min>                 Minimum size of the server pool (Default value is 0)
      -u <max>                 Maximum size of the server pool (Default value is -1 for unlimited maximum size)
      -i <importance>          Importance of the server pool (Default value is 0)
      -n "<server_list>"       Comma separated list of candidate server names
      -f                       Force the operation even though some resource(s) will be stopped
    -h                       Print usage
   
   
 crsctl add serverpool -h
 Usage:
   crsctl add serverpool <spName> [[-file <filePath>] | [-attr "<attrName>=<value>[,...]"]] [-i]
 where
      spName          Add named server pool
      filePath        Attribute file
      attrName        Attribute name
      value           Attribute value
      -i              Fail if request cannot be processed immediately
     -f              Force option
    
    
   srvctl add srvpool -g ashley_sp -l 2 -u 5 -i 3 -n "ljtcax101, ljtbab102" 
 
  crsctl status serverpool -p
 
  NAME=ora.ashley_sp
  IMPORTANCE=3
  MIN_SIZE=2
  MAX_SIZE=5
  SERVER_NAMES=ljtcax101, ljtbab102
  PARENT_POOLS=
  EXCLUSIVE_POOLS=
  ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--

  To modify a server pool
  srvctl modify srvpool -g ashley_sp -l 1 -u 3 -i 7
  (the -attr syntax belongs to crsctl, e.g.:  crsctl modify serverpool ora.ashley_sp -attr "MIN_SIZE=1,MAX_SIZE=3,IMPORTANCE=7")
 
 
  To remove a server pool
  srvctl remove srvpool -g SP1

To check permission
  crsctl getperm serverpool ora.ashley_sp
  Name: ora.ashley_sp
  owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
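
To change the permissions rather than just view them, crsctl setperm can be used.  A sketch, assuming a hypothetical OS user appuser that should only be able to read and execute:

  crsctl setperm serverpool ora.ashley_sp -u user:appuser:r-x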

Top 5 Database and/or Instance Performance Issues in RAC Environment

Check out Metalink (My Oracle Support) note 1373500.1.  It helps to identify the top database/instance performance problems and their resolutions in an Oracle RAC environment.

Issue #1: High counts of lost blocks (gc lost blocks, gc current/cr lost blocks)

Symptoms
I. AWR reports show high number of lost blocks.
II. netstat -s reports an increasing number of packet reassembly failures and dropped packets.
Solutions
Use the following document to troubleshoot and resolve lost block issues. The document describes symptoms, possible causes and solutions.
Document 563566.1 - gc block lost diagnostics

Issue #2: High log file sync waits
Symptoms
I. AWR reports show that log file sync is one of the TOP 5 waits consistently.
II. Average log file sync is high ( > 20 ms).
III. Average log file parallel write is high ( > 10 ms).
IV. Average redo write broadcast ack time or wait for scn ack is high ( > 10 ms).
V. Average log file sync is low, but there are too many log file sync waits.
Background
When a user session COMMITs or ROLLBACKs, the session's redo information needs to be flushed by LGWR to the redo logfile. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm that all redo changes are safely on disk.
Example:
WAIT #0: nam='log file sync' ela= 977744 buffer#=754 p2=0 p3=0 obj#=114927 tim=1212384670107387
Parameters:
  P1 = buffer#
  P2 = Not used
  P3 = Not used
  obj# = object_id
All changes up to this buffer# (in the redo log buffer) must be flushed to disk and the writes confirmed to ensure that the transaction is committed, and will remain committed upon an instance crash.
A typical life cycle of 'log file sync' wait
1. A user sessions issues a commit or rollback and starts waiting on log file sync.
2. LGWR gathers the redo information to be written to the redo log files, issues the IO, queues the BOC (broadcast on commit) to an LMS process, and posts the LMS process.
3. LGWR waits for redo buffers to be flushed to disk and SCN to be ACK'd
4. Completion of IO and receiving ACK from LMS signal write complete. LGWR then posts foreground process to continue.
5. Foreground process wakes up and log file sync wait ends.
Important log file sync related statistics and events
redo write time - Total elapsed time of the write from the redo log buffer to the current redo log file, in 10s of milliseconds.
redo writes - Total number of writes by LGWR to the redo log files. "redo blocks written" divided by this statistic equals the number of blocks per write.
log file parallel write - Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.
redo write broadcast ack time - Total amount of the latency associated with broadcast on commit beyond the latency of the log write (in microseconds).
user commits - Number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.
user rollbacks - Number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions.
The script provided in Document 1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)  collects useful information for troubleshooting log file sync issues.
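As a quick check outside of AWR, the cumulative redo write statistics can be pulled from v$sysstat (these are instance-wide values since startup, so compare snapshots over the problem interval rather than absolute numbers):

select name, value
  from v$sysstat
 where name in ('redo write time', 'redo writes', 'redo blocks written', 'user commits');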
Possible Causes
I. Poor IO service time and throughput of the devices where redo logs reside.
II. Oracle bugs. Please review WAITEVENT: "log file sync" Reference (Document 34592.1) for known oracle bugs.
III. LMS not running in RT class.
IV. Scheduling delays with LGWR process.
V. Excessive number of commits.
VI. OS resource starvation.
Solutions
I. Move redo log files to disks with higher throughput and better response time if log file parallel write is consistently high ( > 10 ms); log file parallel write should ideally be within the 1-2 ms range.
II. Apply fixes for the known oracle bugs, which are applicable to your environment. The most effective way to get those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
III. Ensure that LMS processes are running in RT class. LMS processes run in RT class by default.
IV. Renice LGWR to run at a higher priority, or run LGWR in the RT class by adding LGWR to the parameter _high_priority_processes='VKTM|LMS|LGWR'.
V. Reduce the number of commits by using batch commits instead of committing after every DML operation.
VI. See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
VII. See if COMMIT NOWAIT option can be used. Please refer to Document 857576.1 for more information.
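As a sketch of the COMMIT NOWAIT option mentioned in item VII (asynchronous/batched commits trade durability of the most recent transactions on a crash for fewer log file sync waits, so confirm this is acceptable for the application; see Document 857576.1):

COMMIT WRITE BATCH NOWAIT;
-- or set it at the session level:
ALTER SESSION SET commit_wait = 'NOWAIT';
ALTER SESSION SET commit_logging = 'BATCH';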

Issue #3: High Waits on Mutexes
Mutexes are a lighter-weight and more granular concurrency mechanism than latches. The reason for obtaining a mutex is to ensure that certain operations are properly managed for concurrency, e.g., if one session is changing a data structure in memory, then another session must wait to acquire the mutex before it can make a similar change. The following are the most common mutex related waits:
A. cursor: pin S wait on X
B. cursor: pin S
C. library cache: Mutex X
Symptoms (A)
AWR reports show cursor: pin S wait on X as one of the top waits.
Background (A)
A session may wait for this event when it is trying to get a mutex pin in Share mode but another session is holding the mutex pin on the same cursor object in exclusive mode. Frequently, waits for 'cursor: pin S wait on X' are a symptom and not the cause. There may be underlying tuning requirements or known issues.
Possible Causes (A)
Please review Troubleshooting 'cursor: pin S wait on X' waits (Document 1349387.1).
Solutions (A)
Please review Troubleshooting 'cursor: pin S wait on X' waits (Document 1349387.1).
Symptoms (B)
AWR reports show cursor: pin S as one of the top waits
Background (B)
A session waits for "cursor: pin S" when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. This may seem a little strange as one might question why there should be any form of wait to get a mutex which has no session holding it in an incompatible mode. The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself. If there are concurrent sessions trying to make such an update to the mutex then only one session can actually increment (or decrement) the reference count at a time. A wait on "cursor: pin S" thus occurs if a session cannot make that atomic change immediately due to other concurrent requests.
Mutexes are local to the current instance in RAC environments.
Parameters:
   P1 = idn
   P2 = value
   P3 = where (where|sleeps in 10.2)
 idn is the mutex identifier value which matches to the HASH_VALUE of the SQL statement that we are waiting to get the mutex on. The SQL can usually be found using the IDN value in a query of the form:
SELECT sql_id, sql_text, version_count
FROM V$SQLAREA where HASH_VALUE=&IDN;
If the SQL_TEXT shown for the cursor is of the form "table_x_x_x_x" then this is a special internal cursor - see Document 1298471.1 for information about mapping such cursors to objects.
P1RAW is the same value in hexadecimal and it can be used to search in tracefiles for SQL matching to that hash value.
Possible Causes (B)
I. Heavy concurrency for a specific mutex, especially on systems with multiple CPUs.
II. Waits for very many different "idn" values when under load.
III. Oracle Bugs
      Bug 10270888 - ORA-600[kgxEndExamine-Bad-State] / mutex waits after a self deadlock
      Bug 9591812 - Wrong wait events in 11.2 ("cursor: mutex S" instead of "cursor: mutex X")
      Bug 9499302 - Improve concurrent mutex request handling
      Bug 7441165 - Prevent preemption while holding a mutex (fix only works on Solaris)
      Bug 8575528 - Missing entries in V$MUTEX_SLEEP.location
Solutions (B)
I. Apply fix for Bug 10411618.
II. For any identified "hot" SQLs, one can reduce the concurrency on specific cursors by replacing the one SQL with some variants which are executed by different sessions. Please review WAITEVENT: cursor: pin S Reference (Document 1310764.1) for further details.
III. Apply fixes for other known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1 has more information on recommended patches.
Symptoms (C)
AWR reports show library cache: Mutex X as one of the TOP waits.
Background (C)
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes. This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released.
Individual Waits:
Parameters:
  P1 = "idn" = Unique Mutex Identifier
  P2 = "value" = Session ID holding the mutex
  P3 = "where" = location in code (internal identifier) where mutex is being waited for
Systemwide Waits:
At a systemwide level, there are two views which will help diagnose this wait:
GV$MUTEX_SLEEP (or V$MUTEX_SLEEP for non-RAC)
and GV$MUTEX_SLEEP_HISTORY (or V$MUTEX_SLEEP_HISTORY for non-RAC)
These views track the instance wide usage of mutexes since instance startup. Since these views show values that are total values since startup, they are most meaningful when you obtain the difference in values during a short time interval when there was problem. The easiest way to see this information is through an AWR or statspack report in the "Mutex Sleep Summary" section.
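A simple way to look at this outside of AWR (again cumulative since startup, so take deltas over the problem interval):

select mutex_type, location, sleeps, wait_time
  from v$mutex_sleep
 order by sleeps desc;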
Possible Causes (C)
I. Frequent Hard Parses.
II. High Version Counts.
III. Invalidations and reloads.
IV. Oracle Bugs. Please review WAITEVENT: "library cache: mutex X" (Document 727400.1) for the
      list of known oracle bugs.
Solutions (C)
I. Reduce hard parsing, invalidations and reloads. Please review Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch Contention (Document 62143.1) for more information.
II. Review  Troubleshooting: High Version Count Issues (Document 296377.1) to   resolve high version count issue.
III. Apply fixes for the known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1 has more information on recommended patches.
Issue #4: High gc buffer busy, enq: TX - row lock contention, enq: TX - index contention, enq: TX - ITL allocate entry waits
A. enq: TX - Index Contention
B. enq: TX - ITL allocate entry waits
C. enq: TX - row lock contention
Symptoms (A)
I. AWR reports show high wait on enq: TX - index contention and enq: TX - row lock contention in mode 4.
II. AWR reports show high values for branch node splits, leaf node splits and leaf node 90-10 splits
III. AWR reports (Segments by Row Lock Waits) show high row lock waits for a particular segment
IV. AWR reports show other waits such as gc buffer busy waits on index branch or Leaf blocks, gc current block busy on Remote Undo Headers and gc current split.
Example:
Top 5 Timed Foreground Events
Event                       Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
enq: TX - index contention  29,870  1,238    41             9.52       Concurrency

Instance Activity Stats:
Statistic               Total   per Second  per Trans
branch node splits      945     0.26        0.00
leaf node 90-10 splits  1,670   0.46        0.00
leaf node splits        35,603  9.85        0.05

Segments by Row Lock Waits:
Owner     Tablespace  Object Name           Obj.Type  Row Lock Waits  % of Capture
ACSSPROD  ACSS_IDX03  ACSS_ORDER_HEADER_PK  INDEX     3,425           43.62
ACSSPROD  ACSS_IDX03  ACSS_ORDER_HEADER_ST  INDEX     883             11.25
ACSSPROD  ACSS_IDX03  ACSS_ORDER_HEADER_DT  INDEX     682             8.69
Background (A)
When a transaction inserting a row into an index has to wait for the end of an index block split being done by another transaction, the session waits on the event enq: TX - index contention.
Possible causes (A)
I. High number of concurrent inserts leading to excessive index block splits.
II. Right-growing index with key generated from a sequence.
Solutions (A)
The solution is to reorganize the index in a way that avoids the contention or hot spots. The options are (see the sketch after this list):
I. Recreate the index as reverse key index
II. Hash partition the index
III. If index key is generated from a sequence, increase cache size of the sequence and make the sequence 'no order' if application supports it.
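A sketch of options I and III, using the hot index from the AWR example above and a hypothetical sequence name (option II would mean re-creating the index as a hash-partitioned index, e.g. with GLOBAL PARTITION BY HASH):

ALTER INDEX acssprod.acss_order_header_pk REBUILD REVERSE;     -- option I: reverse key index
ALTER SEQUENCE acssprod.order_header_seq CACHE 1000 NOORDER;   -- option III: larger cache, no ordering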
Symptom (B)
AWR reports show high wait on enq: TX - allocate ITL entry and enq: TX - row lock contention in mode 4.
Background (B)
A session waits on enq: TX - allocate ITL entry when it wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle Database dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL.
Possible Causes (B)
Low initrans setting relative to the number of concurrent transactions.
Solutions (B)
Find the segments that have high ITL waits from AWR reports or by using the following SQL:
SELECT SEGMENT_OWNER, SEGMENT_NAME, SEGMENT_TYPE
  FROM V$SEGMENT_STATISTICS
 WHERE STATISTIC_NAME = 'ITL waits' AND VALUE > 0
ORDER BY VALUE;
Increase the INITRANS value of the segments that are seeing high ITL waits (see the sketch below).
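A sketch with hypothetical object names; note that a changed INITRANS applies only to newly formatted blocks, so existing blocks pick it up only after a rebuild or move:

ALTER TABLE app.orders INITRANS 8;
ALTER INDEX app.orders_pk REBUILD INITRANS 8;
ALTER TABLE app.orders MOVE;   -- optional, applies the new INITRANS to existing blocks (marks indexes unusable, rebuild them afterwards)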
Symptom (C)
AWR reports show high wait on enq: TX - row lock contention in Exclusive mode (6).
Background (C)
A session waits on enq: TX - row lock contention when it waits for a row level lock that is held by another session. This happens when one user has updated or deleted a row, but has not committed or rolled back yet, and another session wants to update or delete the same row.
Solution (C)
This is an application issue, and the application developer needs to be engaged to look into the SQL statements involved. The following documents might be helpful in drilling down further:
Document 102925.1 - Tracing sessions: waiting on an enqueue
Document 179582.1 - How to Find TX Enqueue Contention in RAC or OPS
Document 1020008.6 - SCRIPT: FULLY DECODED LOCKING
Document 62354.1 - TX Transaction locks - Example wait scenarios
Document 224305.1 -Autonomous Transaction can cause locking

Issue #5: High CPU and memory consumption
A. High CPU Utilization
B. High Memory Utilization
Symptoms (A)
I. OS tools such as top, prstat, and vmstat show that user CPU usage is very high, and the top
   CPU-consuming processes are either oracle shadow or background processes.
II. AWR reports show top waits are one or more of the following:
    latch free
    cursor pin S wait on X or cursor pin S wait or library cache mutex X
    latch: cache buffers chains
    resmgr: cpu quantum
    enq: RO - fast object reuse
    DFS lock handle
III. AWR reports (SQLs ordered by buffer gets) show SQLs with very high buffer gets per execution
      and cpu time.
IV. Process stack of the high cpu consuming process shows that the process is spinning.
Possible Causes (A)
I. Oracle bugs:
  Bug 12431716 - Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU
  Bug 8199533 - NUMA enabled by default can cause high CPU
  Bug 9226905 - Excessive CPU usage / OERI:kkfdPaPrm from Parallel Query / High Version
  count on PX_MISMATCH
  Bug 7385253 - Slow Truncate / DBWR uses high CPU / CKPT blocks on RO enqueue
  Bug 10326338 - High "resmgr:cpu quantum" but CPU has idle time
  Bug 6455161 - Higher CPU / Higher "cache buffer chains" latch gets / Higher "consistent gets"
  after truncate/Rebuild
II. Very large SGA on Linux x86-64 platform without the implementation of Hugepages.
III. Expensive SQL queries with sub-optimal execution plans.
IV. Runaway processes.
Solutions (A)
I. Apply fix for the bug(s) that you are encountering. Most effective way to get all those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
II. Implement hugepages. Please refer to Document 361670.1 -  Slow Performance with High CPU Usage on 64-bit Linux with Large SGA for further explanation.
III. Tune the SQLs, which are incurring excessive buffer gets and cpu time. Please refer to Document 404991.1 - How to Tune Queries with High CPU Usage for more info.
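A quick way to check whether hugepages are configured and actually in use on Linux (a sketch; sizing hugepages for the SGA is covered in Document 361670.1):

grep -i hugepage /proc/meminfo
# HugePages_Total / HugePages_Free show allocation and usage; Hugepagesize is typically 2048 kB on x86-64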
Symptoms (B)
I. OS tools such as ps, pmap show process is growing in memory. pmap shows the growth is in heap and/or stack area of the process. For example,
#pmap -x 26677
Address   Kbytes  RSS  Anon  Locked  Mode   Mapped File
00010000  496     480  -     -       r-x--  bash
0009A000  80      80   24    -       rwx--  bash
000AE000  160     160  40    -       rwx--  [ heap ]
FF100000  688     688  -     -       r-x--  libc.so.1
II. Session uga and/or pga memory of an oracle shadow process is growing.
      
        select se.sid, n.name, max(se.value) maxmem
          from v$sesstat se,
               v$statname n
         where n.statistic# = se.statistic#
           and n.name in ('session pga memory', 'session pga memory max',
                          'session uga memory', 'session uga memory max')
         group by n.name, se.sid
         order by 3;
III. Number of open cursors for a session/process is growing monotonically.
Possible Causes (B)
I. Oracle bugs:
      Bug 9919654 - High resource / memory use optimizing SQL with UNION/set functions with 
      many branches
      Bug 10042937 HIGH MEMORY GROUP IN GES_CACHE_RESS AND ORA-4031 ERRORS
      Bug 7429070 BIG PGA MEM ALLOC DURING PARSE TIME - KXS-HEAP-C
      Bug 8031768 - ORA-04031 SHARED POOL "KKJ JOBQ WOR"
      Bug 10220046 - OCI memory leak using non-blocking connection for fetches
      Bug 6356566 - Memory leak / high CPU selecting from V$SQL_PLAN
      Bug 7199645 - Long parse time / high memory use from query rewrite
      Bug 10636231 - High version count for INSERT .. RETURNING statements with
      reason INST_DRTLD_MISMATCH
      Bug 9412660 - PLSQL cursor leak / ORA-600[kglLockOwnersListDelete]
      Bug 6051972 - Memory leak if connection to database is frequently opened and closed
      Bug 4690571 - Memory leak using BULK COLLECT within cursor opened using native
      dynamic sql
II. Cursor leaks caused by application not closing cursors explicitly.
III.  SQLs with abnormally large hash join and/or sort operation.
Possible Solutions (B)
I. Apply fix for the applicable bug(s). The most effective way to get those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
II. Ensure cursors are explicitly closed by application.
III. Avoid very large hash join and/or sort operation.

References
NOTE:34592.1 - WAITEVENT: "log file sync" Reference Note
NOTE:857576.1 - How to Minimise Waits for 'Log File Sync'?
NOTE:873243.1 - Troubleshooting 'enq: TX - index contention' Waits in a RAC Environment.
NOTE:1298015.1 - WAITEVENT: "cursor: pin S wait on X" Reference Note
NOTE:786507.1 - How to Determine the Blocking Session for Event: 'cursor: pin S wait on X'
NOTE:1349387.1 - Troubleshooting 'cursor: pin S wait on X' waits.
NOTE:1310764.1 - WAITEVENT: "cursor: pin S" Reference Note
NOTE:727400.1 - WAITEVENT: "library cache: mutex X"
NOTE:1357946.1 - Troubleshooting 'library cache: mutex X' waits.
NOTE:1356828.1 - FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events
NOTE:563566.1 - gc block lost diagnostics
NOTE:164768.1 - Troubleshooting: High CPU Utilization
NOTE:756671.1 - Oracle Recommended Patches -- Oracle Database
NOTE:1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)

Resource Parameters

crsctl status resource ora.dbapoc.db -f

ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
Owner and access privileges granted to OS users and groups.  The oracle user and the oinstall group have full privileges; other users can only view the resource.

ACTIVE_PLACEMENT=1
When set to 1, Oracle Clusterware uses this attribute to reevaluate the placement of a resource during addition or restart of a cluster server. For resources where PLACEMENT=favored, Oracle Clusterware may relocate running resources if the resources run on a non-favored server when a favored one joins the cluster.

AUTO_START=restore
  • always: always restart the resource when the server restarts
  • restore: restore the resource to the state it was in before the server stopped
  • never: never restart the resource automatically

CARDINALITY=3
The number of servers on which a resource can run simultaneously.

CHECK_INTERVAL=1
The time interval, in seconds, between repeated executions of the check action

RESTART_ATTEMPTS=2
The number of times that Oracle Clusterware attempts to restart a resource on the resource's current server before attempting to relocate it.

FAILURE_INTERVAL=60
The interval, in seconds, before which Oracle Clusterware stops a resource if the resource has exceeded the number of failures specified by the FAILURE_THRESHOLD attribute. If the value is zero (0), then tracking of failures is disabled.

START_TIMEOUT=600
The maximum time (in seconds) in which a start action can run. Oracle Clusterware returns an error message if the action does not complete within the time specified.

STOP_TIMEOUT=600
Maximum time in seconds a stop action can run

STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATAPD101.dg,shutdown:ora.DATAPF101.dg)
dependency: The only possible value is hard.

START_DEPENDENCIES=hard(ora.DATAPD101.dg,ora.DATAPF101.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) pullup(ora.DATAPD101.dg,ora.DATAPF101.dg)
Specifies a set of relationships that Oracle Clusterware considers when starting a resource. You can specify a space-separated list of dependencies on several resources and resource types on which a particular resource can depend.  Dependency possible values are hard, weak, attraction, pullup, and dispersion. You can specify each dependency only once, except for pullup, which you can specify multiple times.
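
To inspect or change one of these attributes directly, a sketch (for ora.* resources the supported way to change most settings is srvctl, so treat crsctl modify resource on them as a last resort):

crsctl status resource ora.dbapoc.db -f | grep -E 'RESTART_ATTEMPTS|START_TIMEOUT|STOP_TIMEOUT'
crsctl modify resource ora.dbapoc.db -attr "START_TIMEOUT=900"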

Oracle ASM Dynamic Volume Manager

An ASM instance can have many background processes.  The required database background processes, such as ARCn, CKPT, LGWR, PMON, SMON, DBWn, etc., also exist in the ASM instance.  Additionally, below is a list of the primary ASM processes:


ps -ef |egrep "rbal|mark|gmon|arb|onnn|pz|vdbg|vbg|vmb"
oracle   21913     1  0 15:16 ?        00:00:00 asm_rbal_+ASM1
oracle   21915     1  0 15:16 ?        00:00:00 asm_gmon_+ASM1
oracle   22439     1  0 15:17 ?        00:00:00 ora_rbal_OLTP1
oracle   22451     1  0 15:17 ?        00:00:00 ora_mark_OLTP1
oracle   22511     1  0 15:17 ?        00:00:00 ora_rbal_BRIO1
oracle   22525     1  0 15:17 ?        00:00:00 ora_mark_BRIO1
oracle   22729     1  0 15:17 ?        00:00:00 ora_pz99_OLTP1
oracle   22755     1  0 15:17 ?        00:00:00 asm_vbg0_+ASM1
oracle   22783     1  0 15:17 ?        00:00:00 asm_vdbg_+ASM1
oracle   22785     1  0 15:17 ?        00:00:00 asm_vmb0_+ASM1
oracle   22841     1  0 15:17 ?        00:00:00 ora_pz99_BRIO1
oracle   22977     1  0 15:18 ?        00:00:00 asm_vbg1_+ASM1
oracle   22979     1  0 15:18 ?        00:00:00 asm_vbg2_+ASM1

rbal:  rebalance activities

arbn:  Slave - rebalance activity

gmon:  Manages disk-level activities (disk group monitor)

mark:  Marks ASM allocation units as stale when needed

pz:  Parallel Slave


asm_vdbg_+ASM1:  Volume Driver Background.  It is a fatal background process; if it dies, the ASM instance goes down.  It forwards ASM requests to lock or unlock an extent for volume operations to the dynamic volume manager driver.


asm_vbg1_+ASM1:  Volume background process waits for requests from the dynamic volume manager.



asm_vmb0_+ASM1:  Volume Membership Background; coordinates cluster membership with the ASM instance.

Working with OCRDUMP

Dump the contents of the OCR and count the number of lines.  If you dump the OCR as root, you will see more information than as the grid/oracle user, because information in the OCR is organized by keys that are associated with privileges.  See the examples below:

As oracle or grid user:
ocrdump -stdout | wc -l
489

As root:
ocrdump -stdout | wc -l
2993

To dump the OCR content using XML format for the first 50 lines
              ocrdump -stdout -xml | head -50

To dump current ocr
                ocrdump -xml /apps/oracle/dump_current_ocr.xml

Dump the contents of an OCR backup in XML format, then compare it with the current OCR to detect any changes.

              ocrconfig -showbackup
si01.an.com     2010/11/17 11:40:39     /dba/backup00.ocr
si01.an.com     2010/11/17 07:40:37     /dba/backup01.ocr
si01.an.com     2010/11/17 03:40:35     /dba/backup02.ocr
si01.an.com     2010/11/16 03:40:27     /dba/day.ocr

           ocrdump -xml -backupfile /dba/day.ocr previous_day.ocr

 Compare: 
          diff dump_current_ocr.xml previous_day.ocr
3,4c3,5
< <TIMESTAMP>11/17/2011 13:57:05</TIMESTAMP>
< <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml /apps/oracle/dump_current_ocr.xml </COMMAND>
---
> <TIMESTAMP>11/17/2011 14:00:46</TIMESTAMP>
> <DEVICE>/dba/day.ocr</DEVICE>
> <COMMAND>/apps/oracle/product/11.2.0.2/grid/bin/ocrdump.bin -xml -backupfile /dba/day.ocr previous_day.ocr </COMMAND>
879c880
< <VALUE><![CDATA[83]]></VALUE>
---
> <VALUE><![CDATA[75]]></VALUE>

Troubleshooting 11.2 Clusterware Node Evictions (Note 1050693.1)

Starting with 11.2.0.2, a node eviction may not actually reboot the machine; instead, the Clusterware stack is restarted.  This is called a rebootless restart.

To identify which process initiated a reboot, you need to review the following important files:

  • Clusterware alert log in <GRID_HOME>/log/<nodename>/alert<nodename>.log
  • The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
  • The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
  • The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
  • The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
  • IPD/OS or OS Watcher data.  IPD/OS is an old name for the Cluster Health Monitor; the names can be used interchangeably, although Oracle now calls the tool Cluster Health Monitor.
  • 'opatch lsinventory -detail' output for the GRID home
  • OS message files, e.g. /var/log/messages
Common Causes of eviction:

OCSSD Eviction:  1) Network failure or latency issues between nodes; it takes 30 consecutive missed checkins (by default) to cause a node eviction.  2) Problems writing to or reading from the voting disk.  3) A member kill escalation, e.g. the LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism; if this times out, it can escalate to a node eviction.

CSSDAGENT or CSSDMONITOR Eviction:  1) An OS scheduler problem, e.g. the OS is locked up or there is an excessive amount of load on the server, such as CPU utilization close to 100%.  2) The CSS process is hung.  3) An Oracle bug.
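
The CSS thresholds involved in these evictions can be checked with crsctl (the values noted are the usual 11.2 defaults and may differ on your cluster):

crsctl get css misscount       # network heartbeat timeout, typically 30 seconds
crsctl get css disktimeout     # voting disk I/O timeout, typically 200 seconds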

Enable Trace / Debug

Below are several ways to enable tracing and debugging for Oracle RAC resources.

1)  SRVM_TRACE:  Enable Tracing for cluvfy, netca, and srvctl
      export SRVM_TRACE=TRUE

      srvctl config database -d db11g1

You can dynamically enable logging (level 1 to 5)

2)  Enable debugging for a Clusterware resource:  crsctl set log res "<resource name>=1"
     crsctl set log res "ora.registry.acfs=1"
     Set Resource ora.registry.acfs Log Level: 1

After you collect all the traces, disable tracing:  crsctl set log res "<resource name>=0"
crsctl set log res "ora.registry.acfs=0"
Set Resource ora.registry.acfs Log Level: 0

3)  You can enable dynamic debugging for CRS, CSS, EVM, and other Clusterware sub-components.  Use the crsctl lsmodules crs, crsctl lsmodules css, and crsctl lsmodules evm commands to list each module's components.

crsctl lsmodules crs
List CRSD Debug Module: AGENT
List CRSD Debug Module: AGFW
List CRSD Debug Module: CLSFRAME
List CRSD Debug Module: CLSVER
List CRSD Debug Module: CLUCLS
List CRSD Debug Module: COMMCRS
List CRSD Debug Module: COMMNS
List CRSD Debug Module: CRSAPP
List CRSD Debug Module: CRSCCL
List CRSD Debug Module: CRSCEVT
List CRSD Debug Module: CRSCOMM
List CRSD Debug Module: CRSD
List CRSD Debug Module: CRSEVT
List CRSD Debug Module: CRSMAIN
List CRSD Debug Module: CRSOCR
List CRSD Debug Module: CRSPE
List CRSD Debug Module: CRSPLACE
List CRSD Debug Module: CRSRES
List CRSD Debug Module: CRSRPT
List CRSD Debug Module: CRSRTI
List CRSD Debug Module: CRSSE
List CRSD Debug Module: CRSSEC
List CRSD Debug Module: CRSTIMER
List CRSD Debug Module: CRSUI
List CRSD Debug Module: CSSCLNT
List CRSD Debug Module: OCRAPI
List CRSD Debug Module: OCRASM
List CRSD Debug Module: OCRCAC
List CRSD Debug Module: OCRCLI
List CRSD Debug Module: OCRMAS
List CRSD Debug Module: OCRMSG
List CRSD Debug Module: OCROSD
List CRSD Debug Module: OCRRAW
List CRSD Debug Module: OCRSRV
List CRSD Debug Module: OCRUTL
List CRSD Debug Module: SuiteTes
List CRSD Debug Module: UiServer


As root, crsctl set log crs "CRSEVT=1","CRSAPP=1","OCRASM=2"
Set CRSD Module: CRSAPP  Log Level: 1
Set CRSD Module: CRSEVT  Log Level: 1
Set CRSD Module: OCRASM  Log Level: 2

Clusterware logs

Oracle Clusterware stores its log files in the following locations:
  • Oracle clusterware alert:  GRID_HOME/log/hostname/alert<nodename>.log
  • CRS logs (Cluster Ready Services):  GRID_HOME/log/hostname/crsd/.  The crsd.log file is archived every 10MB
  • CSS logs (Cluster Synchronization Services):  GRID_HOME/log/hostname/cssd/.  The cssd.log is archived every 20MB
  • EVM (Event Manager):  GRID_HOME/log/hostname/evmd
  • SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) logs:  GRID_HOME/log/hostname/client and ORACLE_HOME/log/hostname/client
  •  diagcollection.pl:  $GRID_HOME/bin/
  • ASM:  GRID_BASE/diag/asm/+asm/+ASMn
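
To gather all of these logs in one pass, the diagcollection.pl script listed above can be run as root; a minimal sketch:

# as root
$GRID_HOME/bin/diagcollection.pl --collect
# the resulting *.tar.gz collections are written to the current directory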

ACFS Snapshots

An ACFS snapshot is an online, read-only, point-in-time copy of an Oracle ACFS file system. It can be used to recover a deleted file or revert to a prior version of a file.  When the snapshot is created, a set of metadata, including the directory structure and the names of all files, is created on the same volume.

+ASM1 > acfsutil registry -l
Device : /dev/asm/vol_share-356 : Mount Point : /share : Options : none : Nodes : all : Disk Group : DG_POC_ACFS : Volume : VOL_SHARE

Create ACFS snapshot image
+ASM1 > /sbin/acfsutil snap create snap_share001 /share
acfsutil snap create: Snapshot operation is complete.
/share/.ACFS/snaps/snap_share001 => is the snapshot file system
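
Related acfsutil snapshot commands, a sketch using the snapshot created above:

+ASM1 > /sbin/acfsutil snap info /share                     # list snapshots and space usage for the file system
+ASM1 > /sbin/acfsutil snap delete snap_share001 /share     # drop the snapshot when it is no longer needed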

View  File System
+ASM1 > /sbin/acfsutil info fs /share
    ACFS Version: 11.2.0.2.0
    flags:        MountPoint,Available
    mount time:   Wed Oct 19 14:05:09 2011
    volumes:      1
    total size:   140660178944
    total free:   138043518976
    primary volume: /dev/asm/vol_share-356
        label:                 DBA Share FS
        flags:                 Primary,Available,ADVM
        on-disk version:       39.0
        allocation unit:       4096
        major, minor:          252, 182273
        size:                  140660178944
        free:                  138043518976
        ADVM diskgroup         DG_POC_ACFS
        ADVM resize increment: 268435456
        ADVM redundancy:       unprotected
        ADVM stripe columns:   4
        ADVM stripe width:     131072
    number of snapshots:  1
    snapshot space usage: 49152

Resize ACFS
+ASM1 > acfsutil size +3g /share
acfsutil size: new file system size: 140660178944 (134144MB)

View detail information on each file system
acfsutil info fs /share
acfsutil info fs /share -s (statistics)

How to configure ACFS - ASM Cluster File System and EXT3 file system

In Oracle 11gR2 Grid Infrastructure, Oracle introduces the ASM Cluster File System, or ACFS.  It provides shared, cluster-wide access to various types of files, such as database binaries, user data, reports, configuration files, BFILEs, trace files, and alert logs.

Below are the steps to set up an ACFS file system, or an EXT3 file system, on an ADVM volume.
1)  Pre-checking for ACFS configuration:
+ASM1 > cluvfy stage -pre acfscfg -n racapp1,racapp2 -verbose
2) Create DG_POC_ACFS diskgroup
cat /apps/oracle/work/an/dg_acfs.xml to view its contents:

<dg name="DG_POC_ACFS" redundancy="external">
    <dsk string="ORCL:ORADATA_DD501_DISK11" />
    <dsk string="ORCL:ORADATA_DD501_DISK12" />
    <dsk string="ORCL:ORADATA_DD501_DISK13" />
    <a name="compatible.asm" value="11.2"/>
    <a name="compatible.rdbms" value="11.2"/>
</dg>

+ASM1 > asmcmd mkdg /apps/oracle/work/an/dg_acfs.xml

3) Create ASM volume
ASMCMD> volcreate -G DG_POC_ACFS -s 128g vol_share
ASMCMD> volinfo -a
Diskgroup Name: DG_POC_ACFS

         Volume Name: VOL_SHARE
         Volume Device: /dev/asm/vol_share-356
         State: ENABLED
         Size (MB): 131072
         Resize Unit (MB): 256
         Redundancy: UNPROT
         Stripe Columns: 4
         Stripe Width (K): 128
         Usage:
         Mountpath:

4) Create the ACFS file system
as root
/sbin/mkfs -t acfs -n "DBA Share FS" /dev/asm/vol_share-356
mkfs.acfs: version                   = 11.2.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/vol_share-356
mkfs.acfs: volume size               = 137438953472
mkfs.acfs: Format complete.

5) Add acfs registry:
/sbin/acfsutil registry -f -a /dev/asm/vol_share-356 /share
acfsutil registry: mount point /share successfully added to Oracle Registry

6) Mount /share file system
/bin/mount -t acfs /dev/asm/vol_share-356 /share

Alternatively, you can create a volume and format it as an EXT3 file system:

volcreate -G DG_POC_ACFS -s 128g  vol_dba
mkfs -t ext3 /dev/asm/vol_dba-356
mount /dev/asm/vol_dba-356 /dba

7) Validate Post ACFS configuration

cluvfy stage -post acfscfg -n racapp1,racapp2 -verbose

Shared Storage for Grid Infrastructure and Software

The OCR and voting disks can be stored on:

 Automatic Storage Management (ASM),
 Oracle Cluster File System (OCFS2),
 Red Hat Global File System (GFS),
 a certified Network File System (NFS).

The Oracle software can be stored on any of the above storage options, as well as on the ASM Cluster File System (ACFS).

V$ASM_ATTRIBUTE and COMPATIBLE attributes

If the disk group attribute COMPATIBLE.ASM is set to 11.1 or higher, you can see all attributes by querying V$ASM_ATTRIBUTE and V$ASM_DISKGROUP.  You can also run lsattr via ASMCMD to see the same information for the disk group, as shown below.

+ASM3 - oracle: asmcmd lsattr -lm -G OV

Group_Name  Name                     Value       RO  Sys 
OV          access_control.enabled   FALSE       N   Y   
OV          access_control.umask     066         N   Y   
OV          au_size                  1048576     Y   Y   
OV          cell.smart_scan_capable  FALSE       N   N   
OV          compatible.asm           11.2.0.0.0  N   Y   
OV          compatible.rdbms         10.1.0.0.0  N   Y   
OV          disk_repair_time         3.6h        N   Y   
OV          sector_size              512         Y   Y 


Compatibility Attributes:  COMPATIBLE.RDBMS must be less than or equal to COMPATIBLE.ASM.  To revert to a previous value, you must create a new disk group with the old compatibility attributes and then restore the database files that were on the disk group.  Setting COMPATIBLE.RDBMS and COMPATIBLE.ASM to the same low value, such as 10.1, allows the greatest flexibility regarding the ASM and database software versions that can use the disk group.  However, this setting limits access to 11g ASM features like fast mirror resync, variable allocation units, preferred read failure groups, etc.
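
Advancing compatibility is done per disk group with ALTER DISKGROUP; a sketch using the OV disk group from the lsattr output above (remember this change cannot be rolled back):

ALTER DISKGROUP ov SET ATTRIBUTE 'compatible.asm' = '11.2';
ALTER DISKGROUP ov SET ATTRIBUTE 'compatible.rdbms' = '11.2';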

ASM Metadata Backup md_backup and Restore md_restore

Most of the time, backing up an ASM disk group is not necessary, since you can re-create it and restore its contents.  However, if you want to keep the user templates, aliases, and directories within ASM, you need to back up the metadata for those ASM objects.  The new facility in 11g is AMBR (ASM Metadata Backup and Restore).


Backup example:  


asmcmd md_backup /apps/oracle/bkdgroup -G DATA
asmcmd md_backup /apps/oracle/bkdgroupall -G 'DATA,DG_FRA_PF501'

Disk group metadata to be backed up: DATA
Disk group metadata to be backed up: DG_FRA_PF501
Current alias directory path: rac-scan
Current alias directory path: rac-scan/ASMPARAMETERFILE
Current alias directory path: rac-scan/OCRFILE
Current alias directory path: ORATOOLS/ARCHIVELOG/2011_10_04
Current alias directory path: ORATOOLS/CONTROLFILE
Current alias directory path: ORATOOLS/ARCHIVELOG
Current alias directory path: ORATOOLS
Current alias directory path: ORATOOLS/ONLINELOG
Current alias directory path: ORATOOLS/ARCHIVELOG/2011_10_06

In backup mode (md_backup), information about the ASM disks and disk groups, configurations, attributes, etc. is stored in a text file (bkdgroup or bkdgroupall above).  It is used later to restore the ASM disk group metadata definition.  The information gathered during the backup includes the disk group name, redundancy type, allocation unit size, disk paths, alias directories, stripe settings, the full path of alias entries, etc.

In restore mode (md_restore), it re-creates the disk group based on the backup file, with all user-defined templates and the exact configuration of the backed-up disk group.  There are several options when restoring the disk group:

full - re-create the disk group with the exact configuration
nodg - Restores metadata in an existing disk group provided as an input parameter
newdg - Change the configuration like failure group, disk group name, etc..


Restore example:

asmcmd md_restore bkdgroup --nodg -G DATA
asmcmd md_restore bkdgroupall --newdg -o 'DATA:DATA_NEW'

Troubleshooting Database Network Issue

Below are a couple of tips for investigating network traffic and database network connections.

netstat -ptc:  Displays process IDs and TCP connections.  If a process shows an abnormally high Send-Q value, it is sending more data than the network can drain.

ping and telnet - if you can't connect to the remote servers, contact your network admin.

tnsping - check the remote listener and database to make sure they're up and running.

TNS_ADMIN - verify that TNS_ADMIN points to the directory containing the correct tnsnames.ora file.
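
A quick connectivity check sequence, assuming a hypothetical host dbhost01 and the DBALAB service used earlier:

ping -c 3 dbhost01        # basic IP reachability
telnet dbhost01 1521      # is the listener port reachable?
tnsping DBALAB            # can the TNS alias be resolved and the listener contacted?
echo $TNS_ADMIN           # which tnsnames.ora/sqlnet.ora directory is in effect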

Virtualization with Xen

Advantages:  1)  Effective resource usage 2)  Manageability 3)  Security

Concepts:

The hypervisor is the manager of the Xen environment.  It controls and provides access to resources like CPU, memory, and storage.  It is started by the bootloader and acts as the traffic controller for all virtualized OSes.  It controls migration, starting, stopping, and pausing of the virtualized OSes.  There are two types of virtualization:

Paravirtualization:  This is the native mode of Xen.  Performance is the main advantage; it typically incurs no more than a 5% performance impact versus bare hardware.

Full Virtualization:  Xen provides a complete machine simulation to run an OS that does not include Xen support.  The extra operations that make the OS believe it is running on bare hardware add overhead, so the performance impact is greater than with paravirtualization.

Xen Domains:  Xen virtual hosts.  The first domain loaded is called Domain-0.  Domain-0 is booted immediately after the hypervisor starts and provides the user interface at the console.  It is used to install and manage the other domains.  Any virtual machine other than Domain-0 is called Domain-U.  A Domain-U has no visibility of the hypervisor or the physical hardware.

ethtool, mii-tool

ethtool displays or changes detailed settings of an Ethernet adapter.

ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Link detected: yes
       
mii-tool is another useful tool for quickly determining the state of an ethernet adapter
root > mii-tool -v eth3
eth3: negotiated, link ok
  product info: vendor 00:50:ef, model 60 rev 8
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
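
ethtool can also change these settings; a sketch that forces eth3 to 1000/full with autonegotiation off (the link briefly drops while the NIC resets, so do this in a maintenance window):

ethtool -s eth3 speed 1000 duplex full autoneg off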

oracleasm - Prepare the Partitions for ASM via ASMLIB

rpm -ivh oracleasm*  to install the three ASMLIB RPM packages (kernel driver, library, and support tools)

In the first node of the cluster:
/etc/init.d/oracleasm configure to configure the driver
/etc/init.d/oracleasm createdisk ORADATA_DISK01 /dev/emcpowera1 to mark the disk as an ASM disk
/etc/init.d/oracleasm querydisk  /dev/emcpowera1 to show the mapping between ASM disks and the device


From the remaining nodes:
/etc/init.d/oracleasm scandisks to detect the configured disks
/etc/sysconfig/oracleasm is modified to match and exclude the disk naming patterns

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="emcpower"
# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"
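
To verify the configuration on each node, a sketch using standard oracleasm subcommands:

/etc/init.d/oracleasm status                       # is the driver loaded and the oracleasm filesystem mounted?
/etc/init.d/oracleasm listdisks                    # should list ORADATA_DISK01, etc. on every node
/etc/init.d/oracleasm querydisk ORADATA_DISK01     # confirm the label-to-device mapping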

Some Note about Configuring SAN Disks

1) To check the HBA driver version and support:

cat /etc/modprobe.conf
alias scsi_hostadapter1 lpfc
alias scsi_hostadapter2 lpfc
modinfo lpfc | more
filename:       /lib/modules/2.6.18-194.el5/kernel/drivers/scsi/lpfc/lpfc.ko
version:        0:8.2.0.63.3p
author:         Emulex Corporation - tech.support@emulex.com
description:    Emulex LightPulse Fibre Channel SCSI driver 8.2.0.63.3p
license:        GPL
srcversion:     CABE989E4BB61146A4D4D5D

2) Check LUNs -cat /proc/scsi/scsi | more

3) Partitioning disks: Partitioning should be run on one node only.  Then, on the other nodes, run partprobe to re-read the partition table, or reboot the servers to pick up the partition table changes (see the sketch below).
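
A sketch, using the emcpower device names from the ASMLIB example above:

# on the first node only
fdisk /dev/emcpowera        # create a single partition spanning the LUN (emcpowera1)
# on every other node
partprobe /dev/emcpowera    # re-read the partition table without rebooting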

If you prepare the partitions for ASM with udev instead, you can define rules in /etc/udev/rules.d (udev itself is configured in /etc/udev/udev.conf):
-rw-r--r-- 1 root root   255 Jun 15  2010 89-raw_permissions.rules
-rw-r--r-- 1 root root   600 Jun 15  2010 60-raw.rules
[root@lltcind01 rules.d]# more 89-raw_permissions.rules
#OCR
KERNEL=="raw1",OWNER="root", GROUP="oinstall", MODE="640"
#Votingdisk
KERNEL=="raw2",OWNER="oracle", GROUP="oinstall", MODE="640"
KERNEL=="raw3",OWNER="oracle", GROUP="oinstall", MODE="640"
KERNEL=="raw4",OWNER="oracle", GROUP="oinstall", MODE="640"
[root@lltcind01 rules.d]# more 60-raw.rules
# Enter raw device bindings here.
#
# An example would be:
#   ACTION=="add", KERNEL=="sda", RUN+="/bin/raw /dev/raw/raw1 %N"
# to bind /dev/raw/raw1 to /dev/sda, or
#   ACTION=="add", ENV{MAJOR}=="8", ENV{MINOR}=="1", RUN+="/bin/raw /dev/raw/raw2 %M %m"
# to bind /dev/raw/raw2 to the device with major 8, minor 1.
ACTION=="add", KERNEL=="emcpoweras1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="emcpowerat1", RUN+="/bin/raw /dev/raw/raw2 %N"
Then you can reload the udev rules and restart udev via "udevcontrol reload_rules" and "start_udev".