DBA Best Practices from the Field : DBA Best Practices from the Field Arup Nanda
Starwood Hotels
Who am I : Who am I Oracle DBA for 13 years and counting
Speak at conferences, write articles, 4 books
Brought up the Global Database Group at Starwood Hotels, in White Plains, NY
Why this Session : Why this Session I have seen too many Rules of Thumb and “Best” Practices.
Some of them could be questionable or even downright wrong!
Warning: I am questioning everything, leaving nothing to “expert opinions”.
proligence.com/downloads.html
Slide 4 : “
Advice is seldom welcome.
And those who want it most
always like it the least.
”
Earl of Chesterfield
1694-1773
Guidelines : Guidelines Best practices must be justified
“Use Oracle Flexible Architecture”
“Ummm … why?”
“Because, it’s the best practice, stupid!”
No Justification ? Not Acceptable
It must aply to all cases or show clearly where it’s applicable
Best practice is to hot-backup the database
Unless, there is a performance gain by running in NOARCHIVELOG mode
Different Oracle Homes : Different Oracle Homes Traditional Approach:
/u01/app/oracle/10.2
Patches are applied to the same OH
Suggestion:
Use a different Oracle Home for each upgrade and patching
/u01/app/oracle/10.2/db1
In next patch, create a different OH
/u01/app/oracle/10.2/db2
Apply the patch here, not on db1
New Approach : New Approach
New OH Each Time : New OH Each Time Current OH: /u01/app/oracle/10.2/db4
New OH: /u01/app/oracle/10.2/db5
Reduction in risk of new code.
Patch fails -> reset the OH to the old one – db4 and restart the db
Diff is possible, what changed by the patch
Catalog scripts (c*,u*,catalog, catproc) preserved
Some binaries are available in older OH
OPatch "bugs" are also mitigated
ASM Home ? DB Home : ASM Home ? DB Home ASM is embedded in the Database code; so no need to have a different Ora Home
Suggestion:
Create a different OH for ASM, even though it will be initially identical to DB Ora Home
/u01/app/oracle/10.2/db1
/u01/app/oracle/10.2/asm1
Apply patches specific to product to OH
Set Audit Trail : Set Audit Trail Set audit_trail = db
Even if you do not need to audit
True or False: Setting audit_trail to DB will start auditing and increase IO?
Answer: FALSE! You need to issue AUDIT statements as well.
This parameter needs a recycle; so set it even if you don’t plan on using it.
11g already has it by default!
Set some parameters : Set some parameters Set some parameters (These are not modifiable by ALTER SYSTEM).
_trace_files_public = TRUE
Sooner or later you need to give access to some trace files to developers
utl_file_dir = '/tmp'
Don’t need that because of directory objects
Reqd for creating Log Miner Dictionary on Flat File
Dump "OFA" : Dump "OFA" Oracle Flexible Architecture (OFA) /u01/app/oracle
?admin/SID/bdump
?admin/SID/udump
?oradata/SID/datafiles
Does not allow separation of filesystems for security, performance, physical location, etc.
Does not allow for passive failovers
Non-OFA Layout : Non-OFA Layout Suggested Layout
/oracle
?admin/SID/* - not allowed to users
/u01/udump – allowed for users
/prddata/SID/mount1/datafiles – high performance
/prddata/SID/mount2/datafiles – low performance
This mount point naming convention –/prddata/SID/mount1– allows passive failover. MP unmounted from one host and mounted to the passive node.
On QA, use /qadata/SID/mount1/… naming convention; so both prod and qa can be mounted without risk of name collision
Analyze CPU : Analyze CPU Auditing is expensive; we need biggest bang for the buck - Session Auditing
SQL> audit session;
Purpose:
Calculate CPU consumption and profile users
Calculate I/O used by users
Identify if someone’s account was locked after repeated wrong passwords
Understand the CPU Usage : Understand the CPU Usage select username, to_char(logoff_time,'mm/dd') ts,
count(1) cnt,
sum(session_cpu) sum_cpu,
avg(session_cpu) avg_cpu,
min(session_cpu) min_cpu,
max(session_cpu) max_cpu
from dba_audit_trail
where logoff_time between '&start_date' and '&end_date'
group by username, to_char(logoff_time,'mm/dd')
order by username, to_char(logoff_time,'mm/dd')
Output
USERNAME TS CNT SUM_CPU AVG_CPU MIN_CPU MAX_CPU
--------------- ----- -------- ------------ ------------ ------------ ------------
USER1 04/04 3 918 306 17 859
USER2 04/04 36 15,286 425 0 4,094
USER3 04/04 3 794 265 174 379
USER4 04/04 187 396,299 2,119 1 124,274
Know Activity by Users : Know Activity by Users select username, to_char(logoff_time,'mm/dd') ts,
sum(logoff_lread) lread,
sum(logoff_pread) pread,
sum(logoff_lwrite) lwrite,
sum(session_cpu) scpu
from dba_audit_trail
where logoff_time between '&start_date' and '&end_date'
group by username, to_char(logoff_time,'mm/dd')
order by username, to_char(logoff_time,'mm/dd')
Output
USERNAME TS LREAD PREAD LWRITE SCPU
--------------- ----- ------------ ------------ ------------ ------------
USER1 04/04 283,271 10,858 33 918
USER2 04/04 4,570,965 6,225 2,854 15,286
USER3 04/04 601,838 1,988 26 794
USER4 04/04 33,639,028 4,545,505 1,083,473 396,299
Trace Account Locks : Trace Account Locks Identify when someone’s account was locked
select to_char(timestamp,'mm/dd/yy hh24:mi') ts,
os_username, userhost, returncode
from dba_audit_trail
where username = 'ARUP'
order by timestamp;
Output
01/10/07 14:12 arupnan CORP\UPNANT 0
01/10/07 15:12 arupnan CORP\UPNANT 0
01/11/07 04:00 orandsp hndspdb1 1017
01/12/07 04:00 orandsp hndspdb1 1017
01/13/07 04:00 orandsp hndspdb1 1017
01/14/07 04:00 orandsp hndspdb1 1017
01/15/07 04:00 orandsp hndspdb1 28000 Login OK Wrong Password Account Locked
Audit DDL : Audit DDL Because someone will always complain, what happened to his/her table
…. and you are the DBA and you are saying you don’t know what happened to it?!!!!!!!!!!!
SQL: AUDIT TABLE BY SESSION;
stmt_audit_option_map shows the statements
AUDIT ALL BY SESSION does most DDLs
Caveat: in DW environments, users create and drop a large number of tables; so this may not be advisable.
No .LOG for Redos : No .LOG for Redos Common Practice:
Redo logs are named .log
Problem:
Deletion of log files via some cron that deletes “.log” files generated, e.g. sqlnet.log.
Listener Attack that can change the listener log to redo1.log
Suggestion:
Choose .redo or .rdo for redo log files.
Listener : Listener Set Admin Restrictions in LISTENER.ORA
ADMIN_RESTRICTIONS_LISTENER=on
This prevents online modification of the listener parameters
Modify the listener.ora file and use
$ lsnrctl reload
Use a different listener for External Procedures
Build a Metadata Repository : Build a Metadata Repository Use Data Pump to Create a Repository of Objects:
$ expdp u/p content=metadata_only full=y diectory=tmp_dir dumpfile=md.dmp
Import this to create an SQL File
$ impdp u/p diectory=tmp_dir dumpfile=md.dmp sqlfile=md.sql
See my paper: Datapump: Not Just for Data Movement
Validate Database : Validate Database Use RMAN Validation Option
RMAN> backup validate database archivelog all;
Then check for corrupt blocks in view v$database_block_corruption
Logical Corruption
RMAN> backup validate check logical database archivelog all;
Preview RMAN Restore : Preview RMAN Restore Always preview a restore
RMAN> restore tablespace users preview;
Does not actually restore but checks the availability of files
Not the same as VALIDATE
Checks what files are required
Validate assumes you know that
Not the same as TEST
RMAN> restore tablespace users test;
Does not actually start the recovery process; so the tablespace need not be offline
Slide 24 : RMAN> restore tablespace users preview;
List of Datafile Copies
Key File S Completion Time Ckp SCN Ckp Time Name
------- ---- - --------------- ---------- --------------- ----
173716 238 A 30-MAR-07 62872433554 30-MAR-07 /f.rman
... And so on ...
173775 2074 A 31-MAR-07 62918498516 31-MAR-07 /j.rman
no backup of log thread 1 seq 92170 lowscn 62872343042 found to restore
... And so on ...
no backup of log thread 1 seq 92173 lowscn 62902345362 found to restore
List of Archived Log Copies
Key Thrd Seq S Low Time Name
------- ---- ------- - --------- ----
92212 1 92174 A 30-MAR-07 /PROPRD1_1_92174_525355299.arc
... And so on ...
92239 1 92201 A 01-APR-07 /PROPRD1_1_92201_525355299.arc
Media recovery start SCN is 62872433554
Recovery must be done beyond SCN 62948207913 to clear data files fuzziness
Finished restore at 06-APR-07
Save RMAN Log : Save RMAN Log You copy to tape:
RMAN backup files
Init file
Archived logs
But not RMAN Log files, do you?
RMAN Logs contain information about the backup pieces, names, location, etc.
Proves invaluable during recovery
input datafile fno=00084 name=/f1.dbf
output filename=/backup/loc3/data_D-CRMPRD_I-79785763_TS-DWT_ODS8_RES_FN
O-96_43ie2scm.rman tag=FULLBKPFS recid=174298 stamp=618757792
Allows you to look for specific files from backup sets
DBID : DBID Important for Recovery
Note the DBID and keep it in a separate place
Write DBID to alert log every time backup is taken
declare
l_dbid number;
begin
select dbid into l_dbid from v$database;
dbms_system.ksdwrt(2,'DBID='||l_dbid);
end;
Do Not Use SPFILE : Do Not Use SPFILE SPFILE Advantages:
Can be on shared filesystem, incld. on ASM
Can be backed up by RMAN
Can be updated automatically by command line by ALTER SYSTEM SET … SCOPE = SPFILE;
SPFILE Disadvantage
Older version overwritten
Comments possible; but only for the current entry
PFILE Advantages : PFILE Advantages Place comments in the init.ora file
# AKN 3/20/06 added because ...
# RJN 4/10/06 changed from 1M to 2M
# JER 10/3/06 changed from 2M to 4M
# DFW 12/7/06 changed from 4M to 6M SR# …
log_buffers = 6M
Has a history of changes, with the names and dates of changes
Very useful for troubleshooting and for record keeping
If you must use SPFILE : If you must use SPFILE Make sure you have a version control system in place to track parameter changes
Example:
SQL> create pfile=‘/tmp/a’ from spfile;
Check diff between this and the previous
Write the differences to a log file
In Oracle 11g, you can create PFILE from memory:
SQL> create pfile='…' from memory;
New Oracle User for Clients : New Oracle User for Clients Problem:
App runing on the DB server, needs SQL*Plus
$OH/bin/sqlplus is not accessible to world
Common Solution:
Change $OH permissions to allow all others
Make app part of the “dba” group
Suggestion:
Create a separate Oracle user: “appora”
Install the Oracle client under that user
Separate Instance and DB Names : Separate Instance and DB Names Common Practice:
DB_NAME same as Instance Name
Suggestion:
Append “1” after DB Name for Instance, e.g.
DB Name: PRODB
Instance: PRODB1
If you ever need to convert the DB to RAC, you will not need to change the Instance Name
No need to change Init.ora, PW File, etc.
Archivelog Location : Archivelog Location Rate the most important
Datafiles
Archivelogs
Backup of datafiles
Most important is archivelogs
If datafiles are lost, they can be recreated
Archived logs are never recreatable
Missing archived logs = halted recovery
Flash Recovery Area
Not for Archived Logs
Create a Controlfile on Trace : Create a Controlfile on Trace Execute:
SQL> alter database backup controlfile to trace as '/path/cr_db.sql‘ reuse;
It creates a CREATE CONTROLFILE script
You can use it to recreate controlfile
Or, the database itself
Self documenting the datafiles and redo logs
Change Control:
Write a separate file for each day
Do a diff to find the added files, redo logs, etc.
Use oraenv : Use oraenv Oracle supplied tool, in $OH/bin
Look up the OH in /etc/oratab or /var/opt/oracle/oratab (in Solaris)
Why this?
It makes your job easier while changing OH
It makes a consistent interface – jobs, commands
To ASSM, or not? : To ASSM, or not? Automatic Segment Space Management
Uses bitmap of free space on the block; no need to check the UET$ table
Great for Performance
But, bitmap is only for 25, 50 and 75% free
Potentially lose up to 25% space on each block
Suggestions:
Use ASSM for non-DW databases
Use MSSM for DW databases
Buffer busy waits not common on DW anyway
Kill Inactive Sessions : Kill Inactive Sessions Problem:
Some apps, especially web apps under connection pool, remain inactive draining resources.
Suggestion:
Use resource manager and set the inactive session disconnect timeout
Why RM, why not Profiles?
RM allows you to turn on and off via scheduling and event. Profiles are hard coded.
RM allows service name based control
Check Listener Log : Check Listener Log Create External Tables on Listener Logs to identify issues, profile users, etc.
See
http://www.dbazine.com/oracle/or-articles/nanda14
Service Names : Service Names Oracle database can be accessed via SID or Service Name
Conventional TNS Entry
prodb1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = prolin1)
(PORT = 1521)))
(CONNECT_DATA = (SID = PRODB1)))
Service Name
(CONNECT_DATA = (SERVICE_NAME = PRODB1)))
Enable Service Names : Enable Service Names In the instance, check service names present already:
SQL> show parameter service_names
Create additional service names:
SQL> alter system set service_names = 'SVC1', 'SVC3', 'SVC3';
Check is listener is listening for these:
$ lsnrctl services
In RAC, you should use SRVCTL:
$ srvctl add service –d MYDB –s SVC1 ...
Why Service Names? : Why Service Names? No change in functionality
Separates use from user, e.g. SCOTT logging from laptop uses service SVC1; but from app server SVC2.
Enhances resource manager use
Allows load balancing and failover in RAC or Data Guard databases
Allows fine grained failover capabilities
Service SVC1 fails from node1 to node2; but SVC2 fails to node3
OS Specific Tweaks : OS Specific Tweaks On HP/UX, use sched_noage
Necessary for setting right priorities for processes
Make "dba" group pat of MLOCK
On Solaris use Intimate Shared Memory
Optimizes the memory management
Raw Devices : Raw Devices Use one size for devices and add them to tablespaces.
Common Use:
Create a raw device of 100GB in name /dev/../users01.dbf
Create tablespace USERS with the raw device
When USERS need more room, expand the raw device.
Recommended Use:
Create raw devices of 30GB named /dev/…/d1, d2, etc.
Create tablespace with the devices d1, d2 and d3.
When USERS need more room, add a new device
Advantages
No outage
Reuse devices
Using ORADEBUG : Using ORADEBUG Problem:
Database Issue; you want to use oradebug; but SQL*Plus hangs!
When SQL*Plus does not work, use
$ sqlplus -prelim
It does not establish a connection
You can run ORADEBUG now
Dumping : Dumping Data block
alter system dump datafile d block b;
The rest:
alter session set events 'immediate trace name level 10';
Controlfile CONTROLF
File Headers FILE_HDRS
Redo Headers REDOHDR
System State SYSTEMSTATE
Process State PROCESSSTATE
Library Cache LIBRARY_CACHE
alter session set events 'immediate trace name LIBRARY_CACHE level 10';
Scripts : Scripts Deletion of trace files older than some days.
DAYS=2
find /u02/app/oracle/admin -name "*.log" -ctime ${DAYS} -exec rm {} \;
find /u02/app/oracle/admin -name "*.trc" -ctime ${DAYS} -exec rm {} \;
find /u02/app/oracle/admin -name "*.trw" -ctime ${DAYS} -exec rm {} \;
find /u02/app/oracle/admin/*/cdump -ctime ${DAYS} -exec rm -r {} \;
This clears up enough log files and trace files from OH, a major cause of failure.
Aliases : Aliases Aliases make some repetitive job faster and quicker
alias bdump='cd $ORACLE_BASE/admin/$ORACLE_SID/bdump'
alias pfile='cd $ORACLE_BASE/admin/$ORACLE_SID/pfile'
alias obase='cd $ORACLE_BASE'
alias tns='cd $ORACLE_HOME/network/admin'
alias oh='cd $ORACLE_HOME'
alias os='echo $ORACLE_SID'
Remember : Remember It’s not a best practice, if it is not justified
You have to understand why; not just what
Best practice needs to be situation-aware
Which goes back to “you have to understand”
Always question whenever someone tells you it’s a best practice
Slide 48 : Thank You!
Q&A
proligence.com/downloads.html