SQL Trace Q&A
Is the database really the problem?
This is one of the most critical phases of the problem solving process.
I've seen many DBA's jump right into the DB and start checking stats, hit ratios, etc. when someone tells them that the database is slow.
The problem is that most end users don't know the first thing about databases. Whenever something is not running as quickly as expected, they proclaim that the database is slow.
The most important thing here is to gather as much information as possible before trying to troubleshoot anything.
- What exactly is slow?
...the loading of a web page in the application?
...when a form is submitted?
...just an ad-hoc query?
...running some report or batch job?
- Define slow. How fast should it be going?
- Are all users, a group of users, or just a single user experiencing the problem?
- When did it start, and is it still occuring?
- Can you replicate the problem?
Of course in the real world you probably won't be able to get the answers to all these questions (unless you have really nice end-users ☺)
The more you can answer, the more it should help point you in the right direction instead of going on a wild goose chase by just assuming it's the database that's the problem.
How do I collect extended sql trace data?
So you've determined that the performance issue is within the database. Or even if you're still not sure, a sql trace will let you know for sure.
Now you need to collect the trace data that will allow you to pinpoint the source of the problem.
TPR can automatically start/stop tracing for a selected database session.
Simple connect to the database in question, select the session you wish to trace, and click the button. When you wish to end the trace, just click the same button to stop the tracing. Now the tracefile will be automatically copied locally (assuming you have setup the correct connectivity info) and a trace profile report will be generated.
If for some reason you want to, or need to trace a session manually, you can issue the following statements to start a trace:
SQL> alter system set timed_statistics=true;
Without timed_statistics=true you will have no wait information in the trace file, severly limiting it's usefulness.
Contrary to popular belief, timed_statistics=true does not impact performance in any significant way.
SQL> alter system set max_dump_file_size=unlimited;
This will allow unlimited trace file size to prevent it from being truncated.
Both of these statements can be set in the init.ora so they won't need to be executed every time.
To actually start tracing:
SQL> exec sys.dbms_system.set_ev(<sid>, <serial#>, 10046, <level>, '');
sid and serial# are the session you want to trace (from v$session)
level is the one of the following:
0 - Turn off tracing
1 - Standard sql_trace. Only report statement execution statistics.
4 - Level 1 stats plus bind variables
8 - Level 1 stats plus wait events
12 - All stats (statement execution + waits + binds)
You should run at least a level 8 or level 12 trace for maximum effectiveness. Do not use sql_trace=true, since it is equivalent to a level 1 trace.
After you've started the trace, have the user do whatever is slow.
If possible, let the operation run to completion before issuing the level 0 call to stop the trace.
If you aren't sure which session to trace because everyone is experiencing a problem, then just pick one. If it's slow for everyone, then any session should reveal the source of the problem.
Where is my trace file?
TPR will automatically locate and copy the tracefile locally for analysis.
If you need to locate a tracefile from a manually generated trace, then you can find it in the directory specified by the user_dump_dest parameter.
SQL> select value from v$parameter where name='user_dump_dest';
But there's hundreds of trace files there, how do I find mine?
Again, the simplest method is to allow TPR to find it for you.
Otherwise, if you're running a manual trace, you can set the tracefile_identifier parameter to something that is easily identifiable prior to running the trace.
SQL> alter session set tracefile_identifier='HARRY';
Now you just need to look in the user_dump_dest for files that are named *HARRY.trc
Note: The tracefile_identifier parameter was introduced in version 8.1.7.
If you're on an earlier version then you're just going to have to search through the files.
The filename will contain the OS process id of the session that was traced.
The following query will give you the server process id (if the session is still active):
SQL> select p.spid OS_PID
from v$session s, v$process p
where s.paddr = p.addr
and s.sid=<sid number>
and s.serial#=<serial number>;
Sorting the files by creation time will probably make it easier as well.
How do I generate a Trace Profile Report?
Creating a tracefile report is simple.
Run a trace through TPR and the report will be generated automatically.
Or select a preexisting tracefile for analysis (the icon)
Reading the report - What am I looking for?
Reading the report is pretty simple.
The statements are listed in decending order by execution time.
Usually it's just the first one or two statements that you need to concentrate on.
Clicking on the statement in the statement list will bring you to the detail info for that statement.
There you will find all the wait stats, execution stats, bind values, execution plan, and more. So you can see exactly where the statement is spending it's time.
Ok, I've found the source of the problem, now what?
There are 2 distinct parts to performance tuning.
1) Finding the source of the problem (TPR allows you to do this with a great degree of accuracy)
2) Fixing the problem (this is where experience and your working knowledge of the application in question come into play)
First, try running the query (or queries) in question in isolation.
This will help to eliminate most other factors.
By isolation, I mean run it directly on the server through SQL*Plus.
This will eliminate the application, development environment (like TOAD), the network and many other factors that could be contributing to the problem.
Run another trace when you do this and compare the reports. If the performance has improved significantly, then work your way backwards, adding another 'factor' to the equation until you determine where the problem lies.
If the statement runs just as poorly from the server then that query is obviously the problem.
Maybe it's as simple as a table needing an additional index.
Or (occuring just a often, but easily overlooked) using indexes when full table scans would be more efficient.
Or maybe the statistics are out of date and the CBO is choosing a sub-optimal plan.
Or maybe it's an Oracle bug (ahem, I mean undocumented feature)
Honestly, there are too many possibilities to list. But I think you'll find that the majority of the time, once you've found the source of the problem, the solution will be quite obvious.
I Highly recommend the books listed under Links.
A good starting point would be the Oracle SQL Tuning Pocket Reference by Mark Gurry. It's small, easy reading, and chock full of useful tips and info.