Sunday, June 28, 2009

Performance Consideration of SQL and Index for DB2 on z/OS

Here is Prakash Back ... This is with interesting one which is related to Performance which is one of my favorite topic. This post is also has the link with my previous post as well related to Explain the SQL statements.
Here are some eye openers:
1) Matching Index scan: (MATCHCOLS > 0)
This comes into picture when our predicates match either leading index key columns or all of them hence give us the filtering criteria for specific index and data pages. If it is able to filter high number of records, then this type of access path is efficient.

There are 3 types of predicates:
1. Index Predicates
a. Matching predicates
b. Index screening predicates
2. Stage 1 predicates - this applied first to filter out most of the records enhancing query performance
3. Stage 2 predicates - This is applied to the rows returned from stage 1 predicates

From DB2 point of view a. first all equal predicates/ IS NULL, b. Then all Range Predicates/ IS NOT NULL c. then at last all other predicates are applied to the SQL Query.
Therefore developer should code most rstrictive predicates first.

Index Screening:
This predicates are specified on index-key columns but are not part of matching columns. In other words, they are not leading columns of the index. They used to search the index first before going to the data pages.

Nonmatching Index Scan: (ACCESSTYPE = I and MATCHCOLS = 0)
No matching columns are in index, hence all the index keys must be examined.
Sometimes it provides an efficient access path if the path is index only and index is smaller than the tablespace.

IN-list index scan: (ACCESSTYPE = N)
This is a special case of matching index scan. This is equivalent to matching equal predicate.
Exception:noncorrelated IN Subquery or in MX access or list prefetch.

Multiple-index access: (ACCESSTYPE is M, MX, MI, MU)
Multiple index access is table access by more than one index.
RID lists are constructed for each if the index and final list is retrieved after AND/OR operation. This type of access is extention to List Prefetch.

One-Fetch access: (ACCESSTYPE = I1)
This is required to retrive only 1 row. For example Max/Min column function. Most efficient access path.

Index Only Access: (INDEXONLY = Y)
If all the information in a query is available in index itself, it only access the index.

Equal Unique Index (MATCHCOLS = Number of Index Columns)
This guarantees the excat 1 row retrieval. This is next most efficient access path to One-Fetch access.

Other Considerations:
Avoiding Sort:
1) DISTINCT sort can be avoided by by using Unque index
2) ORDER BY, GROUP BY sort can be avoided by ordering index by ASC/DESC order
3) OPTIMIZE FOR n ROWS can eliminate most expensive sort with ordered index

Below information are spefically for Dynamic SQL:
Dynamic SQL performance is key to many applications.

Few things that can be considered are:
1) Reorganize DB2 Catalogue
2) Reorganize and Runstat the application tablr and index space
3) Use dynamic statement caching
4) SET CURRENT DEGREE = 'ANY' for parallelism

Query Parallelism:
This is less than 1% additional CPU overhead for long running queries and less than 10% for short running queries.

I/O and CPU Parallelism: (PARALLELISM_MODE = I or O)
The preffered method is CPU parallelism. It can be decided at both Bind time (DEGREE = ANY) and Runtime. If it is not chosen at Bind time, it can be chosen at runtime. Even if it is chosen at bind time, it may not be used at Runtime. It all depends on the current environment when the query will run.

Queries best suited for Parallelism:
1. Long running, read only queries
2. Tablespace scan
3. Joins
4. Nested loops
5. Merge scans
6. Hybrid Join
7. Sorts
8. Aggregate functions

Prallelism should not considered for:
1. Queries that materialized views
2. Queries using direct row access
3. Queries that perform materialization because of nested table expressions
4. Queries performing a merge-scan join of more than one column

Sysplex Parallelism not to be considered:
1) Queries with list prefetch and multiple index access
2) Queries accessing LOB data

Other Considerations:
1) Can not be considered if system is already CPU constrained
2) Can not be used when a CURSOR is defined WITH HOLD.

Sysplex Query Parallelism: (PARALLELISM_MODE = X)
1. Complex query to run across multiple members in a data sharing group of multi tasking env
2. Best used with ISOLATION level UR to avoid excess lock propagation

The information gives some eye sight regarding Performace.

See all you in my next blog .. Till then happy tunning...

Prakash C. Singh
IBM certified DB2 DBA.

No comments: