Current MS Students/Cheralyn Cofer

From CSWiki

Jump to: navigation, search

SQL Tuning Calculator

An implementation of Dan Tow's manual sql tuning methodology

Contents

[edit] What I am planning to accomplish

[edit] Brief Project specification

The SQL Tuning Calculator project will assist developers in determining the most robust table join order for their SQL query. This tool's algorithm will be based on the methodology of sql tuning specialist, Dan Tow, as described in his book, SQL Tuning, and in patent USPatent #5761654. The key benefit of this tuning calculator is that the developer will provide insight about the selectivities of their query conditions to the tool that are unknown to RDBMS optimizers.

[edit] High level architecture

The tool will be one of the following types:

1. A standalone application that can be configured to link to your database.

2. A web tool that will not be configured to link to your database, but will instead require initial configuration by the user to populate the tool with core table metrics such as table and block sizes.

Components:

  • SQL query parser to separate the user's input query into its components (tables, joins, select clauses, condition clauses, aggregate clauses).
  • Appropriate interface to gather further information from the user.
  • Local database that stores user input and query statistics for each user instance.
  • Java code driving both the user prompts, and the building and traversing of the query diagram.

[edit] Example scenario to demonstrate the importance of table join order, which the tool will address

Example cost analysis to illustrate benefit of determining robust table join order:

  • Initial query:
SELECT 
d.deptname,
e.lastname,
e.firstname
FROM
employees e
JOIN departments d on (e.dept_id = d.dept_id)
WHERE e.exempt_flag = 'Y' 
AND d.us_based_flag = 'Y'
  • Query Diagram, using Tow methodology to determine filter selectivities (in quotes) and table join ratios (not in quotes):


E '0.1'
|20
|
|
|
|0.98
V
D '0.5'
  • Using rows pulled as the proxy for query cost (SQL Tuning, 116):

Let C = rowcount of E

METHOD 1: E ==> D

1. Let C = rowcount of E

2. Pulls .1 x C filtered rows in driving table E.

3. Joins to table D to reach .1 x C x .98 rows of D.

4. Adding the two I/Os from steps 2 and 3, the total count of table rows pulled is (.1 x C) + (.1 x C x .98) ==> 0.198C


METHOD 2: D ==> E

1. Find rowcount of D in terms of C: C x .98 x 1/20

2. Pulls .5 x (C x .98 x 1/20) filtered rows in driving table D.

3. Joins to table E to reach .5 x (C x .98 x 1/20) x 20 rows of E.

4. Adding the two I/Os from steps 2 and 3, the total count of table rows pulled is (.5 x (C x .98 x 1/20)) + (.5 x (C x .98 x 1/20) x 20) ==> 0.5145C

  • With 0.198C relative cost, Method 1 will be about 2.6 times faster than Method 2's 0.5145C relative cost.
  • Final tuned query:
SELECT  /*+ ordered use_nl(e,d)*/
d.deptname,
e.lastname,
e.firstname
FROM
employees e
JOIN departments d on (e.dept_id = d.dept_id)
WHERE e.exempt_flag = 'Y' 
AND d.us_based_flag = 'Y'

[edit] Why this is academically interesting

[edit] Intended audience

Database users needing assistance tuning their sql code for robust performance.

[edit] Why this is MS-level work

I've observed that SQL developers often derive execution plans through an iterative process of trial and error. Dan Tow's algorithm, on the other hand, illustrates a mathematically based diagramming method to acheive the same end in a more timely manner. The ambition of this project is twofold -- it requires a solid understanding of the algorithm as well as an implementation that is both true to the algorithm and user friendly.

[edit] Previous work

[edit] Platforms to be used and what I will add to each

  • General SQL Parser : Used to parse SQL statements into parse trees in your program. I may have to add some parsing logic for Oracle 10g sql functions.
  • Java, and possibly JSP if I choose a web platform.
  • MySQL database for storing user feedback and table statistics.

[edit] Available platforms I am not using and why

  • To my knowledge, there are no existing tuning calculators that are driven by Tow's methodology.

[edit] Anticipated challenge(s)

[edit] Challenges and Anticipated Approach to Each Challenge

Conceptually, I need to become an expert in Dan Tow's manual SQL tuning methodology. I have read several favorable reviews of his book by Oracle experts, each describing how they had to read the book several times to fully digest the material, as will I have to do.

The implementation of this tool will be equally challenging for me, involving the following steps:

1) Parse the incoming sql and store the components in an appropriate data structure. I plan to use a robust sql parser for this task to minimize efforts here.

2) Determine the filter and join ratios of the query via three possible methods: a) prompt the user for condition selectivity statistics, b) pull up past user statistics feedback, and c) possibly query the database itself (would require the tool to have initial configuration so it can connect to your database). The program will need to improve its method of information-gathering the more its used, so that increased usage will result in a "smarter" program. For example, the first couple times it's used, all filtering conditions will be new to the program, but with increased usage, the program will recognize certain combinations of filtering conditions and will be able to make selectivity suggestions to the user.

3) Store the directed graph diagram in an appropriate data structure.

4) Traverse the directed graph using Tow's strategy (i.e. drive from the table with the lowest selectivity to reduce your initial data set as much as possible), and output the optimal join order and query hints.

[edit] What I bring to this work

[edit] My relevant background and experience

I've worked primarily with operational databases, data feed ETL (data extraction, transformation, and loading), and more recently, in a data warehouse environment. It is my latter experience working with the world's largest Oracle instance as well as a newly developed warehouse datamart that was the impetus for my SQL tuning project idea. I plan to build my tool as a Java standalone or JSP web tool, and accomodate standard and Oracle proprietary SQL code input using one of the SQL parsers that I've researched. In CS420 I developed an extensize web-interface tool whose success depended on careful database configuration -- an experience which will help me in formulating this production-level tool.

[edit] What I find interesting about this work

Because this project is intended for production-level calculations, I look forward to completing the tool so I can use it to develop more efficient reports at my job.

[edit] How this work goes beyond my experience and course work

My project encompasses theory, code, and web architecture, providing a sophisticated culmination to several areas of coursework.

Personal tools