Database Land

February 14, 2008

The Basics (VI) - Overview on Query Processing

Filed under: catalog, cost, execution, optimization, parsing, plan, processing, query, rewrite, semantic, sql — Amr El-Helw @ 2:20 pm

I mentioned before that SQL is a declarative language, where the user only specifies WHAT they want to get from the database, not HOW to get it. The how part is the responsibility on the database engine. Query processing is a complex task that involves several steps in order to translate a user’s query to something that the computer can actually execute in order to retrieve the results of that query.

Query processing usually consists of the following phases:

  1. Parsing
    In this phase, the engine analyzes the text of SQL query, to make sure that it is written correctly, and reports any syntax errors (misspellings, unidentified operators, etc.). If the query has no errors, it is translated into an internal representation that is to be used for later processing.
  2. Semantic Checking
    Now that the text of the query has no errors, the semantics are checked. The engine makes sure that any referenced tables, columns, or views are valid. This phase is sometimes considered part of the parsing phase.
  3. Query Rewrite
    First of all, note that usually the same query may have multiple representations in SQL. Complex queries often result in redundancy, especially with views. The Query rewriting phase rewrites a given SQL query into another query that is equivalent (produces the same result), but may be processed more efficiently.
  4. Query Optimization
    This is probably the most important phase in query processing, in which the engine tries to figure out the best way to evaluate the query (the “how” part). The same query can usually be evaluated in multiple ways, depending on which tables are joined first, what join methods are used, whether the predicates are tested at the beginning or at the end, etc. Each such way is called a query execution plan (or just a query plan). For the same query, some plans are more efficient than others. The job of the query optimizer is:

    1. Determine all possible query plans for the query in hand
    2. Estimate the execution cost for each of these plans. This is achieved using the statistics available in the database catalog
    3. Pick the plan with the least estimated cost
  5. Code Generation
    Once a query plan is chosen, the system translates this plan into executable machine code.
  6. Plan Execution
    Finally the query is executed, and the results are returned.

February 3, 2008

The Basics (V) - Database Queries & SQL

Filed under: basics, query, sql — Amr El-Helw @ 5:59 pm

So far, we have talked about the data being arranged nicely into tables, with relationships between these tables that represent the relationships between the real life objects they represent. We even have information about these tables stored in the database catalog. So, now what??

The real advantage of a database system is not its ability to store the data in an organized format. If we have huge amounts of data that we only need to store, then we can just dump it into huge text files. Actually, we might as well just leave all the data in huge paper files tucked into an old dusty archive room. After all, the data will just be sitting there. But let’s say, for some reason, I want to check how many projects have been completed by the XYZ department in the year 2005, with only 5 employees or less working on each project. I think it will be hard to go through all those files to figure out the answer to that.

That’s where databases make our life easier. The real advantage of using a database is not just storing the data, but also being able to query the data…efficiently. Think of a database query as a question. The user asks the database a question, and the database returns the result. Database queries can be as simple or as complex as required.

In order to standardize the way databases are queries, SQL (Structured Query Language) was invented. SQL is a standard way of formulating a query that can be understood be any database system (except from some system-specific features that are not part of the standard SQL). SQL is considered a declarative language, not a procedural language. This means that the user only specifies WHAT they want, not HOW to get it. For example, the following simple SQL query asks the database to retrieve the names of all employees who work in the Sales department since before 2001:

SELECT name
FROM employee
WHERE dept_name = ‘Sales’
AND hire_year < 2001

SQL can also be used to insert or update the data in the database. We are not going to get into the details of SQL here, but a good introduction to SQL can be found here.

Next Page »

Blog at WordPress.com.