Database Land

February 14, 2008

The Basics (VI) - Overview on Query Processing

Filed under: catalog, cost, execution, optimization, parsing, plan, processing, query, rewrite, semantic, sql — Amr El-Helw @ 2:20 pm

I mentioned before that SQL is a declarative language, where the user only specifies WHAT they want to get from the database, not HOW to get it. The how part is the responsibility of the database engine. Query processing is a complex task that involves several steps in order to translate a user’s query to something that the computer can actually execute in order to retrieve the results of that query.

Query processing usually consists of the following phases:

  1. Parsing
    In this phase, the engine analyzes the text of the SQL query, to make sure that it is written correctly, and reports any syntax errors (misspellings, unidentified operators, etc.). If the query has no errors, it is translated into an internal representation that is to be used for later processing.
  2. Semantic Checking
    Now that the text of the query has no errors, the semantics are checked. The engine makes sure that any referenced tables, columns, or views are valid. This phase is sometimes considered part of the parsing phase.
  3. Query Rewrite
    First of all, note that usually the same query may have multiple representations in SQL. Complex queries often result in redundancy, especially with views. The Query rewriting phase rewrites a given SQL query into another query that is equivalent (produces the same result), but may be processed more efficiently.
  4. Query Optimization
    This is probably the most important phase in query processing, in which the engine tries to figure out the best way to evaluate the query (the “how” part). The same query can usually be evaluated in multiple ways, depending on which tables are joined first, what join methods are used, whether the predicates are tested at the beginning or at the end, etc. Each such way is called a query execution plan (or just a query plan). For the same query, some plans are more efficient than others. The job of the query optimizer is:

    1. Determine all possible query plans for the query in hand
    2. Estimate the execution cost for each of these plans. This is achieved using the statistics available in the database catalog
    3. Pick the plan with the least estimated cost
  5. Code Generation
    Once a query plan is chosen, the system translates this plan into executable machine code.
  6. Plan Execution
    Finally the query is executed, and the results are returned.

January 22, 2008

The Basics (IV) - Database Catalog

Filed under: basics, catalog, statistics — Amr El-Helw @ 10:45 am

The database catalog of a database is a repository of information (or meta data) about the various objects in the database, such as tables, views, indexes, users, etc. The information in the catalog is used by the database system when processing any of these objects.

For example, the information about a database table would include the definition of the table (i.e. the names of the columns, the data types, the primary key, and any foreign keys in that table), as well as statistics on the data stored in that table. These statistics include the number of tuples in that table, the number of physical data pages that the table occupies on disk, the user who owns the table. It also includes statistics about each column in that table, for example: the highest and lowest values in the column, the number of distinct values in the column, and possibly the distribution of data values in the column (usually in the form of a histogram). In addition to statistics about single columns, sometimes statistics are also maintained about groups of columns. For example, similar to keeping the number of distinct values, the catalog might contain the number of distinct value combinations in a group of columns. This can be useful in query processing.

Note that the database catalog is itself a set of tables. The information in the catalog can usually be accessed in the same way as the information in any other table, but usually only for viewing, not for update. Updating the catalog manually can cause serious inconsistencies in the database which can cause the database system to be unusable. For example, imagine if a user can manually update the catalog and change the definition of a table. In this case, the table definition would not match the data already stored in it, thus the data will be lost. For this reason, manual updates of the catalog are not permitted, and must only be done through specific commands.

Blog at WordPress.com.