Distributed Query Optimization
Index
- Introduction
- Objectives
- 1.Basics of Query Optimization
- 1.1.Centralized Architecture
- 1.1.1.Semantic Optimization
- 1.1.2.Syntactic Optimization
- 1.1.3.Physical Optimization
- 2.Distributed Query Optimization
- 2.1.Syntactic Optimization
- 2.1.1.Data Localization
- 2.1.2.Reduction
- 2.1.3.Generating Alternative Trees
- 2.2.Physical Optimization
- 2.2.1.Generation of Execution Alternatives
- 2.2.2.Parallelism
- 2.2.3.Access Plan Evaluation
- Summary
- Self-evaluation
- Answer key
- Glossary
- Bibliography
Introduction
Objectives
- Explain the two main extensions of distributed query processing as compared to a centralized version.
- Explain how the data localization phase of distributed query processing works.
- Simulate the three main reduction rules a DDBMS will put into practice.
- Enumerate two strategies used in the global optimization phase.
- Explain the difference between data shipping and query shipping.
- Justify the site selection for a join execution done by a DDBMS.
- Choose between a semi-join or a distributed join strategy.
- Compute basic cost models for distributed query processing.
- Enumerate the main principles behind a parallel database system.
- Explain three kinds of parallel query processing and the main approaches to support them.
- Discuss how cost models and query plan evaluation must be extended to support parallel query processing.
1.Basics of Query Optimization
“A query optimizer translates a query into a sequence of physical operators that can be directly carried out by the query execution engine. The output of the optimizer is called a query access plan. The goal of query optimization is to derive an efficient execution plan in terms of relevant performance measures, such as memory usage and query response time.”
Evaggelia Pitoura
Encyclopedia of Database Systems
- Content statistics, such as number of tuples, size of attributes, etc.
- Available storage structures: For example, partitions and materialized views.
- Available access structures: For example, B-tree and hash indexes.
- Applicable algorithms: Mainly for joining and sorting.
1.1.Centralized Architecture
1.1.1.Semantic Optimization
- Integrity constraints: Defined in the schema (e.g., checks, uniqueness, etc.).
- Logic properties: Some basic logic features can be used at this step to further exploit the schema integrity constraints (e.g., transitivity, subsumption, etc.).
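As a minimal illustration (the relation, its constraint and the predicates are hypothetical): if the schema declares the check constraint $salary \geq 800$ on Employee, the semantic optimizer can conclude

$$\sigma_{salary < 500}(Employee) = \emptyset$$

without accessing any data, since the query predicate contradicts the constraint; similarly, using transitivity, a predicate such as $salary > 1000 \wedge salary > 900$ simplifies to $salary > 1000$.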
1.1.2.Syntactic Optimization
- Nodes, of which we distinguish three kinds:
  - Root: Represents the final output of the query and is thus unique.
  - Internal: Representing algebraic operations (e.g., joins, projections, unions, etc.).
  - Leaves: Representing table (or relation) accesses.
- Edges: Representing data flows. The node that the arrow points to requires the data produced by the node located at the tail of the arrow.
For example, consider three relations A, B and C, joined as $A \bowtie B \bowtie C$, where:

- 90% of the tuples in B match at least one tuple from A,
- B and A join through an FK-PK relationship (from A to B) and, statistically, we know that this relationship relates each tuple in B to 10 tuples in A on average,
- only one tuple from B matches any given tuple from C (and we know it matches exactly one),
- A has millions of tuples, B has thousands of them and C only contains some hundreds.

Under these statistics, the join order dramatically affects the size of the intermediate results, as the estimate below shows.
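A rough estimate, taking $|A| = 10^6$, $|B| = 10^3$ and $|C| = 10^2$ as representative sizes (assumed values consistent with the statistics above): joining B and C first gives

$$|B \bowtie C| = |C| = 10^2, \qquad |(B \bowtie C) \bowtie A| \approx 10 \cdot 10^2 = 10^3,$$

whereas joining A and B first already produces $|A \bowtie B| \approx 0.9 \cdot 10 \cdot 10^3 \approx 10^4$ intermediate tuples before C has filtered anything. Hence, starting with $B \bowtie C$ keeps intermediate results roughly an order of magnitude smaller.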
- (A1) Split a selection if it uses a conjunctive predicate: $\sigma_{p \wedge q}(R) = \sigma_{p}(\sigma_{q}(R))$.
  - (A1.1) Alternatively, fuse two consecutive selections into one, having the conjunction of both as predicate: $\sigma_{p}(\sigma_{q}(R)) = \sigma_{p \wedge q}(R)$.
- (A2) Commute the precedence of a selection and a join: $\sigma_{p_A}(R \bowtie S) = \sigma_{p_A}(R) \bowtie S$, where A is an attribute of R and $p_A$ a predicate over A.
- (A3) Commute the precedence of a selection and a set operation (i.e., union, intersection and difference). For example, for the union: $\sigma_{p}(R \cup S) = \sigma_{p}(R) \cup \sigma_{p}(S)$.
- (A4) Commute the precedence of a selection and a projection, when the selection attribute is projected: $\pi_{X}(\sigma_{p_A}(R)) = \sigma_{p_A}(\pi_{X}(R))$, with $A \in X$.
- (A5) Replicate the projection after a selection, adding the selection attribute, when this attribute is not projected: $\pi_{X}(\sigma_{p_A}(R)) = \pi_{X}(\sigma_{p_A}(\pi_{X \cup \{A\}}(R)))$, with $A \notin X$.
- (A6) Commute the precedence of a projection and a join, when the join attributes are projected: $\pi_{X}(R \bowtie_{A=B} S) = \pi_{X_R}(R) \bowtie_{A=B} \pi_{X_S}(S)$, where $X_R$ and $X_S$ are the attributes of $X$ belonging to R and S, respectively, and $A, B \in X$.
- (A7) Replicate the projection after a join, adding the necessary join attributes, when these attributes are not projected: $\pi_{X}(R \bowtie_{A=B} S) = \pi_{X}(\pi_{X_R \cup \{A\}}(R) \bowtie_{A=B} \pi_{X_S \cup \{B\}}(S))$, with $A, B \notin X$.
- (A8) Commute the precedence of projection and union: $\pi_{X}(R \cup S) = \pi_{X}(R) \cup \pi_{X}(S)$. Note that intersection and difference do not commute with projection. For example, given $R = \{\langle 1, a \rangle\}$ and $S = \{\langle 1, b \rangle\}$ over attributes $(K, V)$, then $\pi_{K}(R) - \pi_{K}(S) = \emptyset$, while $\pi_{K}(R - S) = \{\langle 1 \rangle\}$.
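As an illustration of how these rules combine (the relations Emp(eno, dno, name) and Dept(dno, city), as well as the query, are hypothetical, introduced only for this sketch), selections and projections can be pushed towards the leaves:

$$\pi_{name}(\sigma_{city='Rome'}(Emp \bowtie_{dno=dno} Dept))$$
$$\overset{(A2)}{=} \pi_{name}(Emp \bowtie_{dno=dno} \sigma_{city='Rome'}(Dept))$$
$$\overset{(A7)}{=} \pi_{name}(\pi_{name,dno}(Emp) \bowtie_{dno=dno} \pi_{dno}(\sigma_{city='Rome'}(Dept)))$$

In this way, the join only receives the tuples and attributes it really needs.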
1.1.3.Physical Optimization
- Physical structures: Indexes, clusters, etc.
- Access paths: How to access data by using the available physical structures; i.e., retrieve one tuple, several tuples or perform a full scan.
- Algorithms: Algebraic operators are no longer considered. Instead, we consider physical operations which implement the algebraic ones, plus some additional needed operations (e.g., duplicate removal, sorting or grouping).
2.Distributed Query Optimization
- In the general case, communication costs are the main parameter to minimize (perhaps not as critical for LANs), since they dominate any other parameter (including disk accesses). For example, in access plan A, the output produced in S1 must be shipped to S4, where it will be joined with E1. If the size of S1's output is orders of magnitude bigger (for example, Gigabytes) than the size of E1 (for example, Megabytes), then the optimizer should discard this plan, because the cost of sending Gigabytes over a WAN may be unaffordable. Instead, it becomes essential to consider the alternative plan of commuting this join branch (which minimizes the communication costs, in that Megabytes instead of Gigabytes are sent over the network).
- In addition to the already complex task of choosing the join order in a centralized environment, we have to consider specific techniques to deal with this issue in distributed environments. For example, the semi-join strategy (to be introduced in section 2.2.1.) has proved to save communication costs and nicely suits distributed environments.
- Ideally, we can also exploit the distributed environment to benefit from parallelism as much as possible. For example, consider again access plan A. The selections in S1 and S2 can be executed in parallel and, potentially, their results could be pipelined to S3 and S4, where both joins could be executed in parallel.
- In order to be able to deal with fragmentation and replication, the distributed optimizer requires a global catalog with metadata and statistics about fragments and replicas. Thus, the optimizer must know which fragments and which replicas exist and where they are located. For example, if a replica of E1 is available in S1, it seems reasonable to execute that join in S1 and ignore the replica in S4, which would imply higher communication costs.
- Semantic optimizer: This plays the same role as in a centralized architecture (see section 1.1.1.): it tries to optimize the query simply by rewriting it into an equivalent, but more efficient, SQL query. Thus, we need not give further consideration to this component.
- Syntactic optimizer: This phase starts by representing the SQL input query in an algebraic form, typically as a syntactic tree (see section 1.1.2.). Once the alternative syntactic trees are generated and optimized, this phase proceeds to the data localization and reduction stages. The main objective of this stage is to rewrite the global query (issued in terms of the global schema) as a fragment query (data localization) and to identify fragments that cannot contribute any tuple to the result, which can therefore be removed (reduction). A fragment query is posed over fragments and thus is aware of fragmentation. The fragmentation schema must be available in the global catalog in order for these stages to be executed (for further details, see section 2.1.).
- Finally, physical optimization is split into two stages:
  - Global optimization: Performed at the control site where the query was submitted, global optimization decides at which site each operation is to be executed and inserts communication primitives into the plan. It is also responsible for exploiting parallelism regarding data distribution. For this purpose, the allocation schema must be available in the global catalog (for further details, see section 2.2.).
  - Local optimization: Identical to that of centralized databases (i.e., deciding on access methods, which indexes to use, etc.). Thus, we need not give further consideration to this component.
2.1.Syntactic Optimization
2.1.1.Data Localization
2.1.2.Reduction
- Union - Selection: Since set operations and selections commute, we will push selections down through unions, reconstructing a horizontally fragmented relation. In doing so, we reduce the amount of tuples each site has to ship to others. Moreover, from the reduction phase point of view, you should note that if the selection predicate contradicts the predicate in the fragment definition, then that fragment does not need to be considered and can be removed. Following the example previously introduced, we can push the selection through the unions, substitute the definition of each fragment and fuse the consecutive selections. The selection over the first fragment then yields the empty set, while those over the second and third fragments can be simplified. This last step is performed by the reduction phase: some fragments are dropped and we only focus on those that do not contradict the query statement. A worked instance is sketched below.
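Here, the fragmentation predicates are assumed, since the formulas of the running example are not reproduced: let R be horizontally fragmented as $R_1 = \sigma_{A \leq 10}(R)$, $R_2 = \sigma_{10 < A \leq 20}(R)$ and $R_3 = \sigma_{A > 20}(R)$, and consider the query $\sigma_{A > 15}(R)$. Pushing the selection through the unions and substituting the fragment definitions:

$$\sigma_{A>15}(R_1 \cup R_2 \cup R_3) = \sigma_{A>15 \wedge A \leq 10}(R) \cup \sigma_{A>15 \wedge 10 < A \leq 20}(R) \cup \sigma_{A>15 \wedge A > 20}(R)$$

The first term is contradictory (hence empty) and the other two simplify, so the reduced fragment query is $\sigma_{A>15}(R_2) \cup R_3$.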
- Union - Join: Union and join commute by taking the union of all possible pairwise joins between fragments (i.e., a cross product of the fragment sets). In this way, joins can be pushed down to the fragments, resulting in (many) more joins, with a view to exploiting parallelism (see section 2.2.2.) and, from the point of view of reduction, expecting some of them to be simplified. The reduction phase identifies useless (empty) joins by checking the query and fragment predicates, provided that the attribute defining the horizontal partition is the join attribute in both relations. Consider again the same fragments of R, and a new relation S with two fragments defined over the same attribute, where the query is simply the natural join of these two relations. Substituting the fragments of each relation and pushing the join through the unions (via the cross product of fragment pairs) yields six joins; analyzing pairwise the definition of each fragment, only three of them can return tuples, because the predicates over A are contradictory for the other three. Again, the reduction phase drops some fragments and reduces the amount of work to be done, as sketched below.
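Continuing with the assumed fragmentation of R above, suppose S is horizontally fragmented on the same attribute as $S_1 = \sigma_{A \leq 10}(S)$ and $S_2 = \sigma_{A > 10}(S)$ (again, assumed predicates). The query $R \bowtie_A S$ expands into six joins:

$$R \bowtie_A S = \bigcup_{i \in \{1,2,3\}} \; \bigcup_{j \in \{1,2\}} (R_i \bowtie_A S_j)$$

Checking the predicates pairwise, $R_1 \bowtie S_2$, $R_2 \bowtie S_1$ and $R_3 \bowtie S_1$ are necessarily empty (e.g., $A \leq 10$ and $A > 10$ cannot hold simultaneously), so the reduced query is $(R_1 \bowtie_A S_1) \cup (R_2 \bowtie_A S_2) \cup (R_3 \bowtie_A S_2)$.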
- Projection - Join: Projection and join operations do not really commute, but we can still generate new projections under a join corresponding to the original projection. In this way, we also push projections down (through the join reconstructing a vertical partition) to reduce the amount of attributes shipped by every site. You should note that if there are no common attributes (besides the key) between the query projection and the fragment definition, the fragment does not need to be accessed and thus it can be removed in the reduction phase. Consider now a vertical fragmentation schema and a query projecting some attributes of a relation T with a given primary key. We substitute T with the join of its two partitions, push the projection through the join, substitute the definition of each fragment and simplify consecutive projections by intersecting the projected sets of attributes. The join then becomes identical to one of the two projections, because each fragment holds all the values of the primary key, so the query can be answered from a single fragment, as in the sketch below.
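A plausible instance with assumed fragments: let $T(K, a, b)$ have primary key K and be vertically fragmented as $T_1 = \pi_{K,a}(T)$ and $T_2 = \pi_{K,b}(T)$, and consider the query $\pi_{K,a}(T)$. Then:

$$\pi_{K,a}(T_1 \bowtie_K T_2) = \pi_{K,a}(\pi_{K,a}(T_1) \bowtie_K \pi_{K}(T_2)) = \pi_{K,a}(T_1) = T_1$$

The join with $\pi_K(T_2)$ is redundant because $T_1$ already contains every value of K, so $T_2$ is removed by the reduction phase and the query is answered from a single fragment.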
2.1.3.Generating Alternative Trees
2.2.Physical Optimization
- Producing the search space: Unlike the local query optimizer, neither structures nor access paths are taken into consideration here. At this point, the main focus is how to efficiently execute joins by shipping less data.
- Benefiting from parallelism: Having N distributed database nodes, one of the main features of DDBMSs is parallelism. To this end, replication plays a major role, and the allocation schema is used at this point. Conveniently, relational algebra has been proven to naturally fit parallelism, so no ad hoc parallel programming skills are needed.
For example, consider the following setting:

- A distributed database with 5 sites (i.e., database nodes): S1, S2, S3, S4 and S5.
- 3 relations in the database: R, S and T.
- Each relation is horizontally fragmented into two fragments (we refer to them by the name of the relation and a subindex, for example: R1, R2). You can consider them to be correct (i.e., complete, disjoint and reconstructible).
- Each fragment is replicated at all 5 sites.
- Each selection can be individually executed at any site (i.e., 5 choices per selection).
- The union between fragments of the same relation can also be made at 5 different sites. Thus, for each union, we have 25 alternatives for the two selections, combined with 5 choices to execute the union; i.e., 125 options per union.
- Next, we must consider combining all the options for both unions; and yet, where (and how) to execute the joins has not even been considered. In general, we would obtain $5^n$ alternatives (where n is the number of operations). A small counting sketch follows.
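A minimal sketch (the operation names and the plan shape are illustrative assumptions) that enumerates the site assignments for the operations of one such plan, confirming the $5^n$ growth:

```python
from itertools import product

SITES = ["S1", "S2", "S3", "S4", "S5"]

# Hypothetical operations of a plan over R = R1 U R2 and S = S1 U S2:
# one selection per fragment, one union per relation, plus one final join.
operations = ["sel_R1", "sel_R2", "union_R",
              "sel_S1", "sel_S2", "union_S",
              "join_RS"]

# Every operation can run at any of the 5 sites, so the number of
# site assignments is 5 ** len(operations).
assignments = list(product(SITES, repeat=len(operations)))
print(len(assignments))                                    # 78125
print(len(SITES) ** len(operations) == len(assignments))   # True
```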
2.2.1.Generation of Execution Alternatives
2.2.2.Parallelism
- Inter-operator: Several nodes of the same process tree are executed in parallel. For example, two selections over two different relations (e.g., R and S) that reside in different sites can be executed in parallel, benefiting from distribution.
- Intra-operator: Several parts of the same node in the process tree are executed in parallel. For example, a selection on a fragmented relation (e.g., R, fragmented into R1 and R2) can be executed in parallel by selecting on each fragment at its own site and eventually uniting the results obtained from each of them, as the sketch below illustrates.
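A minimal sketch of intra-operator parallelism, simulating two sites with threads (the fragments, the predicate and the data are all assumptions for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Two fragments of R, as if each one resided at a different site.
R1 = [{"A": 5, "B": "x"}, {"A": 12, "B": "y"}]
R2 = [{"A": 17, "B": "z"}, {"A": 25, "B": "w"}]

def select(fragment, predicate):
    """Run the selection locally on one fragment."""
    return [t for t in fragment if predicate(t)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # The same selection runs on both fragments in parallel...
    futures = [pool.submit(select, frag, lambda t: t["A"] > 15)
               for frag in (R1, R2)]
    # ...and a final union reconstructs the result of sigma_{A>15}(R).
    result = [t for f in futures for t in f.result()]

print(result)  # [{'A': 17, 'B': 'z'}, {'A': 25, 'B': 'w'}]
```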
| | | Latency | Occupancy |
|---|---|---|---|
| Serial system | | T | T |
| Parallel | No stalls | h · T/N | T/N |
| | Stalls | h · T/N + h · k | T/N + k |
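As a numeric illustration (assuming h denotes the number of sequential steps in the process tree and k the stall time per step, which is one reading consistent with the formulas): with T = 100 s of total work, N = 10 nodes, h = 4 and k = 1 s, the parallel system with stalls has latency $h \cdot T/N + h \cdot k = 4 \cdot 10 + 4 = 44$ s and occupancy $T/N + k = 11$ s, against 100 s for both measures in the serial system.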
- Random or round-robin: This strategy balances the load, but blinds the searches (a given tuple may end up in any fragment).
- Range: This option facilitates directed searches, but needs accurate quantile information to keep the fragments balanced.
- Hash: Hash strategies facilitate directed searches, but depend dramatically on the hash function we choose, and they always have problems in the presence of non-unique attributes with many repeated values. The sketch below illustrates the three strategies.
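A minimal sketch of the three placement strategies over N fragments (the function names, boundaries and example keys are assumptions for illustration):

```python
from itertools import count

N = 4  # number of fragments/sites (assumed)

_rr = count()
def round_robin(_tuple):
    """Balances load; a given key may end up anywhere."""
    return next(_rr) % N

def range_partition(key, boundaries=(10, 20, 30)):
    """Needs quantile information (the boundaries) to stay balanced."""
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

def hash_partition(key):
    """Allows directed searches, but many repeated key values
    all land in the same fragment (skew)."""
    return hash(key) % N

for key in (5, 15, 25, 35):
    print(key, round_robin(key), range_partition(key), hash_partition(key))
```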
For example, consider:

- R(rid, a1, a2),
- S(sid, rrid, b1, b2), where rrid is an FK to R(rid),
- and the following process tree: $R \bowtie_{rid=rrid} S$.
2.2.3.Access Plan Evaluation
- Local processing:
  - Average CPU time to process an instance ($T_{CPU}$)
  - Number of instances processed ($\#inst$)
  - I/O time per operation ($T_{I/O}$)
  - Number of I/O operations ($\#I/Os$)
- Global processing:
  - Message time ($T_{Msg}$)
  - Number of messages issued ($\#msgs$)
  - Transfer time to send a byte from one location to another ($T_{TR}$)
  - Number of bytes transferred ($\#bytes$; this could also be expressed in terms of packets)
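These parameters are typically combined into a single weighted sum, $Total\ cost = T_{CPU} \cdot \#inst + T_{I/O} \cdot \#I/Os + T_{Msg} \cdot \#msgs + T_{TR} \cdot \#bytes$. A minimal sketch of this cost function (all numeric values are illustrative assumptions):

```python
def total_cost(t_cpu, n_inst, t_io, n_ios, t_msg, n_msgs, t_tr, n_bytes):
    """Weighted-sum cost model combining local and global processing."""
    local_cost = t_cpu * n_inst + t_io * n_ios
    global_cost = t_msg * n_msgs + t_tr * n_bytes
    return local_cost + global_cost

# Illustrative values: CPU time per instance and I/O time in ms,
# WAN-like message and transfer costs.
print(total_cost(t_cpu=1e-4, n_inst=1_000_000,
                 t_io=10, n_ios=500,
                 t_msg=100, n_msgs=4,
                 t_tr=1e-3, n_bytes=2_000_000))
```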
Summary
Self-evaluation
Consider the following relations:

- Employee(eno, name, surname, title, salary, city),
- Project(pno, name, head NOT NULL, budget, city), where head is an FK to Employee(eno).

Assume Employee is allocated at site 1 and Project at site 2, and consider the following query:

SELECT e.name, p.*
FROM Employee e, Project p
WHERE e.eno = p.head;

- Determine all the possible semi-join strategies and briefly discuss which of them is likely to produce the smallest communication cost.
- Justify which choice is better for a DDBMS: a semi-join strategy or a regular join strategy.
Answer key
- Sending the project head attribute (i.e., $\pi_{head}(Project)$) from site 2 to site 1, performing the semi-join there (i.e., $Employee \ltimes_{eno=head} \pi_{head}(Project)$) and shipping the Employee tuples in the result of the semi-join to site 2, where they will be joined to Project (i.e., $(Employee \ltimes Project) \bowtie_{eno=head} Project$). It is important to note that the Employee tuples sent from site 1 to site 2 should only contain two attributes: eno (the join attribute) and name (selected in the query). There is no need to ship any other Employee attribute to site 2.
- Sending the Employee identifier (i.e., $\pi_{eno}(Employee)$) from site 1 to site 2, performing the semi-join there (i.e., $Project \ltimes_{head=eno} \pi_{eno}(Employee)$) and shipping the Project tuples in the result of the semi-join to site 1, where they will be joined to Employee (i.e., $Employee \bowtie_{eno=head} (Project \ltimes Employee)$). In this case, the query selects all Project attributes and thus, for the Project tuples sent to site 1, we need all their attributes.
- In the first case, the DDBMS would ship the head attribute of each Project from site 2 to site 1, and then ship $\pi_{eno,name}(Employee \ltimes Project)$ (one tuple per project head) back to site 2.
- In the second case, the DDBMS would ship the eno attribute of each Employee from site 1 to site 2 and then, since head is a NOT NULL foreign key that every project satisfies, ship the whole Project table back to site 1. A sketch comparing both costs follows.
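A minimal sketch comparing the bytes shipped by the two strategies (all cardinalities and tuple sizes are assumed for illustration; only the shape of the computation follows the discussion above):

```python
# Assumed cardinalities and attribute sizes (illustrative, not from the text).
EMP = 10_000        # |Employee|
PROJ = 1_000        # |Project|
ENO_SIZE = 8        # bytes per eno value
HEAD_SIZE = 8       # bytes per head value
NAME_SIZE = 32      # bytes per name value
PROJ_ROW = 100      # bytes per full Project tuple

# Strategy 1: ship pi_head(Project) to site 1, then the matching
# (eno, name) Employee tuples back to site 2.
matching_emps = min(PROJ, EMP)   # at most one matching employee per project
cost1 = PROJ * HEAD_SIZE + matching_emps * (ENO_SIZE + NAME_SIZE)

# Strategy 2: ship pi_eno(Employee) to site 2; since head is a NOT NULL FK,
# every Project tuple matches, so the whole table travels back to site 1.
cost2 = EMP * ENO_SIZE + PROJ * PROJ_ROW

print(f"Strategy 1 ships ~{cost1:,} bytes")   # ~48,000 bytes
print(f"Strategy 2 ships ~{cost2:,} bytes")   # ~180,000 bytes
```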
Glossary
- Allocation (or data allocation)
- This problem appears in top-down design of distributed databases. Given a set of fragments, the data allocation problem aims to allocate them to the available sites in such a way that certain optimization criteria are met.
- ANSI/SPARC architecture
- The reference schema architecture for DBMSs
- CNF
- Conjunctive Normal Form
- DAG
- Directed Acyclic Graph
- DB
- Database
- DBA
- Database Administrator
- Data locality
- Related to distributed systems, it refers to the act of placing data where it is needed to minimize communication overhead and, consequently, be more efficient and achieve better performance
- DDB
- Distributed Database
- DBMS
- Database Management System
- DDBMS
- Distributed Database Management System
- FK
- Foreign Key
- Fragmentation
- The problem of breaking a relation into smaller pieces to be distributed over a network
- Fragmentation predicates
- A set of selections that describe a horizontal fragmentation
- LAN
- Local area network
- Latency
- Time to process a query
- Occupancy
- Time until a DBMS can accept more work
- Partitioning
- Essentially, it follows the same principle as fragmentation but does not spread the resulting fragments over a network, keeping them local instead. Partitioning can be used for many purposes, but it is mainly used to benefit from parallelism and to implement privacy
- PK
- Primary Key
- Replication
- When the same fragment is allocated at several sites. It is mainly used to improve reliability and the efficiency of read-only queries
- Scalability
- In distributed systems, it refers to the ability of the system to expand as data, load or resources grow
- Speed-up
- This is a measure of scalability where we consider additional hardware for a constant problem size
- Scale-up
- This is a measure of scalability where we consider that the problem size grows along with the resources
- SQL
- Structured Query Language
- Transparency
- This refers to separation of the higher-level semantics of a system from lower-level implementation issues
- WAN
- Wide area network