Query Optimizing

What
It is the query optimizer's job to pick a reasonable query plan, not necessarily the most optimal one. It's main responsibility is to prevent picking bad query plans.

How

 * 1) Query_optimizer_convert_query.pngrt the query into relational algebra operators
 * 2) Arrange the operators in a tree
 * 3) The operators have different choices for how to implement them
 * 4) Operators can be arranged in different orders to get different results

Example: Basic Plan


This example will demonstrate a query plan that has not yet been optimized. It will be the starting point for this motivating example for query optimization. Notice how there is an implementation choice to the right of the relational operator.

Pushing Selects


We can move SELECT operators before joins to reduce the amount of redundant loop scanning that is done. In this way, we reduce the amount of I/Os that need to be done. Notice that with the choice of page-oriented nested loops, we must do $$M + (M*N)$$ I/Os. By reducing $$M$$ or $$N$$, we can reduce the amount of I/Os by impressive fractions. This also highlights the importance of implementation choices in these relational operators.

When Pushing Selects Does Not Help
Sometimes, pushing selects will not in fact give us any performance boost. This happens, notably, when we push a select on top of the inner relation of a join. This happens when we are using a join algorithm that scans the inner relation repeatedly as in this case. Note that this query plan yields the same performance as the example above

Switching the Inner and Outer Relations
We see that we can gain a big speed boost by changing which relation we use for the inner and outer relation depending on the selection condition beforehand. Understanding that the page-oriented nested loop has the cost $$M + (M*N)$$. We should choose inner and outer relations to lower this cost.

Precomputing Selects
We can get speed gains in this loop join case if we apply the select operator on the inner relation in a precompute stage. This is because we're no longer looping through the entire inner relation. We are looping over a smaller relation, which reduces the amount of I/Os we must perform.