Assignment 3

Due date: None. This is for you to practice.
This assignment is meant to give you practice with foundations of database systems: Query Evaluation and External Sorting.

Problem 1: External Sorting

Answer the following questions for each of these scenarios, assuming that our most general external sorting algorithm is used:
  1. A file with 100,000 pages and 30 available buffer pages.
  2. A file with 200,000 pages and 50 available buffer pages.
  3. A file with 20,000,000 pages and 170 available buffer pages.
Questions:
  1. How many runs will you produce in the first pass?
  2. How many passes will it take to sort the file completely?
  3. What is the total I/O cost of sorting the file?
  4. How many buffer pages do you need to sort the file completely in just two passes?

Problem 2: Query Evaluation

Consider the join between relations R and S, where the join condition is R.a = S.b. We are given the following information about the two relations. The cost metric is the number of page I/Os, and the cost of writing out the result should be uniformly ignored. Questions:
  1. What is the cost of joining R and S using a page-oriented simple nested loops join?
  2. What is the cost of joining R and S using a block nested loops join?
  3. What is the cost of joining R and S using a sort-merge join?
  4. What is the cost of joining R and S using a hash join?
  5. How many tuples will the join of R and S produce, at most, and how many pages would be required to store the result of the join back on disk?
  6. If unclustered B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using an index nested loops join) than a block nested loops join? Explain.
  7. If clustered B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using the index nested loops algorithm) than a block nested loops join? Explain.