Assignment 3
Due date: None. This is for you to practice.
This assignment is meant to give you practice with foundations of database systems: Query Evaluation and External Sorting.
Problem 1: External Sorting
Answer the following questions for each of these scenarios,
assuming that our most general external sorting algorithm is used:
- A file with 100,000 pages and 30 available buffer pages.
- A file with 200,000 pages and 50 available buffer pages.
- A file with 20,000,000 pages and 170 available buffer pages.
Questions:
- How many runs will you produce in the first pass?
- How many passes will it take to sort the file completely?
- What is the total I/O cost of sorting the file?
- How many buffer pages do you need to sort the file completely in just two passes?
Problem 2: Query Evaluation
Consider the join between relations R and S, where the join condition is R.a = S.b.
We are given the following information about the two relations.
The cost metric is the number of page I/Os, and the cost of writing out the result should be uniformly ignored.
- Relation R contains 10,000 tuples and has 10 tuples per page.
- Relation S contains 2,000 tuples and also has 10 tuples per page.
- Attribute b of relation S is the primary key for S.
- Both relations are stored as simple heap files.
- Neither relation has any indexes built on it.
- 52 buffer pages are available.
Questions:
- What is the cost of joining R and S using a page-oriented simple nested loops join?
- What is the cost of joining R and S using a block nested loops join?
- What is the cost of joining R and S using a sort-merge join?
- What is the cost of joining R and S using a hash join?
- How many tuples will the join of R and S produce, at most, and how many pages
would be required to store the result of the join back on disk?
- If unclustered B+ indexes existed on R.a and S.b, would
either provide a cheaper alternative for performing the join (using an index nested
loops join) than a block nested loops join? Explain.
- If clustered B+ indexes existed on R.a and S.b, would either
provide a cheaper alternative for performing the join (using the index nested loops
algorithm) than a block nested loops join? Explain.