Query Optimization.

In this phase of the project you will execute a number of queries on your databases, report the initial running times, and give the steps to improve the execution time of each query.

Query 1:

Give the total number of authors in DBLP that have at least one of their publication records in Microsoft Academic Search (MAS).

Query 2:

Give the distribution of author count per matched publication records in Microsoft Academic Search (MAS). This query must return a relation with two columns Number of Matched Publications and Author Count. For example, a row of the form (10, 20) in this relation means that 20 authors have 10 of their publications matched against MAS, while a row (5, 100) tells us that 100 authors have 5 of their publication records matched against MAS. Analyze the query with and without the clause "ORDER BY Number of Matched Publications."

Query 3:

Same as Query 2, but include the condition that only authors whose names contain a given substring, say 'Zhang', must be considered.

Query 4:

Same as Query 3, but include the condition that only authors whose NOTE field contain a given substring, say 'University', must be considered.

Deliverables

  • Give the SQL query for each query.
  • Include a screen shot per query that shows the execution of the query and includes the execution time.
  • Include a screen shot with the Execution Plan per query.
  • Describe the steps you took to optimize each query.
  • Include screen shots per query show the gained improvement in the execution time (show the results, the new execution time and Execution Plan).
  • This is your last report. Include the updates from all the previous phases.
  • Start early!
    Collaborate! Compare and discuss your approaches. The end product is expected to be an individual effort!