DBMS Comparative Study.

In this phase of the project you will add a large text field to a table.

Requirements

  1. Implement a tool that, given some metadata about a publication (e.g., title, author name, venue), (1) retrieves the publication from a Web source, such as Google Scholar, the ACM Digital Library, or the IEEE Digital Library, and (2) collects the abstract of the publication. Insert the abstract into your table.
  2. Each of you needs to collect the abstracts of at least 10,000 publications.
  3. Coordinate among yourselves so that each of you collects the abstracts of different publications.
  4. Coordinate among yourselves so that each of you will have the union of abstracts you collectively crawl at the end of this phase.
  5. These Web sources may have daily quotas on the number of allowed requests, so your tool needs to be polite: space out your requests and stay within the quotas.
  6. Deliverables

    1. Describe your algorithm.
    2. Describe the policy you followed to minimize the overlap with your colleagues.
    3. Describe the mechanism by which you amassed the data from your colleagues.
    4. Describe the steps you took to give your data to your colleagues.
    5. Describe challenges you faced in this phase.
    6. Update your report and include a detailed description of your studies. Ideally, you should have a summary table with all the times.
    Start early!
    Collaborate! Compare and discuss your approaches. The end product is expected to be an individual effort!
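
As a starting point for discussion, here is a minimal sketch of three pieces the requirements above call for: splitting publications among team members so nobody crawls the same one (requirement 3), spacing out requests under a request cap (requirement 5), and merging everyone's abstracts into one table (requirement 4). Everything in it is an assumption made for illustration and not part of the assignment: the names owner_of, PoliteThrottle, and merge_abstracts, the table name publication, the use of SQLite, and hash-based partitioning as the overlap policy. It does not fetch anything from the Web; the retrieval step depends on the source you choose.

```python
import hashlib
import sqlite3
import time


def owner_of(title: str, num_members: int) -> int:
    """Return the index of the team member responsible for this title.

    Hypothetical policy: hash the normalized title so that every member
    computes the same assignment without talking to the others.
    """
    normalized = " ".join(title.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_members


class PoliteThrottle:
    """Enforce a minimum delay between requests and a cap on their number.

    The counter is never reset automatically; create a new instance
    (or reset `used`) at the start of each day.
    """

    def __init__(self, min_delay_s, quota, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self.quota = quota
        self.clock = clock
        self.sleep = sleep
        self.used = 0
        self.last_request = None

    def acquire(self) -> bool:
        """Wait if needed; return False once the quota is exhausted."""
        if self.used >= self.quota:
            return False
        if self.last_request is not None:
            wait = self.min_delay_s - (self.clock() - self.last_request)
            if wait > 0:
                self.sleep(wait)
        self.last_request = self.clock()
        self.used += 1
        return True


def merge_abstracts(conn, rows):
    """Insert (title, abstract) pairs, keeping rows that already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication ("
        "title TEXT PRIMARY KEY, abstract TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO publication (title, abstract) VALUES (?, ?)",
        rows,
    )
    conn.commit()
```

A crawler loop would call owner_of to skip titles that belong to someone else, call acquire() before each request and stop for the day when it returns False, and pass its results (and later those received from colleagues) to merge_abstracts. Keying on the title alone is a simplification; distinct publications can share a title, so a real key would likely combine several fields.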