DBMS Comparative Study.
In this phase of the project, you will add a large text field to a table.
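A minimal sketch of the schema change, assuming SQLite through Python's built-in `sqlite3` module and a hypothetical `publications` table left over from an earlier phase. The type used for a large text field varies between systems (for example `TEXT` in PostgreSQL, `LONGTEXT` in MySQL, `CLOB` in Oracle), so adapt the column type to each DBMS you are comparing.

```python
import sqlite3

# Hypothetical table from an earlier phase; the real schema will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Add the large text field. SQLite's TEXT has no declared length limit
# beyond the library's maximum string size.
conn.execute("ALTER TABLE publications ADD COLUMN abstract TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(publications)")]
print(columns)  # ['id', 'title', 'abstract']
```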
Requirements
- Implement a tool that, given some identifying information about a publication (e.g., title, author name, venue), (1) retrieves the publication from a Web source such as Google Scholar, the ACM Digital Library, or the IEEE Digital Library, and (2) extracts the publication's abstract. Insert the abstract into your table.
- Each of you needs to collect the abstracts of at least 10,000 publications.
- Coordinate among yourselves so that each of you collects the abstracts of different publications.
- Coordinate among yourselves so that, at the end of this phase, each of you has the union of all the abstracts you collectively crawled.
- These Web sources may impose daily quotas on the number of requests. Be polite: throttle your requests and stay within each source's limits.
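One possible shape for the collection loop is sketched below. It assumes SQLite and a `publications` table with a nullable `abstract` column; `PoliteLimiter`, `collect_abstracts`, and `fetch_abstract` are hypothetical names. `fetch_abstract` stands in for whatever lookup you build against a Web source (through an access method that source permits), and a dictionary stub replaces it in the demo, so no network calls are made. The limiter enforces both a minimum delay between requests and a daily cap.

```python
import sqlite3
import time
from typing import Callable, Optional


class PoliteLimiter:
    """Spaces requests at least min_interval_s apart and caps them at daily_quota."""

    def __init__(self, min_interval_s: float, daily_quota: int) -> None:
        self.min_interval_s = min_interval_s
        self.daily_quota = daily_quota
        self.used = 0
        self._last = float("-inf")  # time of the previous request (none yet)

    def acquire(self) -> bool:
        """Blocks until the next request is allowed; returns False once the quota is spent."""
        if self.used >= self.daily_quota:
            return False
        wait = self._last + self.min_interval_s - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()
        self.used += 1
        return True


def collect_abstracts(
    conn: sqlite3.Connection,
    limiter: PoliteLimiter,
    fetch_abstract: Callable[[str], Optional[str]],
) -> int:
    """Fills in missing abstracts and returns how many were stored.

    fetch_abstract is a hypothetical callable that looks a title up in some
    Web source and returns its abstract, or None if nothing was found.
    """
    pending = conn.execute(
        "SELECT id, title FROM publications WHERE abstract IS NULL ORDER BY id"
    ).fetchall()
    stored = 0
    for pub_id, title in pending:
        if not limiter.acquire():
            break  # quota exhausted; rows still NULL are picked up on the next run
        abstract = fetch_abstract(title)
        if abstract:
            # Parameterized query: abstracts often contain quotes.
            conn.execute(
                "UPDATE publications SET abstract = ? WHERE id = ?", (abstract, pub_id)
            )
            stored += 1
    conn.commit()
    return stored


# Demo with an in-memory table and a stub in place of a real Web source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
conn.executemany(
    "INSERT INTO publications (id, title) VALUES (?, ?)",
    [(1, "Paper A"), (2, "Paper B"), (3, "Paper C")],
)
fake_source = {"Paper A": "Abstract of A.", "Paper C": "Abstract of C."}
limiter = PoliteLimiter(min_interval_s=0.01, daily_quota=2)
stored = collect_abstracts(conn, limiter, fake_source.get)
print(stored)  # 1: the quota allows 2 requests, and "Paper B" has no abstract in the stub
```

Because progress lives in the table (rows whose `abstract` is still `NULL`), the tool can simply be rerun the next day once the quota resets.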
Deliverables
- Describe your algorithm.
- Describe the policy you followed to minimize the overlap with your colleagues.
- Describe the mechanism by which you amassed the data from your colleagues.
- Describe the steps you took to give your data to your colleagues.
- Describe challenges you faced in this phase.
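The assignment leaves the overlap policy and the exchange mechanism up to you. One possible approach, sketched here under stated assumptions, is to hash-partition candidate publications by a normalized title, so that each member deterministically owns a disjoint share, and then merge colleagues' exports into your own table, keyed on that same normalized title. The functions `normalize`, `owner`, and `merge` and the `abstracts` table are hypothetical. Keying on the title alone is an assumption; a DOI, where available, would be a more reliable key.

```python
import hashlib
import sqlite3


def normalize(title: str) -> str:
    """Lower-cases a title and collapses whitespace so trivial variants share one key."""
    return " ".join(title.lower().split())


def owner(title: str, num_members: int) -> int:
    """Assigns a publication to exactly one team member, numbered 0..num_members-1.

    hashlib is used instead of the built-in hash(), whose value for strings
    changes between Python processes, so every member computes the same split.
    """
    digest = hashlib.sha1(normalize(title).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_members


def merge(dest: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> int:
    """Adds a colleague's (title_key, title, abstract) rows and returns how many were new.

    The PRIMARY KEY on title_key plus INSERT OR IGNORE keeps the existing copy
    of any publication that both of you happened to crawl.
    """
    count = "SELECT COUNT(*) FROM abstracts"
    before = dest.execute(count).fetchone()[0]
    dest.executemany(
        "INSERT OR IGNORE INTO abstracts (title_key, title, abstract) VALUES (?, ?, ?)",
        rows,
    )
    dest.commit()
    return dest.execute(count).fetchone()[0] - before


# Partitioning: with 3 members, member 0 crawls only the candidates it owns.
candidates = ["Paper A", "Paper B", "Paper C", "Paper D"]
my_share = [t for t in candidates if owner(t, 3) == 0]

# Merging: a colleague's export overlaps with mine on "Paper A".
mine = sqlite3.connect(":memory:")
mine.execute("CREATE TABLE abstracts (title_key TEXT PRIMARY KEY, title TEXT, abstract TEXT)")
mine.execute(
    "INSERT INTO abstracts VALUES (?, ?, ?)",
    (normalize("Paper A"), "Paper A", "Abstract of A."),
)
colleague_rows = [
    (normalize("paper  A"), "paper  A", "Abstract of A."),
    (normalize("Paper B"), "Paper B", "Abstract of B."),
]
added = merge(mine, colleague_rows)
print(added)  # 1: the duplicate "paper  A" is ignored
```

Because the partition is a pure function of the title, no central coordinator is needed: every member can compute the full assignment locally, and the union step tolerates any accidental overlap.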
Update your report to include a detailed description of your studies. Ideally, it should include a summary table with all the measured times.
Start early!
Collaborate! Compare and discuss your approaches, but the end product is expected to be an individual effort!