Project

Scraping data from lists and tables on the Web.

In this project you will develop software that collects data from Web tables and lists. We focus on academic data, in particular papers published in CS conferences. Each major CS conference has a website where information about the conference is publicized. We are interested in two pieces of information: (1) the list of accepted publications and (2) the list of program committee (PC) members. Here is an example: VLDB is one of the most prestigious DB conferences. This year's VLDB website is here. The PC list is here. The list of accepted papers is here. Your task is to collect that data and store it in a local database, for which you will create a schema. You will then create a simple user interface that allows the user to search the data. For example, the interface will allow the user to find all PC members from a given institution, say MIT, at a given conference, say ICDE 2011.
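To make the target concrete, here is a minimal sketch of one possible schema and of the example search (PC members from an institution at a conference). It uses Python's built-in sqlite3 module only for illustration; the table names, column names, and the helper functions create_schema and pc_members_from are assumptions, not a required design, and the inserted rows are made-up placeholder data.

```python
import sqlite3


def create_schema(conn):
    """Create three hypothetical tables: conference, pc_member, paper."""
    conn.executescript("""
        CREATE TABLE conference (
            id   INTEGER PRIMARY KEY,
            name TEXT    NOT NULL,
            year INTEGER NOT NULL,
            UNIQUE (name, year)
        );
        CREATE TABLE pc_member (
            id            INTEGER PRIMARY KEY,
            conference_id INTEGER NOT NULL REFERENCES conference(id),
            full_name     TEXT    NOT NULL,
            institution   TEXT
        );
        CREATE TABLE paper (
            id            INTEGER PRIMARY KEY,
            conference_id INTEGER NOT NULL REFERENCES conference(id),
            title         TEXT    NOT NULL,
            authors       TEXT
        );
    """)


def pc_members_from(conn, institution, conf_name, year):
    """Return the names of PC members from one institution at one conference."""
    rows = conn.execute(
        """
        SELECT m.full_name
        FROM pc_member AS m
        JOIN conference AS c ON c.id = m.conference_id
        WHERE m.institution = ? AND c.name = ? AND c.year = ?
        ORDER BY m.full_name
        """,
        (institution, conf_name, year),
    ).fetchall()
    return [row[0] for row in rows]


# Usage with placeholder data (not real PC members).
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO conference (id, name, year) VALUES (1, 'ICDE', 2011)")
conn.executemany(
    "INSERT INTO pc_member (conference_id, full_name, institution) VALUES (?, ?, ?)",
    [(1, "Example Person A", "MIT"), (1, "Example Person B", "Example University")],
)
print(pc_members_from(conn, "MIT", "ICDE", 2011))
```

Keeping the conference in its own table lets the same query serve any venue and year. A fuller design would likely also normalize authors and institutions, which this sketch stores as plain text.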

Project | Topic | Duration
Phase 1 | DBMS installation, DB schema creation and working dataset. | Sept. 17, 2014 (one week)
Phase 2 | Web data scraping. | Oct. 8, 2014 (three weeks)
Phase 3 | Revise schema. Read my comments in Blackboard. | Oct. 15, 2014 (one week)
Phase 4 | Insert scraped data into your database. |
Phase 5 | Web interface. | Nov. 10, 2014 (two weeks)
Phase 6 | Fun with the data: give interesting stats from the data. | Dec. 3, 2014 (three weeks)
Phase 7 | Upload final report and source code with documentation. | Dec. 3, 2014
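
For the Web data scraping phase, the following is a minimal sketch of how rows could be pulled out of an HTML table using only Python's standard html.parser module. The class TableRowParser, the function extract_rows, and the sample markup are hypothetical. Real conference pages differ widely (some use lists or plain paragraphs instead of tables), so treat this as a starting point under that assumption, not as a general solution.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Usage on made-up markup (placeholder names, not real PC members).
sample = """
<table>
  <tr><th>Name</th><th>Affiliation</th></tr>
  <tr><td>Example Person A</td><td> MIT </td></tr>
</table>
"""
print(extract_rows(sample))
```

In practice the HTML string would come from the conference page itself, for example via urllib.request.urlopen, and the extracted rows would then be cleaned before being inserted into the database.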