Semester-Long Project

There are two projects. You need to choose one of them for the semester. Choose the project that better fits your programming skills. Both projects will be carried out in phases.

Project 1: Playing with Web data.

In this project you will be given a large collection of data gathered from the Web, and you will carry out tasks that follow the topics covered in class.

Project 2: Crawling the Deep Web.

In this project you will carry out tasks that will allow you to understand the main differences between the Deep Web and the Surface Web.

Project 1 schedule

Project Phase    Topic                                       Duration
Phase 1          Text Processing                             Oct. 19, 2016 (two weeks)
Phase 2          Tokenization/Lemmatization/Normalization    Nov. 2, 2016 (two weeks)
Phase 3          Term Weighting Schemes                      Nov. 17, 2016 (two weeks)
Phase 4          Evaluation

Project 2 schedule

Project Phase    Topic                                       Duration
Phase 1          Deep Web Crawling                           Oct. 19, 2016 (two weeks)
Phase 2          Extract Structured Data                     Nov. 2, 2016 (two weeks)
Phase 3          Deep Web Crawling for a second source       Nov. 17, 2016 (two weeks)
Phase 4          TBA