Phase 1: DBMS installation (already done), DB schema creation, and data scraping.

Requirements

  1. Install a DBMS on your own computer. The preferred choice is MySQL. This item is assumed to have been completed.
  2. Create a database for the project and call it CoursePrefs. The correctness of the schema design will not be graded.
  3. Design a Web data extraction algorithm that extracts data from the Class Schedule Listing pages at Temple University. There is no restriction on the techniques you employ. Examples were shown in class.
  4. Extract data from 5 pages, for example: Class Schedule Listing - 2015 Fall, Class Schedule Listing - 2015 Spring, Class Schedule Listing - 2014 Fall, Class Schedule Listing - 2013 Fall, and Class Schedule Listing - 2013 Spring, all for the CIS Department. You need to extract the following pieces of data:
    1. Course Name
    2. Course Description
    3. Course Identification Number (e.g., CIS 4331)
    4. Associated Term
    5. Registration Levels
    6. Credit Hours
    7. Instructor Name(s). Distinguish between a professor and a TA, if possible.
    8. Location
    9. Laboratory information
    10. Days
    11. Time
    12. Schedule Type
    13. Seats Available
    14. Type
    Your data must be organized by year and semester.
  5. Choose the programming language that you feel most comfortable with. Java, JavaScript, and the languages supported by Visual Studio are the most frequently used. Install an IDE on your computer, e.g., NetBeans for Java.
  6. Get familiar with the database middleware, e.g., JDBC for Java and ADO.NET for Visual Studio. You will need it to insert the extracted data into the database.
  7. FOR GRADUATE STUDENTS: Your software must manage the course preferences for instructors in multiple departments, e.g., MATH, CIS, ANTH, etc. You have to extract data from at least 3 (three) departments. In total, you will need to extract data from 15 pages (5 per department).
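
As an illustration of requirements 3 and 4, here is a minimal Java sketch of one possible regular-expression approach. It is not a required technique. It assumes that a section title on the listing page has the hypothetical form "Course Name - CRN - SUBJ NUMBER - Section" and that the Associated Term reads like "2015 Fall"; check the actual pages and adjust the patterns accordingly. The class name, method names, and sample strings are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingParser {
    // ASSUMED title layout: "<Course Name> - <5-digit CRN> - <SUBJ> <NUMBER> - <section>"
    private static final Pattern TITLE = Pattern.compile(
            "^(.+?) - (\\d{5}) - ([A-Z]{2,4}) (\\d{4}) - (\\w+)$");

    // ASSUMED term layout: "<4-digit year> <semester>", e.g. "2015 Fall"
    private static final Pattern TERM = Pattern.compile("^(\\d{4})\\s+(\\w+)$");

    /** Returns {courseName, crn, courseId, section}, or null if the line does not match. */
    static String[] parseTitle(String line) {
        Matcher m = TITLE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String courseId = m.group(3) + " " + m.group(4); // e.g. "CIS 4331"
        return new String[] { m.group(1), m.group(2), courseId, m.group(5) };
    }

    /** Returns {year, semester}, or null if the text does not match. */
    static String[] parseTerm(String text) {
        Matcher m = TERM.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Made-up sample strings, not copied from a real listing page.
        String[] title = parseTitle("Sample Course Title - 12345 - CIS 4331 - 001");
        String[] term = parseTerm("2015 Fall");
        System.out.println(String.join(" | ", title));
        System.out.println(String.join(" | ", term));
    }
}
```

Splitting the term into year and semester at extraction time makes it straightforward to organize the stored data by year and semester. Lines that do not match return null, so you can log them and decide later whether to handle or skip them.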

Deliverables

  • Create a report. The report will be updated after each phase and uploaded to Blackboard. The report must be to the point and contain:
    1. a screen capture that shows the DBMS running on your computer or on our servers. Include in the figure the database CoursePrefs and its tables. Expand each table so that its fields are visible.
    2. the script you used to create the DB.
    3. a detailed description of your extraction algorithms.
    4. SMALL pieces (no more than two pages!) of source code that convincingly show that you implemented the program.
    5. Beyond these excerpts, do not submit any of your source code. You will showcase your project at the end of the semester.
    6. describe the software packages that you use to implement your algorithm, e.g., RegEx.
    7. describe the challenges in implementing your extraction algorithm.
    8. describe errors in the extraction algorithm that were difficult to fix and that you may have decided to skip.
    Start early!
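
For the database script and the middleware step (requirement 6), the following is a minimal JDBC sketch, not a reference solution. It assumes MySQL is running locally on the default port, that the CoursePrefs database already exists, and that the MySQL Connector/J driver is on the classpath. The table course_section and its columns are hypothetical and cover only a few of the required fields; your own schema will differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class CoursePrefsLoader {
    // HYPOTHETICAL table; names and types are illustrative only.
    static final String CREATE_SECTION =
            "CREATE TABLE IF NOT EXISTS course_section ("
            + "id INT AUTO_INCREMENT PRIMARY KEY, "
            + "course_id VARCHAR(16) NOT NULL, "
            + "course_name VARCHAR(255) NOT NULL, "
            + "term_year INT NOT NULL, "
            + "semester VARCHAR(16) NOT NULL, "
            + "credit_hours DECIMAL(3,1))";

    static final String INSERT_SECTION =
            "INSERT INTO course_section "
            + "(course_id, course_name, term_year, semester, credit_hours) "
            + "VALUES (?, ?, ?, ?, ?)";

    /** Creates the example table if it does not exist yet. */
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SECTION);
        }
    }

    /** Inserts one extracted row; returns the number of rows affected. */
    static int insertSection(Connection conn, String courseId, String name,
                             int year, String semester, double credits) throws SQLException {
        // A PreparedStatement keeps scraped text out of the SQL string itself.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SECTION)) {
            ps.setString(1, courseId);
            ps.setString(2, name);
            ps.setInt(3, year);
            ps.setString(4, semester);
            ps.setDouble(5, credits);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("usage: java CoursePrefsLoader <user> <password>");
            return;
        }
        // ASSUMED local MySQL instance and existing CoursePrefs database.
        String url = "jdbc:mysql://localhost:3306/CoursePrefs";
        try (Connection conn = DriverManager.getConnection(url, args[0], args[1])) {
            createSchema(conn);
            // Made-up sample row.
            insertSection(conn, "CIS 4331", "Sample Course Title", 2015, "Fall", 3.0);
        }
    }
}
```

Using parameter placeholders rather than string concatenation avoids quoting problems when course names or descriptions contain apostrophes.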