Introduction to Document Databases

JSON is a popular textual data format used for exchanging data in modern web and mobile applications. JSON is also used for storing unstructured data in log files or NoSQL databases. The goal of this project is to introduce you to JSON and data integration.
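As a small illustration of the format, the sketch below parses one JSON document with Python's standard json module and reads a nested field. The sample document and its field names (name, tags, address) are made up for this example and are not part of the project data.

```python
import json

# A made-up JSON document: objects nest, and arrays hold ordered values.
raw = """
{
  "name": "Ada",
  "tags": ["db", "json"],
  "address": {"city": "Houston", "zip": "77004"}
}
"""

# json.loads turns the text into plain Python dicts, lists, and strings.
doc = json.loads(raw)

# Nested fields are reached by following the keys one level at a time.
city = doc["address"]["city"]
tag_count = len(doc["tags"])

# json.dumps serializes the structure back to text.
round_trip = json.dumps(doc, sort_keys=True)

print(city, tag_count)
print(round_trip)
```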

Requirements

  1. Phase 1. Get familiar with JSON. In this task, you will need to read about JSON and get familiar with it. Learn about DBMSs that natively manage JSON documents and compare that with the approach traditional relational DBMS vendors take to accommodate JSON in their systems.
    • Document database. We will use MongoDB. Here is a tutorial you should consult. Choose a relational DBMS that manages JSON documents. You can choose from a number of systems: MySQL, SQL Server, or Oracle. You will collect and receive a large number of JSON documents. In this task you will create a document database based on this data.
  2. Phase 2. Comparative Study of Query Languages. In this task, you will study the query language of MongoDB and that of a relational DBMS as they apply to JSON documents. The goal is to learn the differences in how each handles JSON. For example, you can find notes on JSON in MySQL here, in SQL Server here, and in Oracle here. I will provide data to load into your database instances. You will use that data to run empirical studies. It is important that you use this phase of the project to read and practice the query languages.

Deliverables

  • A semester report. However, you will need to upload a weekly version of the report every Monday until the end of the semester. Each subsequent version of the report should clearly mark the new parts, e.g., using different font styles or colors. The report starts with a progress section, which gives an itemized list of the updates to your project and report.
  • Include in your report SMALL pieces (no more than two pages!) of source code that convincingly show that you implemented the tasks.
  • Source code, submitted only after you give an in-person demo.
  • Describe the software packages that you use to implement your algorithms.
  • Describe the difficulties in implementing your algorithms and how you overcame them.
  • Provide statistics about the queries.
  • Include screenshots showing that you ran your queries on your computer.
  • Give details about your steps to improve the queries. Start early!
    Collaborate! Compare and discuss your approaches. The end product is expected to be an individual effort!
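
One possible way to gather the query statistics asked for above is to time each query several times and report summary numbers. The sketch below assumes that "statistics" means wall-clock running time, which the assignment does not specify; time_query is a hypothetical helper, and run_query stands for any zero-argument function that executes one of your queries.

```python
import statistics
import time
from typing import Callable


def time_query(run_query: Callable[[], object], repeats: int = 5) -> dict:
    """Run `run_query` several times and summarize the elapsed times in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "mean_s": statistics.mean(samples),
    }


# Stand-in workload; replace the lambda with a call that runs a real query.
stats = time_query(lambda: sum(range(100_000)), repeats=5)
print(stats)
```

Reporting the median alongside the minimum helps when the first run is slower because of cold caches.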