JSON and SQL.
JSON is a popular textual data format that's used for exchanging data in modern web and mobile applications. JSON is also used for storing unstructured data in log files or NoSQL databases. The goal of this project is to introduce you to JSON and SQL databases.
Requirements
- Phase 1. Get familiar with JSON. In this task, you will read about JSON and get familiar with it. Learn how to export relational data to JSON and how to import JSON documents into a document database. Different DBMSs have different statements to support these operations. You will need to read about them and try them out on some toy examples.
- Document database. Choose a database system that manages JSON documents. You can choose from a number of systems: MySQL, MongoDB, or Oracle. More choices are available here. You will receive a large number of JSON documents. In this task, you will create a document database based on this data.
- Phase 2. JSON to Relational. In this task, you will parse the data received in the previous phase and create a relational database, dbjson, corresponding to it. You can use a programming environment of your choice, e.g., Java or Python. You will need to study the data first and propose a relational schema for it. For instance, one possible schema is a giant table that contains all the data. However, this will not be accepted as a deliverable of this phase. Your schema needs to have clearly defined tables that minimize redundancy.
- Phase 3. Comparative Studies. You will receive on Canvas a number of queries and tasks to complete over the two databases. You will present your findings in the report.
You will need to run the provided queries against the two databases. Collect statistics such as running time and memory usage.
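As an illustration of collecting such statistics, here is a minimal Python sketch. It assumes SQLite (via the standard `sqlite3` module) as a stand-in for whichever system you choose; the function name `measure_query`, the table `t`, and the sample query are hypothetical. Note that `tracemalloc` only reports memory allocated by Python (for example, the fetched rows), not memory used inside the database engine, so treat the number as a rough indicator.

```python
import sqlite3
import time
import tracemalloc


def measure_query(conn, sql, params=(), repeats=5):
    """Run a query several times and report simple statistics.

    Timing runs and the memory run are kept separate because tracing
    allocations slows the query down. Assumes repeats >= 1.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)

    # One extra run with allocation tracing to get peak Python-side memory.
    tracemalloc.start()
    conn.execute(sql, params).fetchall()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "rows": len(rows),
        "best_seconds": min(timings),
        "median_seconds": sorted(timings)[len(timings) // 2],
        "peak_bytes": peak_bytes,
    }


# Toy data: a hypothetical table with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [(i % 10,) for i in range(1000)])
conn.commit()

stats = measure_query(conn, "SELECT * FROM t WHERE val = ?", (3,))
print(stats)
```

Reporting the best or median of several runs, rather than a single run, reduces the effect of caching and background noise on the numbers you put in the report.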
- A set of text files with JSON objects will be provided.
A set of queries will be provided along with them.
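To make the Phase 2 idea concrete, the following Python sketch parses JSON documents (one per line) and loads them into a small normalized schema instead of one giant table. Everything about the data here is an assumption for illustration: the document shape (a purchase with a nested customer and a list of items), the table and column names, and the use of SQLite are all hypothetical, and the real files will need their own schema.

```python
import json
import sqlite3

# Hypothetical normalized schema: customers are stored once, and the
# repeated "items" array gets its own table keyed by purchase.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE purchase_item (
    purchase_id INTEGER NOT NULL REFERENCES purchase(purchase_id),
    line_no     INTEGER NOT NULL,
    product     TEXT NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (purchase_id, line_no)
);
"""

# Hypothetical sample input: one JSON document per line.
SAMPLE_LINES = [
    '{"id": 1, "customer": {"id": 10, "name": "Ada"}, '
    '"items": [{"product": "pen", "qty": 2}, {"product": "ink", "qty": 1}]}',
    '{"id": 2, "customer": {"id": 10, "name": "Ada"}, '
    '"items": [{"product": "pad", "qty": 3}]}',
]


def load_documents(conn, lines):
    """Parse each JSON line and spread its fields over the three tables."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        cust = doc["customer"]
        # The same customer may appear in many documents; insert it only once.
        conn.execute(
            "INSERT OR IGNORE INTO customer (customer_id, name) VALUES (?, ?)",
            (cust["id"], cust["name"]),
        )
        conn.execute(
            "INSERT INTO purchase (purchase_id, customer_id) VALUES (?, ?)",
            (doc["id"], cust["id"]),
        )
        conn.executemany(
            "INSERT INTO purchase_item (purchase_id, line_no, product, quantity) "
            "VALUES (?, ?, ?, ?)",
            [
                (doc["id"], n, item["product"], item["qty"])
                for n, item in enumerate(doc["items"], start=1)
            ],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_documents(conn, SAMPLE_LINES)

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM purchase_item").fetchone()[0]
print(customer_count, purchase_count, item_count)
```

The nested object becomes a parent table referenced by key, and the array becomes a child table with one row per element, which is the usual way to remove the redundancy a single wide table would carry.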
Take steps to optimize your databases, e.g., introduce indices. You may want to read about indices on JSON attributes.
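One way an index on a JSON attribute can look is sketched below in Python. It assumes SQLite with its built-in JSON functions available (the case in recent SQLite builds) and uses an expression index over `json_extract`; the table `docs`, the attribute `city`, and the index name are hypothetical. MySQL, MongoDB, and Oracle each have their own mechanisms, so check the documentation of the system you chose.

```python
import json
import sqlite3


def build_demo():
    """Create a toy document table, index one JSON attribute, and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO docs (body) VALUES (?)",
        [
            (json.dumps({"city": city, "n": i}),)
            for i, city in enumerate(["Oslo", "Lima", "Oslo", "Kyiv"])
        ],
    )

    # Expression index on the extracted attribute. The WHERE clause below
    # must use the same expression for the planner to consider the index.
    conn.execute(
        "CREATE INDEX idx_docs_city ON docs (json_extract(body, '$.city'))"
    )

    query = "SELECT id FROM docs WHERE json_extract(body, '$.city') = ?"

    # EXPLAIN QUERY PLAN shows whether the index is chosen; the last
    # column of each row is the human-readable plan detail.
    plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Oslo",)).fetchall()
    plan_text = " ".join(row[-1] for row in plan)

    matching_ids = [row[0] for row in conn.execute(query, ("Oslo",))]
    return plan_text, matching_ids


plan_text, matching_ids = build_demo()
print(plan_text)
print(matching_ids)
```

Comparing the query plan and the running time before and after creating the index gives you concrete material for the optimization part of the report.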
Deliverables
A semester report. You will also need to upload a weekly version of the report each Monday until the end of the semester. Each subsequent version of the report must clearly mark the new parts, e.g., using different font styles or colors. The report starts with a progress section, which gives an itemized list of the updates to your project and report.
Include in your report SMALL pieces (no more than two pages!) of source code that convincingly show that you implemented the tasks.
Source code: submit it only after you give an in-person demo.
Describe the software packages that you use to implement your algorithms.
Describe the difficulties in implementing your algorithms and how you overcame them.
Provide statistics about the queries.
Include screenshots showing that you ran your queries on your computer.
And the most important piece: a comparative study over the 3 types of database schemas. Give your system configuration.
Give details about your steps to improve the queries.
Start early!
Collaborate! Compare and discuss your approaches. The end product is expected to be an individual effort!