Location

Class: Mondays 5:30 PM - 8:00 PM
Date Range: Jan. 13 - May 6, 2021
Location: virtual

Instructor

- Name: Eduard C. Dragut
- Email: edragut@temple.edu
- Office: SERC 348
- Office Hours: Mondays 10:00 AM - 12:00 PM or by email appointment

Teaching Assistants

- Branimir Ljubic

Course Description

The goal of this course is to give you exposure to the emerging, fast-moving technologies of Big Data and Cloud Computing. Big Data is now recognized as a critical enabling technology for business and research in all fields. We are increasingly confronted with the challenge of processing large and diverse amounts of data quickly. The proposed solutions are software layers that enable data to be processed in parallel and distributed environments. In this course, you will learn about basic methods for collecting, wrangling, and structuring data; programming models for computation across many nodes; standard toolkits for data analysis; and popular distributed frameworks for analytics tasks. The course material is drawn from textbooks as well as recent research literature. We will cover the following topics:
- Distributed computing models
- MapReduce
- Data Models and Cleaning
- Iterative and Stream Processing
- Data Science Applications
- Data Ethics & Privacy

Prerequisites

- Basic familiarity with the relational data model and SQL.
- Programming ability in Python and Java is also required.
- Basic familiarity with probability and statistics.
- If you are missing any of these prerequisites, please see me as soon as possible.

Textbooks

The content of the course will be drawn mostly from the following textbooks:
- Cloud Computing Solutions Architect: A Hands-on Approach, Bahga and Madisetti, 2019, ISBN: 978-0-9960255-9-1
- Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, and Jeff Ullman (see the book website)
A number of topics will be covered from:
- Data Science from Scratch, by Grus, from O'Reilly. An online version can be accessed from O'Reilly's Safari service.
- Python for Data Analysis, by McKinney, from O'Reilly.
- Additional materials will be provided in the form of technical papers.

Workload

- Homework (at least 4, not more than 6)
- Quizzes (about 6, every 2 weeks)
- Midterm Exam (Date: TBD)
- Final Exam (Date: TBD). The final exam is cumulative.

Project

- TBD

Grading

The final grade will be based upon the following:
- Assignments: 50%.
- Quizzes: 20%
- Final Exam: 30%
- Extra credit of up to 5% may be awarded based on participation in class and in the labs.
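
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component score is on a 0-100 scale, that extra credit is added as percentage points on top of the weighted total, and that the total is not capped. None of these details is stated in the syllabus, and the function and variable names are hypothetical.

```python
# Weights from the Grading section, in percent.
WEIGHTS = {"assignments": 50, "quizzes": 20, "final_exam": 30}

# Extra credit is limited to 5% (assumed here to mean 5 percentage points).
MAX_EXTRA_CREDIT = 5.0


def final_grade(scores, extra_credit=0.0):
    """Return the weighted course grade on a 0-100 scale, plus extra credit.

    `scores` maps each component name in WEIGHTS to a score out of 100.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")

    # Weighted average of the three graded components.
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

    # Clamp extra credit to the allowed range [0, MAX_EXTRA_CREDIT].
    bonus = min(max(extra_credit, 0.0), MAX_EXTRA_CREDIT)
    return base + bonus


if __name__ == "__main__":
    example = {"assignments": 80, "quizzes": 90, "final_exam": 70}
    print(final_grade(example))
    print(final_grade(example, extra_credit=8))  # extra credit is clamped to 5
```

For example, scores of 80, 90, and 70 give 40 + 18 + 21 = 79 before any extra credit.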

Late Submission Policy

Late submissions are subject to a 10% penalty for each day late. After three days, late submissions will not be accepted.

You are encouraged to discuss problems and ideas, but the final solution or code must be your own. Any act of cheating will result in a score of 0 for the entire assignment. Repeat offenses will be reported to the Office of the Dean of Students and will result in an automatic F grade.
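
The late-submission rule in this section can be sketched in Python as follows. This is only one possible reading: it assumes the 10% penalty is taken from the assignment's maximum score (not from the earned score), that any part of a day counts as a full day, and that a submission more than three days late receives 0. The function name and parameters are hypothetical.

```python
def late_adjusted_score(raw_score, days_late, max_score=100.0):
    """Apply the late penalty to an assignment score.

    Assumptions (not stated in the syllabus): the penalty is 10% of
    max_score per day late, and days_late is a whole number of days.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        # After three days, the submission is not accepted.
        return 0.0

    # 10 percentage points of the maximum score for each day late.
    penalty = max_score * (10 * days_late) / 100

    # A score cannot go below zero.
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    for days in range(5):
        print(days, late_adjusted_score(90, days))
```

Under these assumptions, a submission that would have earned 90 out of 100 receives 70 when it is two days late.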

Classroom Requirements

- Cell phones must be turned off or set to vibrate during class.
- Laptop/notebook computers and tablets cannot be used during class.

Disability Disclosure

Any student who has a need for accommodation based on the impact of a disability should contact me privately to discuss the specific situation as soon as possible. Contact Disability Resources and Services at 215-204-1280 in Room 100, Ritter Annex to coordinate reasonable accommodations for students with documented disabilities.