Assignment 1

Due date: Monday, October 8, 11:59PM.
NOTES:
This assignment gives you practice setting up a single-node Hadoop cluster and writing MapReduce programs.

Requirements

  1. Follow the instructions in "Hadoop: Setting up a Single Node Cluster", available here. Install the latest stable version on your own computer.
  2. Problem solving with MapReduce. Create a Java implementation of the Word Count example discussed in class. Recall that you need to implement the Map and Reduce functions.
  3. Additional Suggestions:
    1. Hadoop uses data types from the org.apache.hadoop.io package instead of the standard Java types (for example, Text instead of String and IntWritable instead of int). Check Hadoop's API at: http://hadoop.apache.org/docs/r2.4.0/api/
    2. Use small example files to test your implementation.
    3. Run your fully debugged version on the large text available here.
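
The Map/Reduce split described in item 2 can be checked in plain Java on a tiny input before it is moved into Hadoop. The sketch below is a hypothetical illustration (WordCountSketch, map, reduce and wordCount are made-up names), not the Hadoop API: in a real job the mapper would extend org.apache.hadoop.mapreduce.Mapper and emit Text/IntWritable pairs through context.write, the reducer would extend org.apache.hadoop.mapreduce.Reducer, and Hadoop itself would perform the grouping that wordCount does by hand here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token in one line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all counts that were grouped under one word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the shuffle/sort step
    // Hadoop performs between the phases), then reduce once per word.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorted keys
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("the quick brown fox", "the lazy dog", "the end");
        System.out.println(wordCount(sample));
    }
}
```

Running main prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}. Splitting on whitespace is a simplifying assumption: punctuation and case are not normalized, so "The" and "the" would be counted as different words.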

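To see why Hadoop replaces String and int with its own types, the sketch below mimics the org.apache.hadoop.io.Writable contract, whose two methods are write(DataOutput) and readFields(DataInput), using only java.io. SimpleWritable and CountWritable are hypothetical stand-ins written for illustration, not Hadoop classes; in your solution you would use IntWritable and Text directly. The idea it shows is that each value serializes itself to a compact binary form and can be refilled in place, which is what lets the framework reuse one object across many records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Hypothetical stand-in for Hadoop's Writable interface.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical mutable int wrapper, similar in spirit to IntWritable.
    static class CountWritable implements SimpleWritable {
        private int value;

        CountWritable() { }                      // empty object, filled later by readFields
        CountWritable(int value) { this.value = value; }

        int get() { return value; }
        void set(int value) { this.value = value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);                 // fixed 4-byte big-endian encoding
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();                // overwrite this object's state in place
        }
    }

    // Serializes n to bytes, then reads it back into a separate, reused object.
    public static int roundTrip(int n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new CountWritable(n).write(new DataOutputStream(bytes));

            CountWritable reused = new CountWritable();
            reused.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return reused.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```
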
Deliverables

  1. Include step-by-step proof of your Hadoop installation.
  2. Explain the steps you took to implement the Map and Reduce functions.
  3. Show the output of your program on the small file and on the large file.
  4. Discuss the challenges you encountered and the steps you took to address them.
Upload the report to Canvas. Do not upload the source code yet.