Assignment 1
Due date: Monday, October 8, 11:59PM.
NOTES:
- 10% penalty for each day late, up to three days
- You are encouraged to discuss and collaborate. Nonetheless, this is an individual assignment.
This assignment is meant to give you practice with setting up a Single Node Cluster and with MapReduce.
Requirements
Follow the instructions about Hadoop: Setting up a Single Node Cluster available here. Install the latest stable version on your own computer.
Problem solving with MapReduce. Create a Java implementation of the Word Count example discussed in class. Recall that you need to implement the Map and Reduce functions.
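As a reminder of the data flow only, here is a minimal plain-Java sketch of word count in the map, group, reduce style. It is not a Hadoop program: it uses no Hadoop classes, runs in memory, and the class and method names (WordCountSketch, map, reduce, wordCount) are hypothetical names chosen for illustration. It also assumes that words are lowercased and split on non-word characters, which may differ from the tokenization used in class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word count data flow.
// No Hadoop classes are used; all names here are hypothetical.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for every token in one input line.
    // Assumption: tokens are lowercased and split on non-word characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all the counts that were emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group the values by key (the framework's
    // shuffle step, simulated here with a sorted map), then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {cat=1, dog=1, sat=2, the=2}
        System.out.println(wordCount(Arrays.asList("the cat sat", "the dog sat")));
    }
}
```

In your Hadoop implementation the grouping step is performed by the framework, so you only write the Map and Reduce functions, using Hadoop's own data types for the keys and values.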
Additional Suggestions:
Hadoop uses data types from the org.apache.hadoop.io package (for example, Text and IntWritable) instead of the standard Java data types (String, Integer, etc.). Check Hadoop's API at:
http://hadoop.apache.org/docs/r2.4.0/api/
Use smaller file examples for testing your implementation.
Run your fully debugged version on the large text available here.
Deliverables
- Include a step by step proof of your work on installing Hadoop.
- Explain your steps toward implementing Map and Reduce.
- Show the outcome of your tool for the small file and the large file.
- Discuss encountered challenges and the steps taken to address them.
Upload the report to Canvas. Do not upload the source yet.