Assignment 1
Due date: Monday, October 8, 11:59PM.
NOTES:
- 10% penalty for each day late, up to three days
- You are encouraged to discuss and collaborate. Nonetheless, this is an individual assignment.
This assignment is meant to give you practice setting up a single-node Hadoop cluster and writing MapReduce programs.
Requirements
- Follow the instructions in Hadoop: Setting up a Single Node Cluster, available here. Install the latest stable version on your own computer.
- Problem solving with MapReduce. Create a Java implementation of the Word Count example discussed in class. Recall that you need to implement the Map and Reduce functions.
- Additional Suggestions:
- Hadoop uses data types from the org.apache.hadoop.io package instead of the standard Java data types (String, Integer, etc.); for example, Text takes the place of String and IntWritable the place of Integer. Check Hadoop's API at:
http://hadoop.apache.org/docs/r2.4.0/api/
- Use small example files to test your implementation.
- Run your fully debugged version on the large text available here.
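To make the division of work between Map and Reduce concrete, here is a minimal sketch of the Word Count logic in plain Java, with no Hadoop classes involved. It is not the assignment solution: the class name WordCountSketch and the run method, which stands in for the grouping that the framework performs between the two phases, are hypothetical names chosen for illustration. In a real Hadoop job, the map and reduce logic would live in subclasses of Mapper and Reducer, and String and Integer would be replaced by Text and IntWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the Word Count data flow.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum all the counts that were collected for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Stand-in for the framework (an assumption of this sketch): group the mapped
    // pairs by word, then call reduce once per word. TreeMap keeps words sorted.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        // Print one "word<TAB>count" line per distinct word.
        run(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running the sketch on a two-line input like the one in main is also a convenient way to decide what output to expect from the small test file before running the Hadoop version.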
Deliverables
- Include step-by-step proof of your work installing Hadoop.
- Explain your steps toward implementing Map and Reduce.
- Show the outcome of your tool for the small file and the large file.
- Discuss the challenges you encountered and the steps you took to address them.
Upload the report to Canvas. Do not upload the source code yet.