[prev] 37 [next]

MapReduce

MapReduce is a programming model
  • suited for use on large networks of computers
  • processing large amounts of data with high parallelism
  • originally developed by Google; Hadoop is open-source implementation
Computation is structured in two phases:
  • Map phase:
    • master node partitions work into sub-problems
    • distributes them to worker nodes (who may further distribute)
  • Reduce phase:
    • master collects results of sub-problems from workers
    • combines results to produce final answer