Resource Description
This code presents MR+, a modified MapReduce architecture that departs from the fixed two-stage design of MapReduce in favor of a flexible, multi-stage implementation. MR+ has several inherent advantages over traditional MapReduce: (1) it is resilient to skew in intermediate results; (2) it avoids the wholesale copying of intermediate data at the end of the map phase, which can otherwise paralyze the entire cluster while reduce workers are loaded with data en masse; (3) it naturally avoids the reduce straggler problem caused by a heterogeneous cluster; (4) it can prioritize the processing of large datasets by detecting clusters of useful information in the input data; and (5) it enables early estimation of results for very large datasets. Our improvements over the original architecture preserve the clean, convenient programming model of MapReduce. Our evaluation shows that MR+ outperforms Hadoop MapReduce, Hadoop Online Prototype and LATE