- Title
Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling.
- Authors
Gavagsaz, Elaheh; Rezaee, Ali; Haj Seyyed Javadi, Hamid
- Abstract
MapReduce has proven to be a highly efficient programming model for processing massive datasets on distributed systems. One of the most important obstacles to MapReduce performance is data skewness. Skewed data leads to considerable load imbalance across the reducers and degrades performance. This paper studies how to efficiently distribute intermediate data so as to even out the load on all reducers when the data is skewed. A scalable sampling algorithm is used that obtains a more precise approximation of the key distribution by sampling only a small fraction of the intermediate data. The algorithm is then applied to estimate the overall distribution of the keys. In addition, we propose a sorted-balance algorithm based on the sampling results: the sorted-balance algorithm using scalable simple random sampling (SBaSC). This work not only puts forward a load-balanced partitioning strategy, but also proves a significant approximation ratio for SBaSC. The experiments confirm that our solution attains better execution time and load balancing results.
- Subjects
DISTRIBUTED computing; STATISTICAL sampling; LOAD balancing (Computer networks); SKEWNESS (Probability theory); DISTRIBUTION (Probability theory); ELECTRONIC data processing; APPROXIMATION theory
- Publication
Journal of Supercomputing, 2018, Vol 74, Issue 7, p3415
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-018-2391-9