- Title
Improvement of job completion time in data-intensive cloud computing applications.
- Authors
Ibrahim, Ibrahim Adel; Bassiouni, Mostafa
- Abstract
Task stragglers in MapReduce jobs dramatically impede job execution of data-intensive computing in cloud data centers. This impedance is due to the uneven distribution of input data, heterogeneous data nodes, resource contention situations, and network configurations. Data skew of intermediate data in a MapReduce job causes delay failures due to the violation of job completion time. Data-intensive computing frameworks, such as MapReduce or Hadoop YARN, employ HashPartitioner. This partitioner may cause intermediate data skew, which results in straggler reducers. In this paper, we strive to make Hadoop YARN more efficient in cloud environments. We present a new partitioning scheme, called the balanced data clusters partitioner (BDCP), to handle straggler Reduce tasks based on sampling of input data and feedback information about the current processing task. Our extensive experimental results show that BDCP can outperform the default Hadoop HashPartitioner and Range partitioner. BDCP can assist in straggler mitigation during the reduce phase and minimize the job completion time in MapReduce jobs within data-intensive cloud computing.
- Subjects
CLOUD computing; SERVER farms (Computer network management); DATA distribution; CLOUDS & the environment; DISTRIBUTED computing
- Publication
Journal of Cloud Computing (2192-113X), 2020, Vol 9, Issue 1, p1
- ISSN
2192-113X
- Publication type
Article
- DOI
10.1186/s13677-019-0139-6