- Title
Optimizing the Hadoop MapReduce Framework with high-performance storage devices.
- Authors
Moon, Sangwhan; Lee, Jaehwan; Sun, Xiling; Kee, Yang-suk
- Abstract
Solid-state drives (SSDs) are an attractive alternative to hard disk drives (HDDs) to accelerate the Hadoop MapReduce Framework. However, the SSD characteristics and today's Hadoop framework exhibit mismatches that impede indiscriminate SSD integration. This paper explores how to optimize a Hadoop MapReduce Framework with SSDs in terms of performance, cost, and energy consumption. It identifies extensible best practices that can exploit SSD benefits within Hadoop when combined with high network bandwidth and increased parallel storage access. Our Terasort benchmark results demonstrate that Hadoop currently does not sufficiently exploit SSD throughput. Hence, using faster SSDs in Hadoop does not enhance its performance. We show that SSDs presently deliver significant efficiency when storing intermediate Hadoop data, leaving HDDs for Hadoop Distributed File System (HDFS). The proposed configuration is optimized with the JVM reuse option and frequent heartbeat interval option. Moreover, we examined the performance of a state-of-the-art non-volatile memory express interface SSD within the Hadoop MapReduce Framework. While HDFS read and write throughput increases with high-performance SSDs, achieving complete system performance improvement requires carefully balancing CPU, network, and storage resource capabilities at a system level.
- Subjects
COMPUTER storage devices; SOLID state physics; CONDENSED matter physics; COMPUTER input-output equipment; ENERGY conservation
- Publication
Journal of Supercomputing, 2015, Vol 71, Issue 9, p3525
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-015-1447-3