HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark
Big Data analytics using Map-Reduce over Hadoop has become a leading-edge paradigm for distributed programming over large server clusters. The Hadoop platform is used extensively for interactive and batch analytics in ecommerce, telecom, media, retail, and social networking, and is being actively evaluated for use in other areas. However, to date no industry standard or customer representative benchmarks exist to measure and evaluate the true performance of a Hadoop cluster. Current Hadoop micro-benchmarks such as HiBench-2, GridMix-3, Terasort, etc. are narrow functional slices of applications that customers run to evaluate their Hadoop clusters, and they fail to capture real usage and performance in a datacenter environment. Given that typical datacenter deployments of Hadoop process a wide variety of interactive analytic and query jobs in addition to batch transform jobs under strict Service Level Agreement (SLA) requirements, performance benchmarks used to evaluate clusters must capture the effects of concurrently running such diverse job types in production environments. In this paper, we present the methodology and development of "HcBench", a Hadoop benchmark representative of customer datacenter usage. It includes a mix of a large number of customer representative interactive, query, machine learning, and transform jobs, a variety of data sizes, and compute, storage I/O, and network intensive jobs, with inter-job arrival times as in a typical datacenter environment. We present the details of this benchmark and discuss application level, server level, and cluster level performance characterization collected on an Intel Sandy Bridge Xeon processor-based Hadoop cluster.
2013 IEEE International Symposium on Workload Characterization (IISWC)
Saletore, V. A.; Krishnan, K.; Viswanathan, V.; and Tolentino, M. E., "HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark" (2013). School of Engineering and Technology Publications. 32.