School of Engineering and Technology Publications

Title

HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark

Authors

V. A. Saletore
K. Krishnan
V. Viswanathan
M. E. Tolentino, University of Washington Tacoma

Publication Date

9-1-2013

Document Type

Conference Proceeding

Abstract

Big Data analytics using Map-Reduce over Hadoop has become a leading-edge paradigm for distributed programming over large server clusters. The Hadoop platform is used extensively for interactive and batch analytics in ecommerce, telecom, media, retail, and social networking, and is being actively evaluated for use in other areas. However, to date no industry-standard or customer-representative benchmarks exist to measure and evaluate the true performance of a Hadoop cluster. Current Hadoop micro-benchmarks such as HiBench-2, GridMix-3, and Terasort are narrow functional slices of the applications that customers run to evaluate their Hadoop clusters, and they fail to capture real usage and performance in a datacenter environment. Given that typical datacenter deployments of Hadoop process a wide variety of interactive analytic and query jobs in addition to batch transform jobs under strict Service Level Agreement (SLA) requirements, performance benchmarks used to evaluate clusters must capture the effects of concurrently running such diverse job types in production environments. In this paper, we present the methodology and development of "HcBench", a Hadoop benchmark representative of customer datacenter usage. It includes a mix of a large number of customer-representative interactive, query, machine learning, and transform jobs, a variety of data sizes, and compute-, storage I/O-, and network-intensive jobs, with inter-job arrival times as in a typical datacenter environment. We present the details of this benchmark and discuss application-level, server-level, and cluster-level performance characterization collected on an Intel Sandy Bridge Xeon Processor Hadoop cluster.

Publication Title

2013 IEEE International Symposium on Workload Characterization (IISWC)

DOI

10.1109/IISWC.2013.6704672

Recommended Citation

Saletore, V. A.; Krishnan, K.; Viswanathan, V.; and Tolentino, M. E., "HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark" (2013). School of Engineering and Technology Publications. 32.
https://digitalcommons.tacoma.uw.edu/tech_pub/32
