A BigBench Implementation in the Hadoop Ecosystem

Chowdhury, Badrul; Rabl, Tilmann; Saadatpanah, Pooya; Du, Jiang; Jacobsen, Hans-Arno

Journal

Advancing Big Data Benchmarks - Proceedings of the 2013 Workshop Series on Big Data Benchmarking, WBDB.cn, Xi'an, China, July 16-17, 2013 and WBDB.us, San José, CA, USA, October 9-10, 2013, Revised Selected Papers

Date Issued

2013

Author(s)

Chowdhury, Badrul

Rabl, Tilmann

Saadatpanah, Pooya

Du, Jiang

Jacobsen, Hans-Arno

Abstract

BigBench is the first proposal for an end-to-end big data analytics benchmark. It features a rich set of complex, realistic queries. BigBench was developed based on the decision support benchmark TPC-DS. The first proof-of-concept implementation was built for the Teradata Aster parallel database system, with the queries formulated in the proprietary SQL-MR query language. To test other systems, the queries have to be translated.
In this paper, an alternative implementation of BigBench for the Hadoop ecosystem is presented. All 30 queries of BigBench were realized using Apache Hive, Apache Hadoop, Apache Mahout, and NLTK. We present the design choices we made and show a proof-of-concept evaluation.

File(s)

Name

wbdb2013bigbench.pdf

Size

311.05 KB

Format

Adobe PDF

Checksum

MD5: df75ebba02ca4fde5c52a59f11725c09
