Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase
Published in Springer International Publishing
2016
Pages: 191 - 220
Abstract
Apache Hadoop is an open-source project that enables the distributed processing of huge data sets across clusters of computers using simple programming models. It is designed to store, analyze, and access massive amounts of data quickly across clusters of commodity hardware. The Hadoop ecosystem comprises several large-scale data processing tools, each with its own purpose, and has emerged as a cost-effective way of working with large data sets. Hadoop imposes a particular programming model, called MapReduce, which breaks computation tasks into units that can be distributed around a cluster of commodity and server-class hardware, thereby providing cost-effective horizontal scalability. This chapter provides introductory material on the various Hadoop ecosystem tools and describes their usage in data analytics. Each tool has its own significance and function in a data analytics environment. © Springer International Publishing Switzerland 2016. All rights reserved.
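The MapReduce model mentioned in the abstract can be sketched in plain Python. This is not Hadoop's actual Java API, just a minimal in-memory illustration of the three conceptual phases (map, shuffle, reduce) using a word-count example; all function names here are illustrative, not part of any Hadoop library.

```python
from collections import defaultdict

# Map phase: each input record (here, a line of text) is turned into
# intermediate (key, value) pairs -- for word count, (word, 1) pairs.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group intermediate values by key, as the framework
# does before handing each key's values to a reducer.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine all values for each key into a final result.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    counts = reduce_phase(shuffle(map_phase(lines)))
    print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop cluster, the map and reduce functions run on different machines and the shuffle is handled by the framework over the network; the horizontal scalability the abstract describes comes from adding more machines to each phase.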
About the journal
Journal: Data Science and Big Data Computing
Publisher: Springer International Publishing
Open Access: No