Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase
Published in Springer International Publishing
2016
Pages: 191 - 220
Abstract
Apache Hadoop is an open-source project that enables the distributed processing of huge data sets across clusters of computers using simple programming models. It is designed to store, analyze, and access massive amounts of data quickly across clusters of commodity hardware. The Hadoop ecosystem comprises several large-scale data processing tools, each with its own purpose, and has emerged as a cost-effective way of working with large data sets. Hadoop imposes a particular programming model, called MapReduce, which breaks computation tasks into units that can be distributed around a cluster of commodity and server-class hardware, thereby providing cost-effective horizontal scalability. This chapter provides introductory material on the various Hadoop ecosystem tools and describes their usage in data analytics. Each tool has its own significance and function in a data analytics environment. © Springer International Publishing Switzerland 2016. All rights reserved.
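The MapReduce model mentioned in the abstract can be sketched in plain Python. This is not Hadoop's actual Java API, just a minimal in-memory illustration of the three conceptual phases (map, shuffle, reduce) using a word-count example; all function names here are illustrative, not part of any Hadoop library.

```python
from collections import defaultdict

# Map phase: each input record (here, a line of text) is turned into
# intermediate (key, value) pairs -- for word count, (word, 1) pairs.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group intermediate values by key, as the framework
# does before handing each key's values to a reducer.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine all values for each key into a final result.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    counts = reduce_phase(shuffle(map_phase(lines)))
    print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop cluster, the map and reduce functions run on different machines and the shuffle is handled by the framework over the network; the horizontal scalability the abstract describes comes from adding more machines to each phase.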
About the journal
Journal: Data Science and Big Data Computing
Publisher: Springer International Publishing
Open Access: No