Looking to use Apache Flume to stream data to Hadoop? This complete reference guide shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases.
Using Flume describes the rich set of features that enable you to write data to the Hadoop Distributed File System (HDFS) and HBase in formats readable by tools such as MapReduce, Hive, Impala, and Pig. You'll also learn about Flume's design and implementation, as well as various features that make Flume highly scalable, flexible, and reliable.
This book includes:

* In-depth explanations of how different Flume components work
* Detailed examples for customizing Flume using your own code
* Real-world examples of capacity planning, configuring, and deploying Flume
* Techniques for troubleshooting production issues and restoring a Flume cluster to full health
About the Author
Hari Shreedharan is a PMC member and committer on the Apache Flume project. As a PMC member, he is involved in making decisions on the direction of the project. Hari is also a software engineer at Cloudera, where he works on Apache Flume and Apache Sqoop. He also helps customers successfully deploy and manage Flume and Sqoop on their clusters by resolving any issues they face. Hari completed his bachelor's degree at Malaviya National Institute of Technology, Jaipur, India, and his master's degree in computer science at Cornell University in 2010.
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems.
Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.
* Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
* Dive into key Flume components, including sources that accept data and sinks that write and deliver it
* Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
* Explore APIs for sending data to Flume agents from your own applications
* Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running
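The source–channel–sink pipeline described above is wired together in a plain Java properties file. As an illustration only (the agent name `a1`, the port, and the HDFS path are hypothetical placeholders, not from the book), a minimal agent that buffers netcat input in a memory channel and writes it to HDFS might look like this:

```properties
# Hypothetical agent "a1": one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: accept newline-separated events over TCP
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between producer and consumer
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: deliver events to HDFS, bucketed by date
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

An agent with this configuration would typically be started with `flume-ng agent --name a1 --conf-file flume.conf`; the channel is what lets the sink drain at its own pace while the source keeps accepting data.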