About the Author
Mike Frampton has been in the IT industry since 1990, working in many roles (tester, developer, support, QA) and in many sectors (telecoms, banking, energy, insurance). He has also worked for major corporations and banks, including IBM, HP, and JPMorgan Chase. The owner of Semtech Solutions, an IT/big data consultancy, Mike currently lives by the beach in Paraparaumu, New Zealand, with his wife and son.
Chapter 1: The Problem with Data
- Explain the big data problem
- Explain how Hadoop tools can help
- Explain my method of Hadoop tool use
- Explain how these tools fit together using a data warehouse as a metaphor
- Explain how using these tools can save time and money while "future-proofing" an organization.
Chapter 2: Storing and Configuring Data with Hadoop, YARN, and ZooKeeper
- Provide a Hadoop platform overview
- Explain how Hadoop can be installed and configured
- Explain how Hadoop can be used via examples
- Explain configuration tools with examples
- Briefly explain the wider command set.
Chapter 3: Collecting Data with Nutch and Solr
- Explain how big data can be modified and imported into Hadoop
- Explain how ETL streams can quickly become very long and complex
- Explain the Hadoop collection tools with worked examples
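The way ETL streams grow long and complex can be sketched as a chain of small stages, each feeding the next. This is a minimal, illustrative Python sketch of the idea only; the stage names are hypothetical and are not tools from the book.

```python
# Illustrative sketch: an ETL stream as a chain of generator stages.
# Each stage is small, but chaining many of them quickly produces a
# long, hard-to-trace pipeline -- the growth problem described above.
# All stage names here are hypothetical.

def extract(lines):
    # Extract: yield raw records, skipping blank lines
    for line in lines:
        if line.strip():
            yield line.strip()

def transform(records):
    # Transform: normalize case and split each record into fields
    for record in records:
        yield record.lower().split(",")

def load(rows):
    # Load: collect rows; a real stream would write to HDFS instead
    return list(rows)

raw = ["Alice,Sales", "", "Bob,Engineering"]
result = load(transform(extract(raw)))
print(result)  # [['alice', 'sales'], ['bob', 'engineering']]
```

Each added requirement (filtering, joining, deduplication) becomes another stage in the chain, which is how such streams become very long and complex in practice.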
Chapter 4: Processing Data with Storm, Pig, and Map Reduce
- Explain how big data can be processed using Hadoop tools
- Give examples of processing tool use and when and why they might be useful
- Show results and compare tools
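The Map Reduce processing model named in this chapter can be illustrated in miniature: a map step emits (word, 1) pairs and a reduce step sums the counts per key. This is a plain-Python sketch of the concept only, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each key (word)
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big data tools", "data"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 3, 'tools': 1}
```

On a Hadoop cluster the same two phases run in parallel across many machines, which is what makes the model scale.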
Chapter 5: Scheduling Using Oozie
- Explain how important scheduling is to system management
- Explain monitoring and problem alerting
- Explain the tools used via examples
Chapter 6: Moving Data with Sqoop and Avro
- Explain the special problems that big data brings to data movement
- Explain the tools used to move big data
- Give worked examples for tool installation and use
Chapter 7: Monitoring the System with Chukwa, Ambari, and Hue
- Explain the need to monitor a big data system, which may contain millions of files
- Explain the systems and tools available to monitor
- Give worked examples for tool installation and use
Chapter 8: Analyzing and Querying Data with Hive and MongoDB
- Explain how to query data
- Explain the tools available to the analyst/manager/tester
- Show how to install and use analytics tools, with examples
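Hive lets analysts run SQL-like (HiveQL) queries over data stored in Hadoop. Purely for illustration, the same style of aggregation query is sketched below using Python's built-in sqlite3 rather than Hive itself; the table and column names are hypothetical, not examples from the book.

```python
import sqlite3

# Build a tiny in-memory table standing in for data an analyst
# might query with Hive. Table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("alice", 100), ("bob", 250), ("alice", 50)])

# Aggregate total bytes per user, largest first -- a typical
# analyst-style query that would look much the same in HiveQL.
rows = conn.execute(
    "SELECT user, SUM(bytes) FROM events GROUP BY user ORDER BY 2 DESC"
).fetchall()
print(rows)  # [('bob', 250), ('alice', 150)]
```

The point of the sketch is that familiar SQL skills carry over: the query shape stays the same whether the data fits in memory or is spread across a Hadoop cluster.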
Chapter 9: Reporting with Hadoop and Other Software
- Explain how you can assist management via reports
- Explain the tools Hadoop and other software provides
- Show how to install reporting tools and use them, with examples
Chapter 10: Testing with Big Top
- Explain how to test a big data system
- Explain what testing tools are available
- Show how to install and use them, with examples
Chapter 11: Hadoop Present and Future
- Explain that data sizes will just keep growing
- Explain that financial and regulatory pressures will push for greater data retention
- Explain that this is already happening in the energy and banking sectors
- Explain how Hadoop, a free tool, will help solve these problems going forward
- Explain to readers that getting involved now could build them a new career and will certainly help their company now and in the future.
Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system.
As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (YARN and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and Map Reduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Big Top), and analysis (Hive).
The problem is that the Internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions explaining where to get the Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade: someone just like author and big data expert Mike Frampton.
Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective. It explains the roles on each project (architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage. It explains, in an easily understood manner and through numerous examples, how to use each tool. The book also explains the sliding scale of tools available depending upon data size, and when and how to use them. Big Data Made Easy shows developers and architects, as well as testers and project managers, how to store, configure, collect, process, schedule, move, monitor, test, and analyze big data.
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset is an introduction to the Apache Hadoop toolset for developers, architects, and anyone else interested in big data. It includes a description of each tool's capabilities as well as in-depth instructions for building and testing a working system.