About The Training
Apache Hadoop is an open-source software framework for distributed storage and processing of very large data sets.
What Will It Offer?
- The training will help you understand how Apache Hadoop processes large data sets across clusters of computers using simple programming models.
- Learn how Apache Hadoop scales from a single machine to thousands of machines, each offering local computation and storage.
- Learn how to detect and handle failures at the application layer rather than relying on hardware to deliver high availability.
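The "simple programming model" behind Hadoop is MapReduce, which appears throughout the outline below. A minimal sketch of the idea in plain Python (an illustrative word count, not the Hadoop Java API; all function names here are hypothetical):

```python
# Sketch of the MapReduce model: map each input record to key/value
# pairs, shuffle (group) by key, then reduce each key's values.
from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data big clusters", "data"]))
# → {'big': 2, 'data': 2, 'clusters': 1}
```

In a real Hadoop job the map and reduce functions run in parallel on many machines, with the framework handling the shuffle, scheduling, and failure recovery.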
Course Outline
- Hadoop Introduction
- Hadoop Components
- Hadoop Distributed File System
- MapReduce
- MapReduce Programming
- Hadoop Data I/O
- Hadoop Cluster
- Advanced MapReduce
- Hadoop on AWS Cloud
- Managing Hadoop
- Testing & Debugging
- Hadoop Security
- Big Data
- Sqoop
- HBase
- HBase & MapReduce
- Hive
- Pig
- Avro
- ZooKeeper
- Cassandra
- Mahout
- Ambari
- Chukwa
- Integration of Hadoop Components
- Case Studies
- Best Practices