Big Data & Hadoop Testing Course

Introduction to Hadoop and its Ecosystem, MapReduce and HDFS

Introduction to Hadoop and its ecosystem, Understanding MapReduce and HDFS, Big Data, Factors constituting Big Data, Hadoop and the Hadoop Ecosystem, MapReduce Concepts – Map, Reduce, Ordering, Concurrency, Shuffle, Hadoop Distributed File System (HDFS) Concepts and its Importance, Deep Dive into MapReduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs, HDFS Deep Dive – Architecture, Data Replication, NameNode, DataNode, Data Flow, Parallel Copying with DistCp, Hadoop Archives

Hands on Exercises

Installing Hadoop in Pseudo-Distributed Mode, Understanding Important Configuration Files, their Properties and Daemon Threads, Accessing HDFS from the Command Line, MapReduce – Basic Exercises, Understanding the Hadoop Ecosystem, Introduction to Sqoop, use cases and Installation, Introduction to Hive, use cases and Installation, Introduction to Pig, use cases and Installation, Introduction to Oozie, use cases and Installation, Introduction to Flume, use cases and Installation, Introduction to YARN, Mini Project – Importing MySQL Data using Sqoop and Querying it using Hive


How to develop a MapReduce application, Writing unit tests, Best practices for developing MapReduce applications, Debugging MapReduce applications

Introduction to Pig & its features

What Is Pig?, Pig’s Features, Pig Use Cases, Interacting with Pig, Basic Data Analysis with Pig, Pig Latin Syntax, Loading Data, Simple Data Types, Field Definitions, Data Output, Viewing the Schema, Filtering and Sorting Data, Commonly-Used Functions, Hands-On Exercise: Using Pig for ETL Processing

Introduction to Hive

What Is Hive?, Hive Schema and Data Storage, Comparing Hive to Traditional Databases, Hive vs. Pig, Hive Use Cases, Interacting with Hive, Relational Data Analysis with Hive, Hive Databases and Tables, Basic HiveQL Syntax, Data Types, Joining Data Sets, Common Built-in Functions, Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

Hadoop Stack Integration Testing

Why Hadoop testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end to end tests, Functional testing, Release certification testing, Security testing, Scalability Testing, Commissioning and Decommissioning of Data Nodes Testing, Reliability testing, Release testing

Roles and Responsibilities of Hadoop Testing

Understanding the Requirements, Preparation of the Testing Estimation, Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status Report Delivery, Test Completion, ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, including but not limited to data verification and reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), Reporting defects to the development team or manager and driving them to closure, Consolidating all defects and creating defect reports, Validating new features and issues in Core Hadoop

MRUnit Framework for Testing MapReduce Programs

Creating a testing framework with MRUnit for testing MapReduce programs

Unit Testing

Automation testing using Oozie, Data validation using the QuerySurge tool

Test Execution of Hadoop (Customized)

Test plan for HDFS upgrade, Test automation and result

Test Plan, Strategy and Test Cases for Hadoop Testing

How to test installation and configuration

Hadoop Testing Projects

Project Work

Project 1 – Working with MapReduce, Hive, Sqoop

Problem Statement – Import MySQL data using Sqoop, query it using Hive, and run the word count MapReduce job.

Project 2 – Hadoop Testing using MRUnit

Problem Statement – Test MapReduce code with MRUnit.


Cloudera’s CDH (Installation, Management and monitoring using Cloudera Manager)

POC (proof of concept), Profile Upgrade, Interview Questions and Certification Guidance

Cloudera & Hortonworks Certification Assistance

Duration: 80 Hours (including hands-on)

Mail us @ for complete course content.

Big Data

Every Saturday at 10 am at JPA Solutions, led by real-time Cloudera Certified Professionals. Seats are limited, so kindly confirm if you are interested in joining this session.

For more details, reach us at 8754415111 / 044-45002120