Big Data & Hadoop Architect Course


Counters (Built-in and Custom) – Custom InputFormat – Distributed Cache – Joins (Map Side, Reduce Side) – Sorting – Performance Tuning – GenericOptionsParser – ToolRunner – Debugging (LocalJobRunner) – YARN Architecture Details, MR Job Submission, Limitations of Version 1 with respect to processing.


Introduction to Spark – Spark Installation Demo – Overview of Spark on a cluster – Spark Standalone Cluster – Spark RDD – Transformations in RDD – Actions in RDD – Persistence in RDD – Loading Data in RDD – Saving Data through RDD – Key-Value Pair RDD – MapReduce and Pair RDD Operations – Scala and Hadoop Integration – Spark SQL – DataFrame concept – SQLContext with example – JSON Example

HiveContext with Example and Spark SQL integration with Hive – Spark Streaming – DStream Concept – Spark Streaming Architecture and Abstraction – socketTextStream API Example

MongoDB – 6hrs

Introduction to MongoDB - Overview, Design Goals, the Mongo Shell, JSON Intro – CRUD - Mongo Shell, Query Operators, Update Operators - Schema Design Patterns, Case Studies & Tradeoffs – Performance - Using Indexes, Monitoring And Understanding Performance – Aggregation - Goals, The Use Of The Pipeline - Application Engineering - Drivers, Impact Of Replication And Sharding On Design And Development – Use Cases

Cassandra – 6hrs

Introduction, RDBMS vs Cassandra – Installation of single node – Architecture – Keyspaces, CQL, CQL Data Types, CRUD Operations – Reading and writing data

Storm – 6hrs

What is Storm, Use Cases of Storm, Components of Storm, Properties of Storm, Storm vs Hadoop, Storm Installation, Storm Running Modes, Creating First Storm Topology, Topologies in Storm, Reliable vs Unreliable Messages, Getting Data, Bolt Lifecycle, Bolt Structure, Reliable vs Unreliable Bolts

Kafka – 6hrs

What is Kafka? Need for Kafka, Core Concepts of Kafka, Kafka Architecture, Where is Kafka Used? Understanding the components of Kafka Cluster, Installation of Kafka Cluster, Configuring Kafka Cluster, Producer of Kafka, Consumer of Kafka, Producer and Consumer in Action. Offset, Design, Hardware, Kafka Monitoring and Issues, Kafka Performance Tuning, Integration of Kafka with Hadoop

Impala – 6hrs

Impala Installation, Starting and Stopping Impala, Impala Shell Commands and Interface, Querying with Hive and Impala, Hive and Impala Query Syntax, Data Storage and File Formats, Impala Architecture, Daemon, Statestore, Catalog Services, Query Execution Flow in Impala, Hive UDF with Impala


Cloudera and Hortonworks HDP (Installation, Management and Monitoring)

Cloudera & Hortonworks Certification Assistance

Duration: 80 Hours (along with Hands-on)

Mail us @ for complete course content.

Big Data

Sessions are held every Saturday at 10 am at JPA Solutions, conducted by real-time Cloudera Certified Professionals. Seats are limited, so kindly confirm if you are interested in joining this session.

For more details, reach us at 8754415111 / 044-45002120