BigData & Hadoop Analytical Course


  • File Handling, Text Processing, System Administration, Process Management
  • Archival, Network, File Systems, Advanced Commands


CLASS – 1: Introduction - OOP Concepts (Object - Class - Inheritance - Polymorphism - Abstraction - Encapsulation)
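The four OOP pillars listed above can be sketched in a few lines of Java. This is a minimal illustration with made-up class names (Shape, Circle, Square), not material from the course itself:

```java
// Encapsulation: private field behind a getter. Abstraction: abstract method.
abstract class Shape {
    private final String name;                  // hidden state (encapsulation)
    Shape(String name) { this.name = name; }
    String getName() { return name; }
    abstract double area();                     // abstraction: no body here
}

class Circle extends Shape {                    // inheritance
    private final double r;
    Circle(double r) { super("circle"); this.r = r; }
    @Override double area() { return Math.PI * r * r; }   // overriding
}

class Square extends Shape {
    private final double side;
    Square(double side) { super("square"); this.side = side; }
    @Override double area() { return side * side; }
}

public class OopDemo {
    public static void main(String[] args) {
        // Polymorphism: one call site, the overridden area() of each subclass runs.
        Shape[] shapes = { new Circle(1.0), new Square(2.0) };
        for (Shape s : shapes) {
            System.out.printf("%s area = %.2f%n", s.getName(), s.area());
        }
    }
}
```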

CLASS – 2: String (Concept of String - Immutable Strings - String Comparison - String Concatenation - Concept of Substring - String class methods and their usage - StringBuffer class, StringBuilder class), Exception Handling (try - throw - catch), Advanced (throws - finally), Input and Output (I/O) functions
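A short sketch of the String and exception-handling topics above (the literal values are illustrative only):

```java
public class StringDemo {
    public static void main(String[] args) {
        // Immutability: concat() returns a NEW string; the original is unchanged.
        String s = "Hadoop";
        String t = s.concat(" Course");
        System.out.println(s);                        // still "Hadoop"

        // Comparison: == compares references, equals() compares contents.
        String a = new String("hive");
        System.out.println(a == "hive");              // false: different objects
        System.out.println(a.equals("hive"));         // true: same characters

        // Substring, and StringBuilder (mutable, efficient for repeated appends).
        System.out.println("BigData".substring(3));   // "Data"
        StringBuilder sb = new StringBuilder("Map");
        sb.append("Reduce");
        System.out.println(sb.toString());            // "MapReduce"

        // try / catch / finally: finally runs whether or not an exception is thrown.
        try {
            int n = Integer.parseInt("not-a-number"); // throws NumberFormatException
        } catch (NumberFormatException e) {
            System.out.println("caught: " + e.getMessage());
        } finally {
            System.out.println("finally always runs");
        }
    }
}
```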

CLASS – 3: Collections (List, Map, Set interfaces and their algorithms) - Iterator interface - Map (HashMap - TreeMap - LinkedHashMap - MultiKeyMap) - List (ArrayList - LinkedList) - Set (HashSet - TreeSet) - Serialization and Deserialization.
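The List/Set/Map distinctions above in a runnable sketch (sample values are illustrative):

```java
import java.util.*;

public class CollectionsDemo {
    public static void main(String[] args) {
        // List: keeps insertion order, allows duplicates.
        List<String> tools = new ArrayList<>(Arrays.asList("hive", "pig", "hive"));

        // Set: removes duplicates; TreeSet additionally keeps elements sorted.
        Set<String> unique = new TreeSet<>(tools);
        System.out.println(unique);                 // [hive, pig]

        // Map: HashMap has no order guarantee, LinkedHashMap keeps insertion
        // order, TreeMap sorts by key. Here TreeMap counts occurrences.
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tools) {
            counts.merge(t, 1, Integer::sum);
        }
        System.out.println(counts);                 // {hive=2, pig=1}

        // Iterator interface: the explicit traversal that for-each uses underneath.
        Iterator<String> it = unique.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```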


Big Data (What, Why, Who) – 3+ Vs – Overview of the Hadoop Ecosystem - Role of Hadoop in Big Data - Overview of Other Big Data Systems – Who is Using Hadoop – Hadoop Integration into Existing Software Products - Current Scenario in the Hadoop Ecosystem - Installation - Configuration - Use Cases of Hadoop (Healthcare, Retail, Telecom)


Concepts - Architecture - Daemons - Block Concept – Data Flow (File Read, File Write) – Fault Tolerance – Data Flow Archives – Coherency - Data Integrity – Role of the Secondary NameNode - HDFS High Availability - HDFS Federation - Pseudo-Distributed Hadoop Cluster Installation - Shell Commands – Java-Based API


Theory – Data Flow (Map – Shuffle - Reduce) – MapRed vs MapReduce APIs - Programming (Mapper, Reducer, Combiner, Partitioner) – Writables – Input Format – Output Format - Streaming API using Python – Inherent Failure Handling using Speculative Execution – Magic of the Shuffle Phase – File Formats – Sequence Files
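The Map – Shuffle – Reduce data flow above can be mimicked with plain in-memory Java collections. This is a conceptual sketch only, not the real org.apache.hadoop.mapreduce API (which uses Mapper/Reducer classes, Writables, and a distributed shuffle):

```java
import java.util.*;

public class WordCountFlow {
    // Word count expressed as the three MapReduce phases over local lists.
    static Map<String, Integer> wordCount(List<String> lines) {
        // MAP: emit a (word, 1) pair for every word of every input line.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        // SHUFFLE: group emitted values by key, sorted (the framework's job in Hadoop).
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // REDUCE: fold each key's list of values into a single count.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data", "big hadoop")));
        // {big=2, data=1, hadoop=1}
    }
}
```

A Combiner would run the same summing logic on each mapper's local output before the shuffle, cutting network traffic; a Partitioner decides which reducer each key goes to.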


Multi-Node Cluster Setup using Physical Machines – Hardware Considerations – Software Considerations - Commands (fsck, job, dfsadmin) – Schedulers in the JobTracker - Rack Awareness Policy - Balancing - NameNode Failure and Recovery - Commissioning and Decommissioning a Node – Compression Codecs


Introduction to NoSQL – CAP Theorem – Classification of NoSQL – HBase and RDBMS – HBase and HDFS - Architecture (Read Path, Write Path, Compactions, Splits) - Installation – Configuration - Role of ZooKeeper – HBase Shell - Java-Based APIs (Scan, Get, other advanced APIs) – Introduction to Filters - RowKey Design - MapReduce Integration – Performance Tuning – What’s New in HBase 0.98 – Backup and Disaster Recovery - Hands-On
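On the RowKey Design topic above: since HBase stores rows sorted by key, monotonically increasing keys (timestamps, sequence IDs) hotspot a single region. One common remedy is salting, prefixing the key with a small hash-derived bucket. A minimal plain-Java sketch of the idea (the helper name and bucket count are illustrative, and real code would build keys as byte arrays via the HBase client API):

```java
public class SaltedRowKey {
    // Prefix the key with (hash % buckets) so otherwise-sequential keys
    // spread across regions instead of all landing on one region server.
    static String salt(String key, int buckets) {
        int bucket = Math.abs(key.hashCode() % buckets);
        return String.format("%02d_%s", bucket, key);
    }

    public static void main(String[] args) {
        for (String user : new String[]{"user0001", "user0002", "user0003"}) {
            System.out.println(salt(user, 16));
        }
    }
}
```

The trade-off: point reads (Get) stay cheap because the salt is recomputable from the key, but range scans must now fan out across all buckets.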


Introduction - Data Types - Operators (Arithmetic, Relational, Diagnostic) - UDFs (Java) - Functions (Eval Functions and Load/Store Functions) - Pig Latin Statements - Multi-Query Execution - Specialized Joins - Optimization Rules - Memory Management - Extensive hands-on with large datasets, trying all the above theories in a practical session.


Introduction, Installation, Accessing Hive, Pig and Spark, File Browser.


Architecture, Installation, Commands (Import, Hive-Import, Eval, HBase Import, Import-All-Tables, Export) – Connectors to Existing DBs and DWs.


Why Flume? - Architecture, Configuration (Agents), Sources (Exec, Avro, NetCat), Channels (File, Memory, JDBC, HBase), Sinks (Logger, Avro, HDFS, HBase, File Roll), Contextual Routing (Interceptors, Channel Selectors) - Introduction to Other Aggregation Frameworks
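A Flume agent is wired entirely through a properties file naming its sources, channels, and sinks. A minimal single-agent pipeline using the NetCat source, memory channel, and logger sink listed above (agent and component names `a1`, `r1`, `c1`, `k1` are illustrative):

```properties
# One agent (a1) with one source, one channel, one sink.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# NetCat source: listens on a TCP port, turns each received line into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Memory channel: fast, but buffered events are lost if the agent dies.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Logger sink: writes events to the agent's log; useful for testing.
a1.sinks.k1.type = logger

# Wire the source and sink to the channel.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Swapping `k1` to an HDFS sink, or `c1` to a file channel for durability, is a matter of changing these properties, not code.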


Architecture, Installation, Workflow, Coordinator, Actions (MapReduce, Hive, Pig, Sqoop) - Introduction to Bundles – Mail Notifications
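A workflow here is an XML-defined DAG of actions with explicit success and failure transitions. A minimal sketch with a single Pig action (the workflow name, script path, and node names are placeholders; `${jobTracker}` and `${nameNode}` come from the job properties file):

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>wordcount.pig</script>
        </pig>
        <ok to="end"/>          <!-- success transition -->
        <error to="fail"/>      <!-- failure transition -->
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A Coordinator then triggers this workflow on a schedule or on data availability, and a Bundle groups several coordinators for joint lifecycle management.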


Introduction to Spark - Spark Installation Demo - Overview of Spark on a Cluster - Spark Standalone Cluster - Spark RDD - Transformations in RDD - Actions in RDD - Persistence in RDD - Loading Data into an RDD - Saving Data through an RDD - Key-Value Pair RDDs – MapReduce and Pair RDD Operations – Scala and Hadoop Integration - Spark SQL – DataFrame Concept - SQLContext with Example – JSON Example – HiveContext with Example and Spark SQL Integration with Hive - Spark Streaming – DStream Concept - Spark Streaming Architecture and Abstraction – socketTextStream API Example


Cloudera’s CDH (Installation, Management and monitoring using Cloudera Manager)

POC (Proof of Concept), Profile Upgrade, Interview Questions and Certification Guidance.

Cloudera & Hortonworks Certification Assistance

Duration: 80 Hours (along with Hands-on)

Mail us @ for complete course content.

Big Data

Every Saturday @ 10am at JPA Solutions, conducted by real-time Cloudera Certified Professionals. Seats are limited, so kindly confirm if you are interested in joining this session.

For more details, Reach us @ 8754415111 / 044-45002120