Big Data & Hadoop Analytical Course

LINUX INTRODUCTION

  • File Handling, Text Processing, System Administration, Process Management
  • Archival, Network, File Systems, Advanced Commands

INTRODUCTION TO BIG DATA-HADOOP

Big Data (What, Why, Who) – The 3+ Vs – Overview of the Hadoop Ecosystem – Role of Hadoop in Big Data – Overview of Other Big Data Systems – Who is Using Hadoop – Hadoop Integration into Existing Software Products – Current Scenario in the Hadoop Ecosystem – Installation – Configuration – Use Cases of Hadoop (Healthcare, Retail, Telecom)

HDFS

Concepts – Architecture – Daemons – Block Concept – Data Flow (File Read, File Write) – Fault Tolerance – Archives – Coherency – Data Integrity – Role of the Secondary NameNode – HDFS High Availability – HDFS Federation – Pseudo-Distributed Hadoop Cluster Installation – Shell Commands – Java-Based API
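
For reference, a minimal sketch of the Java-based HDFS API covered above (file write followed by file read); the NameNode URI and file path are placeholders, not part of the course material:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the NameNode (placeholder URI for an actual cluster).
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/user/training/demo.txt");

        // File Write data flow: client -> NameNode (metadata) -> DataNode pipeline.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("Hello HDFS\n");
        }

        // File Read data flow: client gets block locations from the NameNode,
        // then streams the data directly from the DataNodes.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            System.out.println(reader.readLine());
        }
    }
}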

MAPREDUCE (Theory + Practicals)

Theory – Data Flow (Map – Shuffle – Reduce) – MapRed vs. MapReduce APIs – Programming (Mapper, Reducer, Combiner, Partitioner) – Writables – InputFormat – OutputFormat – Streaming API using Python – Inherent Failure Handling using Speculative Execution – Magic of the Shuffle Phase – File Formats – Sequence Files
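
For reference, a classic WordCount sketch using the newer mapreduce Java API, covering the Mapper, Reducer and Combiner roles listed above; input and output paths come from the command line:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also reusable as a Combiner): sums the counts shuffled to it per word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner runs map-side, before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}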

ADMINISTRATION

Multi-Node Hadoop Cluster Setup using Physical Machines – Hardware and Software Considerations – Admin Commands (fsck, dfsadmin, rmadmin, safemode, distcp) – Job Scheduling in YARN (Fair Scheduler, Capacity Scheduler) – JobHistoryServer – Commissioning and Decommissioning (DataNode, NodeManager) – Rebalancer – Compression Codecs.

HBASE

Introduction to NoSQL – CAP Theorem – Classification of NoSQL – HBase and RDBMS – HBase and HDFS – Architecture (Read Path, Write Path, Compactions, Splits) – Installation – Configuration – Role of ZooKeeper – HBase Shell – Java-Based APIs (Scan, Get, other advanced APIs) – Introduction to Filters – Row Key Design – MapReduce Integration – Performance Tuning – What's New in HBase 0.98 – Backup and Disaster Recovery – Hands-On.
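
For reference, a minimal sketch of the Java-based HBase API topics above (Put, Get, Scan), assuming the HBase 1.x client (Connection/Table; the 0.98-era client uses HTable and Put.add() instead of addColumn()); the table name "demo" and column family "cf" are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudDemo {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; the ZooKeeper quorum comes from there.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {

            byte[] cf = Bytes.toBytes("cf");

            // Put: write one cell into column family "cf".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(cf, Bytes.toBytes("city"), Bytes.toBytes("Chennai"));
            table.put(put);

            // Get: read the row back by its row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(result.getValue(cf, Bytes.toBytes("city"))));

            // Scan: iterate over rows (filters and key ranges can narrow this down).
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}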

HIVE

Introduction – Hive vs. RDBMS – Detailed Installation (Configuration, Metastore, Integrating with Hue) – Starting the Metastore and HiveServer2 – Data Types (Primitive, Collection) – Creating Tables (Managed, External) and DML Operations (Load, Insert, Export) – Managed vs. External Tables – QL Queries (select, where, group by, having, sort by, order by) – Hive Access through the Hive Client, Beeline and Hue – File Formats (RC, ORC, Parquet, Sequence) – Partitioning (Static and Dynamic), Partitioning with External Tables, Dropping Partitions and the Corresponding Configuration Parameters – Bucketing, Partitioning vs. Bucketing – Views – Different Types of Joins (Inner, Outer) – Queries (Union, Union All, Intersection, Minus) – Adding Files to the Distributed Cache and JARs to the Classpath – UDFs in Java (Sample UDF and Ranking Data using a UDF), GenericUDF, UDAF – Optimized Joins (Map-Side Join, Bucketed Join) – Compression on Tables (LZO, Snappy) – SerDes (CSVSerde, JsonSerDe) – Parallel Execution, Sampling Data, Speculative Execution – Hive-HBase Integration.
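
For reference, a minimal sketch of accessing HiveServer2 over JDBC (the Beeline-style connection mentioned above); the server URL, credentials, the sales table and the CSV path are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; URL and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // DDL: a managed table with a simple delimited schema.
            stmt.execute("CREATE TABLE IF NOT EXISTS sales (item STRING, amount DOUBLE) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // DML: load a local CSV file into the table (path is a placeholder).
            stmt.execute("LOAD DATA LOCAL INPATH '/tmp/sales.csv' OVERWRITE INTO TABLE sales");

            // QL query: aggregate with GROUP BY / ORDER BY.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT item, SUM(amount) AS total FROM sales GROUP BY item ORDER BY total DESC")) {
                while (rs.next()) {
                    System.out.println(rs.getString("item") + "\t" + rs.getDouble("total"));
                }
            }
        }
    }
}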

PIG

Introduction – Data Types – Operators (Arithmetic, Relational, Diagnostic) – UDFs (Java) – Functions (Eval Functions and Load/Store Functions) – Pig Latin Statements – Multi-Query Execution – Specialized Joins – Optimization Rules – Memory Management – Extensive hands-on with large datasets, applying all of the above topics in practical sessions.
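
A minimal sketch of running Pig Latin statements (LOAD, FILTER, GROUP and the COUNT eval function) through Pig's embedded PigServer API, assuming local mode and a hypothetical /tmp/access.log input file:

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigLatinDemo {
    public static void main(String[] args) throws Exception {
        // Local mode for a quick test; ExecType.MAPREDUCE would run on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Pig Latin statements: LOAD, FILTER, GROUP, and an eval function (COUNT).
        pig.registerQuery("logs = LOAD '/tmp/access.log' USING PigStorage(' ') "
                        + "AS (ip:chararray, code:int);");
        pig.registerQuery("errors = FILTER logs BY code >= 500;");
        pig.registerQuery("by_ip = GROUP errors BY ip;");
        pig.registerQuery("counts = FOREACH by_ip GENERATE group AS ip, COUNT(errors) AS hits;");

        // Iterate over the result of the final alias.
        Iterator<Tuple> it = pig.openIterator("counts");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}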

HUE

Introduction – Installation – Accessing Hive, Pig and Spark – File Browser.

SQOOP

Architecture – Installation – Commands (Import, Hive Import, Eval, HBase Import, Import All Tables, Export) – Connectors to Existing Databases and Data Warehouses.
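
A minimal sketch that launches the sqoop import command covered above from Java via ProcessBuilder; the JDBC URL, password file, table and target directory are placeholders:

import java.util.Arrays;
import java.util.List;

public class SqoopImportLauncher {
    public static void main(String[] args) throws Exception {
        // The sqoop import command; host, credentials and paths are placeholders.
        List<String> cmd = Arrays.asList(
                "sqoop", "import",
                "--connect", "jdbc:mysql://dbhost:3306/retail",
                "--username", "retail_user",
                "--password-file", "/user/training/.db.password",
                "--table", "orders",
                "--target-dir", "/user/training/orders",
                "--num-mappers", "1");

        // Inherit stdout/stderr so Sqoop's MapReduce progress is visible.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}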

FLUME

Why Flume? – Architecture – Configuration (Agents) – Sources (Exec, Avro, NetCat) – Channels (File, Memory, JDBC, HBase) – Sinks (Logger, Avro, HDFS, HBase, File Roll) – Contextual Routing (Interceptors, Channel Selectors) – Introduction to Other Aggregation Frameworks.

OOZIE

Architecture – Installation – Workflow – Coordinator – Actions (MapReduce, Hive, Pig, Sqoop) – Introduction to Bundles – Mail Notifications.
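
A minimal sketch of submitting a workflow through the Oozie Java client, assuming a server at the placeholder URL http://oozie-host:11000/oozie and a hypothetical workflow application already deployed to HDFS:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieWorkflowSubmit {
    public static void main(String[] args) throws Exception {
        // Oozie server URL and HDFS application path are placeholders.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/training/wf-demo");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        // Submit and start the workflow, then check its status.
        String jobId = oozie.run(conf);
        WorkflowJob job = oozie.getJobInfo(jobId);
        System.out.println("Workflow " + jobId + " is " + job.getStatus());
    }
}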

APACHE SPARK

Introduction to Spark – Spark Installation Demo – Overview of Spark on a Cluster – Spark Standalone Cluster – Spark RDD – Transformations in RDD – Actions in RDD – Persistence in RDD – Loading Data into RDD – Saving Data through RDD – Key-Value Pair RDD – MapReduce and Pair RDD Operations – Scala and Hadoop Integration – Spark SQL – DataFrame Concept – SQLContext with Example – JSON Example – HiveContext with Example and Spark SQL Integration with Hive – Spark Streaming – DStream Concept – Spark Streaming Architecture and Abstractions – socketTextStream API Example.
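
A minimal sketch of the RDD and Spark SQL topics above using Spark's Java API, assuming Spark 2.x (where SparkSession replaces the separate SQLContext/HiveContext); the HDFS paths are placeholders:

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class SparkRddAndSqlDemo {
    public static void main(String[] args) {
        // One SparkSession drives both the RDD and the Spark SQL examples.
        SparkSession spark = SparkSession.builder()
                .appName("rdd-and-sql-demo")
                .master("local[*]")          // local mode for the demo
                .getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        // RDD: load data, apply transformations, persist, then run an action.
        JavaRDD<String> lines = sc.textFile("hdfs://namenode:8020/user/training/demo.txt");
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // transformation
                .mapToPair(word -> new Tuple2<>(word, 1))                      // key-value pair RDD
                .reduceByKey((a, b) -> a + b)                                  // shuffle
                .cache();                                                      // persistence
        counts.take(10).forEach(t -> System.out.println(t._1() + " -> " + t._2())); // action

        // Spark SQL: build a DataFrame from JSON and query it with SQL.
        Dataset<Row> people = spark.read().json("hdfs://namenode:8020/user/training/people.json");
        people.createOrReplaceTempView("people");
        spark.sql("SELECT name, age FROM people WHERE age > 30").show();

        spark.stop();
    }
}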

COMMERCIAL VENDOR SYSTEM

Cloudera's CDH (Installation, Management and Monitoring using Cloudera Manager)

POC (Proof of Concept), Profile Upgrade, Interview Questions and Certification Guidance.

Cloudera & Hortonworks Certification Assistance

Duration: 80 hours (including hands-on sessions)

Mail us @ info@jpasolutions.in for complete course content.

Big Data

Every Saturday @ 10 am at JPA Solutions, conducted by real-time Cloudera Certified Professionals. Seats are limited, so kindly confirm if you are interested in joining this session.

For more details, reach us @ 8754415111 / 044-45002120