What You Will Learn

All attendees will learn how to:

Understand Scala and its implementation
Apply control structures, loops, collections, and more
Master the concepts of traits and OOP in Scala
Understand functional programming in Scala
Get an insight into the big data challenges
Learn how Spark acts as a solution to these challenges
Install Spark and implement Spark operations on Spark Shell
Understand the role of RDDs in Spark
Implement Spark applications on YARN (Hadoop)
Stream data using Spark Streaming API
Implement machine learning algorithms in Spark using MLlib API
Analyze Hive and Spark SQL architecture
Implement Spark SQL queries to perform several computations
Understand GraphX API and implement graph algorithms
Implement broadcast variables and accumulators for performance tuning

Course Outline

One

Introduction to Scala for Apache Spark

Learning Objectives

In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala, such as variable types, control structures, collections, and more.

Topics

What is Scala? Why Scala for Spark? Scala in other frameworks, introduction to the Scala REPL, basic Scala operations, variable types in Scala, control structures in Scala, foreach loop, functions, procedures, collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more.

6 Hours

Two

OOP and Functional Programming in Scala

Learning Objectives

In this module, you will learn about object-oriented programming and functional programming techniques in Scala.

Topics

Classes in Scala, getters and setters, custom getters and setters, properties with only getters, auxiliary constructors, primary constructor, singletons, companion objects, extending a class, overriding methods, traits as interfaces, layered traits, functional programming, higher-order functions, anonymous functions, and more.

6 Hours

Three

Introduction to Big Data and Apache Spark

Learning Objectives

In this module, you will understand what big data is, the challenges associated with it, and the different frameworks available. The module also includes a first-hand introduction to Spark.

Topics

Introduction to big data, challenges with big data, batch vs. real-time big data analytics, batch analytics: Hadoop ecosystem overview, real-time analytics options, streaming data: Spark, in-memory data: Spark, What is Spark?, Spark ecosystem, modes of Spark, Spark installation demo, overview of Spark on a cluster, Spark Standalone cluster, Spark Web UI.

6 Hours

Four

Spark Common Operations

Learning Objectives

In this module, you will learn how to invoke the Spark Shell and use it for various common operations.

Topics

Invoking the Spark Shell, creating the SparkContext, loading a file in the shell, performing basic operations on files in the Spark Shell, overview of SBT, building a Spark project with SBT, running a Spark project with SBT, local mode, Spark mode, caching overview, distributed persistence.

6 Hours

Five

Playing with RDDs

Learning Objectives

In this module, you will learn about RDDs, one of the fundamental building blocks of Spark, and the related manipulations for implementing business logic.

Topics

RDDs, transformations in RDDs, actions in RDDs, loading data in RDDs, saving data through RDDs, key-value pair RDDs, MapReduce and pair RDD operations, Spark and Hadoop integration: HDFS, Spark and Hadoop integration: YARN, handling sequence files, partitioner.

6 Hours

Six

Spark Streaming and MLlib

Learning Objectives

In this module, you will learn about the major APIs that Spark offers. You will get an opportunity to work with Spark Streaming, which makes it easy to build scalable, fault-tolerant streaming applications, and with MLlib, Spark’s machine learning library.

Topics

Spark Streaming architecture, first Spark Streaming program, transformations in Spark Streaming, fault tolerance in Spark Streaming, checkpointing, parallelism level, machine learning with Spark, data types, algorithms: statistics, classification and regression, clustering, collaborative filtering.

6 Hours

Seven

GraphX, Spark SQL and Performance Tuning in Spark

Learning Objectives

In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries, and about graph analysis with GraphX, Spark's API for graphs and graph-parallel computation. You will also get a chance to learn the various ways to optimize performance in Spark.

Topics

Hive and Spark SQL architecture, SQLContext in Spark SQL, working with DataFrames, implementing an example for Spark SQL, integrating Hive and Spark SQL, support for JSON and Parquet file formats, implementing data visualization in Spark, loading of data, Hive queries through Spark, testing tips in Scala, performance tuning tips in Spark, shared variables: broadcast variables, shared variables: accumulators.

6 Hours

Eight

A complete project on Apache Spark

Learning Objectives

In this module, you will get an opportunity to work on a live Spark project, where you can apply hands-on what you learned in the previous modules and solve a real-time use case.

Topics

6 Hours

