Apache Spark Ver 2.x using Java Training

Description

Apache Spark Ver 2.x using Java

Course Duration: 3 Days

Objectives

  • Apache Spark architecture
  • Resilient Distributed Datasets
  • Working with Spark in interactive and Java programming mode
  • Operations on RDDs and Pair RDDs
  • Persisting and Caching RDDs
  • Spark Master and Worker in distributed mode
  • Spark SQL: DataFrames and temp tables
  • Spark Streaming
  • Analytics and Machine Learning in Spark
  • Tips and tricks for improving Spark performance

Highlights

  • Module 1: Introduction to the Big Data ecosystem
  • Module 2: Spark Basics and Architecture
  • Module 3: Working with RDDs in Spark
  • Module 4: Aggregating data with Pair RDDs
  • Module 5: More on Spark
  • Module 6: Distributing Apache Spark
  • Module 7: Spark Data Processing: Spark SQL
  • Module 8: More on Spark SQL
  • Module 9: Basics of Spark Streaming
  • Module 10: More on Spark Streaming
  • Module 11: Analytics and Machine Learning in Spark
  • Module 12: Improving Spark performance

Who Should Attend

Developers with at least 2 years of experience on the Java platform and a good understanding of HDFS and the MapReduce/Hadoop platform.

Target Audience

Data Engineers and Data Analysts

Personas

Data Engineers and Data Analysts