Applied Big Data Training

Due to the COVID-19 outbreak, our training courses will be taught in an online classroom.

Not sure how to start your Big Data journey, or do you just want to solidify your skills? In this training you will receive in-depth knowledge from industry professionals, test your skills with hands-on assignments and demos, and get access to valuable resources and tools.

Learn how to harness the power of Spark, Kafka, and Hadoop. We take a deep dive into the Big Data skills and technologies that any Data Engineer should possess. After finishing the training program you will have an overview of, and hands-on experience with, different data architectures.

These classes are perfect for companies of all sizes that want to close the data gap and train their employees. You can follow the schedule below in our offices or contact us for a tailor-made program that meets your needs.

Are you interested? Contact us and we will get back to you.

Get in touch for more information

Fill in the form and we will contact you about the Applied Big Data training:


What you will learn

This Applied Big Data training is a great fit if you work as a software, data, or BI engineer. To get the most value out of the training, participants should have basic knowledge of Python and SQL.

During this training you will learn:

  • How to use Big Data processing and analytics;
  • How Big Data platforms are set up;
  • The strengths and weaknesses of the different technologies.

After the training you will receive a Certificate of Completion.

Training Dates

This Applied Big Data training consists of 9 classes, spread over a couple of months to maximize learning. The content of the classes builds on itself, so in general we advise attending all of them. If you would like to attend one or more individual classes, contact us so we can advise you on a tailored training course.

Join the Big Data Training on these dates in our office or contact us for a tailored training:

Class

  • Hadoop Essentials
  • Spark I
  • Spark II
  • Spark III
  • Kafka
  • Streaming
  • Spark IV
  • Hive
  • Data Architectures

Available Dates

February 21, 2020  |  June 12, 2020
February 28, 2020  |  June 19, 2020
April 17, 2020  |  July 31, 2020
April 24, 2020  |  August 7, 2020
June 12, 2020  |  September 18, 2020
June 19, 2020  |  September 25, 2020
June 26, 2020  |  October 2, 2020
July 24, 2020  |  October 30, 2020
July 31, 2020  |  November 6, 2020

Detailed description of the Classes

Below you will find a detailed description of each class:

Hadoop Essentials

The creation of Hadoop is often considered the start of the Big Data ‘movement’. What problems does Hadoop tackle, and how? This training provides insights into the software and its most important technologies.

This training includes theory and practical exercises. After this training you will have gained knowledge about:

  • The Hadoop ecosystem
  • Data storage (HDFS)
  • Cluster resource management (YARN)
  • Data processing (MapReduce and Spark)
  • Data pipelines in Hadoop (Oozie)
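To give a taste of the MapReduce model covered in this class, here is a minimal plain-Python sketch (conceptual only, no actual Hadoop involved): a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

In Hadoop the map and reduce phases run in parallel across the cluster, with HDFS holding the data and YARN scheduling the work; the logical model, however, is exactly this.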

Spark I

In this first Apache Spark training we introduce basic Spark concepts and the Resilient Distributed Dataset (RDD) API that is core to Apache Spark. During the practical session participants will use the RDD API from Python to analyze a MovieLens dataset.

The training includes theory and hands-on exercises. After this training you will have gained knowledge about:

  • Spark concepts, roots, and history
  • How Spark relates to Hadoop
  • How Spark solves challenges in concurrent and parallel programming
  • Spark RDDs and the RDD API
  • Spark deploy modes

Spark II

In the second Apache Spark training you will be introduced to Spark’s DataFrame API and Spark SQL. These APIs are optimized for structured, tabular data and allow SQL access to very large datasets. During the practical session participants will be introduced to the APIs and then analyze the MovieLens dataset using Spark SQL.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Spark’s DataFrame API
  • Spark SQL
  • The Parquet storage format
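Spark SQL lets you run ordinary SQL against DataFrames registered as views. To preview the query style, here is a stand-in using Python's built-in sqlite3 (this is not Spark; the table and data are invented for illustration):

```python
import sqlite3

# An in-memory table standing in for a MovieLens-style ratings DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie TEXT, rating REAL)")
conn.executemany("INSERT INTO ratings VALUES (?, ?)",
                 [("toy_story", 5.0), ("toy_story", 4.0), ("heat", 3.0)])

# In Spark the same query would run via spark.sql(...) against a DataFrame
# registered with df.createOrReplaceTempView("ratings").
rows = conn.execute(
    "SELECT movie, AVG(rating) FROM ratings GROUP BY movie ORDER BY movie"
).fetchall()
print(rows)  # [('heat', 3.0), ('toy_story', 4.5)]
```

The point of Spark SQL is that exactly this kind of query also works when the table is terabytes of Parquet files spread over a cluster.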

Spark III

In the third Apache Spark training you will be introduced to Machine Learning concepts with Spark’s MLlib API as well as how to apply them at scale. During the practical sessions participants will work on a Recommender System and on predicting airplane delays.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Basic machine learning concepts
  • Spark MLlib
  • Pipelines in Spark
  • Building a basic Recommender System in Spark
  • Using Spark and machine learning for predictions
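MLlib's ALS-based recommenders are far more sophisticated than anything that fits here, but as a conceptual warm-up, here is a naive popularity-based recommender in plain Python (the users, movies, and ratings are made up):

```python
# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "alice": {"toy_story": 5, "heat": 2},
    "bob":   {"toy_story": 4, "alien": 5},
    "carol": {"heat": 3, "alien": 4},
}

def recommend(user):
    """Recommend the unseen movie with the highest average rating."""
    seen = ratings[user]
    totals = {}
    for _, movies in ratings.items():
        for movie, rating in movies.items():
            if movie not in seen:
                totals.setdefault(movie, []).append(rating)
    # Pick the movie whose average rating across other users is highest.
    return max(totals, key=lambda m: sum(totals[m]) / len(totals[m]))

print(recommend("alice"))  # alien
```

A real recommender personalizes by modelling user and item similarity (e.g. matrix factorization with ALS in MLlib) rather than ranking by raw popularity; the training covers how to build and evaluate such a model at scale.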

Spark IV

In the fourth Spark training you will be introduced to Spark’s structured streaming APIs. Participants are introduced to streaming concepts such as event time, late data, windowing, and watermarking. During the practical session participants will solve several streaming queries over order (sales) data using Spark and Kafka.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:  

  • Previous and current streaming APIs in Spark 
  • Spark structured streaming data model 
  • Considerations concerning streaming query output modes 
  • Event time and late data 
  • Windowing and watermarking to solve late data issues 
  • Hands-on solving structured streaming queries 
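The interplay of windowing, watermarking, and late data can be sketched in plain Python (conceptual only, not Spark's actual implementation; the event times and thresholds are invented): events are counted per tumbling window, and events older than the watermark are dropped.

```python
from collections import defaultdict

WINDOW = 10    # tumbling window size in seconds of event time
WATERMARK = 5  # tolerate events arriving up to 5 seconds late

windows = defaultdict(int)  # window start time -> event count
max_event_time = 0

def handle(event_time):
    """Assign an event to its tumbling window; drop it if it is too late."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    if event_time < max_event_time - WATERMARK:
        return  # older than the watermark allows: dropped
    windows[event_time // WINDOW * WINDOW] += 1

# 8 arrives out of order but within the watermark; 2 is too late and is dropped.
for t in [1, 3, 12, 8, 15, 2]:
    handle(t)

print(dict(windows))  # {0: 3, 10: 2}
```

The watermark is the trade-off knob: a larger watermark accepts later data but forces Spark to keep window state around longer before emitting final results.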

Kafka

The Kafka training provides an overview of the Apache Kafka platform. Participants will learn Kafka terminology and how Kafka provides a scalable solution for decoupling data streams. Topics such as partitioning and message guarantees will be addressed. During the practical session participants will use a Dockerized Kafka broker to explore basic consuming and producing, followed by a more complex change data capture (CDC) scenario.

The training introduces Kafka concepts and theory, followed by hands-on exercises. After this training you will have gained knowledge about:

  • The problems Kafka solves
  • Kafka terminology and internals
  • Partitioning and scaling Kafka
  • The various message guarantees provided by Kafka
  • Kafka security and ACL options
  • Schemas and schema registry
  • Basic Kafka consuming and producing
  • Change data capture and Kafka
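Key-based partitioning, one of the topics above, can be illustrated in a few lines of plain Python (conceptual only; Kafka's default partitioner uses murmur2, and md5 here is just a stand-in hash):

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    """Kafka-style key partitioning: hash the key, modulo the partition count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Hypothetical message keys: one key per customer.
orders = ["customer-1", "customer-2", "customer-1", "customer-3"]
assignments = [partition_for(key) for key in orders]

# Messages with the same key always map to the same partition,
# which is how Kafka preserves per-key ordering.
print(assignments[0] == assignments[2])  # True
```

This is also why choosing a good message key matters: it determines both how load spreads across partitions and which messages are guaranteed to stay in order.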

Streaming

The (software) world is becoming more and more event-based, which translates into data processing moving from a batch to a streaming paradigm. This class covers several platforms that deal with streaming data.

Apache Hive

Apache Hive allows you to use SQL to query files in (distributed) file systems, which is a cornerstone of many big data platforms. This training provides insights into the different Hive components and how to use them to query JSON, CSV, Parquet, and other file formats.

The training includes theory and hands-on exercises. After this training you will have gained knowledge about:

  • Hive Metastore - holds table definitions
  • Hive Query Engine - used to perform SQL queries
  • Ways to optimize table layout using data partitioning and clustering
  • Spark integration with Hive
  • Impala - the SQL engine in Cloudera Hadoop
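To make the partitioning topic concrete, here is a hedged sketch of Hive DDL (the table, columns, and path are invented for illustration): an external table over delimited files, partitioned by date so queries only scan the partitions they need.

```sql
-- Hypothetical example: an external table over CSV files on HDFS,
-- partitioned by date.
CREATE EXTERNAL TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/page_views';

-- Partition pruning: only the files for 2020-06-12 are read.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2020-06-12'
GROUP BY url;
```

The metadata for tables and partitions like these lives in the Hive Metastore, which is also what lets Spark and Impala query the same tables.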

Data Architectures

Learn how to set up different (big) data architectures, the design principles behind them, and the trade-offs between them.

For more information or to book your training

Are you interested in the Big Data training, or do you have questions? Fill out the form and we will contact you personally.