Basic Machine Learning Training

Due to the COVID-19 outbreak, our training courses will be taught in an online classroom.

Not sure how to start your Machine Learning journey, or do you just want to solidify your skills? With this training you will receive in-depth knowledge from industry professionals, test your skills with hands-on assignments & demos, and get access to valuable resources and tools.

The Basic Machine Learning Training consists of 7 classes covering the Linux command line and Bash scripting, data preprocessing with pandas, Machine Learning fundamentals in Python with scikit-learn, Data Handling & Visualization with R, and Statistics.

These classes are perfect for companies of all sizes that want to close their data skills gap by training their employees. You can follow the schedule below at our offices, or contact us for a tailor-made program that meets your needs.

Are you interested? Contact us and we will get in touch with you.

What you will learn

This Basic Machine Learning training is a great fit if you want to learn the basics needed to work as a Data/Machine Learning Engineer or Data Scientist. Participants get the most value out of the training when they have a background in analytics, mathematics and statistics.

During this training you will learn:

  • The leading Machine Learning technologies;
  • How to work with Python, R, Bash and scikit-learn;
  • The strengths and limitations of the different technologies.

After the training, you will receive a Certificate of Completion.

Training Dates

The Basic Machine Learning training consists of 7 classes, spread over a couple of months to maximize what you learn. The classes build on each other, so we generally advise attending all of them. If you would like to attend only one or a few classes, contact us and we will advise you on a tailored training course.

Detailed description of the Classes

Linux CLI

Regardless of your OS of choice, knowing how to work with Linux through the command line is a valuable skill for any engineer or scientist. Whether you need to look under the hood of an application you deployed, debug a job running on one of those nodes that keep crashing, or simply prepare a dataset that would take too long to download, transform and upload again, being able to use the power of Bash will often save you, and it will speed up your work as well. As with any power tool, it is of course also very easy to cut off your own foot, so join us on this journey towards getting to know Bash and unlocking its power.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Some concepts behind Linux
  • Everyday Bash tools
  • Tricks that will make Bash use easier
  • Basic Bash scripting

Data Preprocessing

Data Scientists and Machine Learning experts spend a considerable amount of their time preprocessing data, which makes it a necessary part of their toolkit. In this training we focus specifically on the pandas library, which has grown into one of the main tools for data preprocessing and exploration in Python.

We start off with an introduction to preprocessing, the concept of tidy data and some useful techniques such as pivoting and missing value imputation. Then we go into the pandas library: its background, data structures and basic features. In a demo we look at concrete ways to handle data sets, from loading, subsetting and merging to (re)sampling, applying grouped transformations and saving results.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • The pandas library
  • Data structures: DataFrame and Series
  • Tidy data
  • Loading and saving data
  • Data exploration
  • Plotting time series
  • Useful transformation techniques
  • Merging, selecting, sorting, sampling
  • Missing value imputation
  • Grouped operations
  • Long/wide conversions
  • Advantages and limitations of pandas
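
As a rough impression of the kind of code covered in this class, the sketch below applies several of the techniques listed above with pandas: a wide-to-long conversion with `melt`, imputation of a missing value with the mean of its group, a merge on a shared key, and a `pivot` back to wide format. The toy data, column names and the choice of group-mean imputation are assumptions for illustration, not the actual course material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in "wide" format: one row per city, one column per month.
wide = pd.DataFrame({
    "city": ["Amsterdam", "Utrecht"],
    "jan": [3.1, np.nan],  # Utrecht's January value is missing
    "feb": [4.0, 3.8],
})

# Wide -> long (tidy): one row per (city, month) observation.
long = wide.melt(id_vars="city", var_name="month", value_name="temp")

# Missing value imputation (one simple option): fill gaps with the mean of the same city.
long["temp"] = long["temp"].fillna(
    long.groupby("city")["temp"].transform("mean")
)

# Merge with a second hypothetical table on the shared "city" key.
regions = pd.DataFrame({
    "city": ["Amsterdam", "Utrecht"],
    "province": ["Noord-Holland", "Utrecht"],
})
merged = long.merge(regions, on="city", how="left")

# Long -> wide again with a pivot: cities as rows, months as columns.
back_to_wide = merged.pivot(index="city", columns="month", values="temp")
print(back_to_wide)
```

Group-mean imputation is only one possible strategy; which one is appropriate depends on the data and on how the result will be used.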

Python ML Basics I

This training provides a theoretical introduction to the basics of Machine Learning and its different sub-fields, as well as hands-on practice in how it is applied. At the core of this training is the scikit-learn library, one of the most powerful and versatile tools for Machine Learning in Python.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Machine Learning, its goals and potential applications
  • Different types of Machine Learning: supervised, unsupervised and reinforcement learning
  • Classification and regression problems
  • Techniques such as clustering and dimensionality reduction
  • A minimal example workflow of a prediction model in scikit-learn
  • Splitting datasets into training and test sets
  • How to train, predict and score a classification prediction model
  • The standard interfaces of scikit-learn classes
  • Transformers and estimators in scikit-learn
  • Some of the machine learning algorithms you’ll have at your disposal, such as k-nearest neighbors, logistic regression, support vector machines, neural networks, etc.
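
A minimal sketch of such a workflow, assuming scikit-learn's built-in iris dataset and a k-nearest neighbors classifier (both chosen here for illustration): the data is split into training and test sets, a `StandardScaler` transformer is fitted on the training data only, and the estimator is trained, used to predict, and scored.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load a small built-in classification dataset (150 flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples as a test set; stratify keeps the class ratios equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Transformer interface: fit on the training data only, then transform both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Estimator interface: fit, predict and score.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)
accuracy = model.score(X_test_scaled, y_test)  # mean accuracy on the test set
print(f"Test accuracy: {accuracy:.2f}")
```

The shared interface is what makes scikit-learn convenient: transformers expose `fit` and `transform`, and classifiers such as `KNeighborsClassifier` expose `fit`, `predict` and `score`, so swapping in logistic regression or a support vector machine changes only one line.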

Python ML Basics II

In this training, we build upon what we have learned previously, and expand our workflow by showing how to optimize prediction models using Parameter Tuning. We discuss how and why to perform Cross-Validation and how to prevent Information Leakage. Bringing everything together, we finally show how to combine multiple steps of a machine learning workflow into Pipelines, thereby making the process more organized, efficient and less error-prone.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Cross-Validation
  • Commonly used Cross-Validation strategies
  • The importance of a validation set
  • Information leakage
  • Workflow of grid search and cross-validation
  • Standard interfaces of the GridSearchCV class
  • Pipelines and their role in combining transformers and estimators
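
The sketch below combines these ideas, assuming scikit-learn's built-in breast cancer dataset and a logistic regression model with a hypothetical grid of `C` values. Because the `StandardScaler` sits inside the `Pipeline`, it is refitted on the training folds of every cross-validation split, so no information from the validation fold leaks into the preprocessing. The held-out test set is only used once, after the grid search is done.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Keep a test set aside; it plays no role in parameter tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Transformer and estimator combined into one object with fit/predict/score.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation for every candidate value; the best one is refitted on all training data.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```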


Python ML Basics III

In this training, we build upon what we have learned previously and expand our knowledge of how to score machine learning models, discuss common pitfalls and show how to deal with them. We first examine the concepts of bias, variance, overfitting and underfitting, then dive into important performance metrics for classification problems, such as accuracy, precision, recall, F1 score and ROC curves, and cover commonly used metrics for regression. This last part of our basic toolkit allows us to properly assess a prediction model that we train to recognize images of handwritten digits during the hands-on lab session.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Overfitting, underfitting and the bias-variance tradeoff
  • Model evaluation in practice using scikit-learn
  • Evaluation metrics for classification, such as accuracy, precision, recall, F1 and area under the ROC curve (AUC)
  • Interpreting confusion matrices, classification reports and ROC curves
  • Decision functions and classification probabilities
  • Dealing with imbalanced datasets
  • Evaluation metrics for regression, such as MAE, RMSE, R^2
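
To make the metrics above concrete, here is a small sketch using functions from `sklearn.metrics` on hand-made labels and predicted probabilities (the numbers are invented for illustration). The probabilities are turned into class labels with a 0.5 decision threshold, while the ROC AUC is computed from the probabilities themselves. RMSE is computed as the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, mean_absolute_error,
    mean_squared_error, precision_score, r2_score, recall_score, roc_auc_score,
)

# Hypothetical binary labels and predicted probabilities for the positive class.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.35, 0.1, 0.2, 0.6, 0.3, 0.8])

# Turn probabilities into hard class labels with a 0.5 decision threshold.
y_pred = (y_score >= 0.5).astype(int)

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)    # uses the scores, not the hard labels

print(cm)  # → [[3 1]
           #    [2 2]]
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f} auc={auc:.3f}")
# → accuracy=0.625 precision=0.667 recall=0.500 f1=0.571 auc=0.875

# Hypothetical regression targets and predictions.
r_true = np.array([3.0, 5.0, 2.0, 7.0])
r_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae = mean_absolute_error(r_true, r_pred)
rmse = np.sqrt(mean_squared_error(r_true, r_pred))  # version-independent RMSE
r2 = r2_score(r_true, r_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R^2={r2:.3f}")
# → MAE=0.625 RMSE=0.750 R^2=0.847
```

Moving the threshold away from 0.5 trades precision against recall, which is exactly what the ROC curve summarizes over all possible thresholds.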

Data Handling & Visualization with R

R has grown into a well-developed ecosystem with powerful packages for data analysis, data visualization, in-depth statistics, time series forecasting and machine learning, to mention a few. This training gives a fast-paced introduction to R, its most relevant features and its basic workflow, including how to apply them.

We start the training by discussing the basics of the R programming language and the RStudio IDE, to understand its logical operations, data structures, workflow, etc. We then delve into a number of powerful packages such as dplyr, ggplot2, readr and other tidyverse packages, and show how they are used for data preprocessing, analysis and visualization.

Finally, we apply these concepts and tools in practice during a hands-on lab session. We implement a complete data analysis workflow in R, from retrieving real-time earthquake data from a web service to preprocessing, analyzing and eventually visualizing this data on an interactive map.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • R Programming Basics
  • Packages: dplyr, ggplot2, readr, tidyverse, etc.
  • Working with R Markdown notebooks
  • Tips & conventions
  • Lab session to get hands-on experience with a complete data analysis workflow in R

Basic Statistics & Math

This training serves as a basic introduction to statistics.

We will first discuss a number of core concepts of statistics, from random variables, probabilities and distributions to expectation values, variance and conditional probabilities. We will show a couple of common distributions and examples to clarify these concepts. Then we will go into statistical modelling, with a focus on linear regression. We conclude with some common metrics for regression and a discussion of uncertainties in estimates.

In the lab exercises we then get to apply these concepts and do some modelling ourselves. The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Basics of statistics and statistical modelling
  • Random variables
  • Probabilities, probability density functions, probability mass functions
  • Standard distributions such as the normal distribution and Student’s t-distribution
  • Expectation and variance
  • Conditional probabilities
  • Statistical modelling definition and notations
  • Linear regression
  • Least squares estimate
  • Metrics: R^2, adjusted R^2, residual standard error
  • Confidence intervals, statistical tests
  • Lab sessions to get hands-on experience applying this knowledge
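
As a worked example of the regression part of this class, the sketch below fits a simple linear regression by least squares with NumPy and computes R^2, adjusted R^2 and the residual standard error by hand. The simulated data (true intercept 1.5, true slope 2.0, standard normal noise) is an assumption for illustration. With n observations and p predictors, the formulas used are R^2 = 1 - RSS/TSS, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) and RSE = sqrt(RSS/(n - p - 1)).

```python
import numpy as np

# Simulated (hypothetical) data from y = 1.5 + 2.0 * x + noise.
rng = np.random.default_rng(seed=0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.5 + 2.0 * x + rng.normal(0, 1, size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
p = X.shape[1] - 1  # number of predictors, excluding the intercept

# Least squares estimate: the coefficients minimising ||y - X b||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
residuals = y - fitted
rss = np.sum(residuals ** 2)          # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalises extra predictors
rse = np.sqrt(rss / (n - p - 1))               # residual standard error

print(f"intercept={beta[0]:.3f} slope={beta[1]:.3f}")
print(f"R^2={r2:.3f} adjusted R^2={adj_r2:.3f} RSE={rse:.3f}")
```

Because the noise has standard deviation 1, the residual standard error should come out close to 1, and the estimated coefficients close to the true values used in the simulation.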