Computational big-data systems: Apache Spark

Duration

4 - 8 hours

Audience

Data scientists, big data engineers, developers starting with big data.
This session is not a deep dive. If you are already familiar with Apache Spark or distributed computing, it may be too high-level for you.

Course Objectives

Big Data and distributed computing are major parts of our offering.
In this session we’ll look at some fundamental concepts of distributed computing using Apache Spark, the different big data file formats, why and when to use big data, and best practices for working with distributed computing frameworks such as Apache Spark.

Prerequisites

Some knowledge of programming, BI and data management (SQL, ...).

Course Overview

The session will conclude with some examples in PySpark (the Python bindings for Apache Spark) that will give you a general idea of how to use Spark in a basic setting.

 

This training in-company?

Upon request, we can organize this training at your company.

 
 
