Computational big-data systems: Apache Spark
4 - 8 hours
Data scientists, big data engineers, developers starting with big data.
This session is not a deep dive. If you are already familiar with Apache Spark and/or distributed computing, this session might be too high-level for you.
Big Data and distributed computing are major parts of our offering.
In this session we’ll look at some fundamental concepts of distributed computing with Apache Spark, the different big data file formats, why and when to use big data, and best practices for working with distributed computing frameworks such as Apache Spark.
Some knowledge of programming, BI and data management (SQL, ...).
The session will conclude with some examples in PySpark (the Python bindings for Apache Spark) that will give you a general idea of how to use Spark in a basic setting.
Would you like this training in-company?
Upon your request we can organize this training for you.