Udacity: Data Engineer

(2 customer reviews)

Learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets. At the end of the program, you’ll combine your new skills by completing a capstone project.

Who is this course for?

This Nanodegree program offers an ideal path for experienced programmers to advance their data engineering career. If you enjoy solving important technical challenges and want to learn to work with massive datasets, this is a great way to get hands-on practice with a variety of data engineering principles and techniques.

Course Syllabus

Data Modeling

Learn to create relational and NoSQL data models to fit the diverse needs of data consumers. Use ETL to build databases in PostgreSQL and Apache Cassandra.

Data Modeling with Postgres

In this project, you’ll model user activity data for a music streaming app called Sparkify. You’ll create a relational database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. In PostgreSQL, you’ll also define fact and dimension tables and insert data into them.
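The example below sketches the fact/dimension split this project describes. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it runs without a database server. The table and column names (songplays, users, songs) and the sample rows are assumptions for illustration, not the course's official schema.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; the DDL sticks to types both accept.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables (hypothetical names): descriptive attributes, one row per entity.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist_name TEXT)")

# Fact table (hypothetical name): one row per play event, pointing at dimensions by key.
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY,
        start_time  TEXT NOT NULL,
        user_id     INTEGER REFERENCES users(user_id),
        song_id     TEXT REFERENCES songs(song_id)
    )
""")

# Made-up sample data.
cur.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "Ana", "free"), (2, "Ben", "paid")])
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                [("S1", "Song A", "Artist X"), ("S2", "Song B", "Artist Y")])
cur.executemany("INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
                [("2018-11-01 10:00", 1, "S1"),
                 ("2018-11-01 10:05", 2, "S1"),
                 ("2018-11-01 10:09", 2, "S2")])
conn.commit()

# A typical analytical query: join the fact table to a dimension and aggregate.
top_songs = cur.execute("""
    SELECT s.title, COUNT(*) AS plays
    FROM songplays AS sp
    JOIN songs AS s ON sp.song_id = s.song_id
    GROUP BY s.title
    ORDER BY plays DESC, s.title
""").fetchall()
print(top_songs)  # [('Song A', 2), ('Song B', 1)]
```

The fact table stays narrow (keys plus event data), while descriptive attributes live once in the dimensions. That layout keeps "what are users listening to" queries down to a join and a GROUP BY.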

Data Modeling with Apache Cassandra

In this project, you’ll model user activity data for a music streaming app called Sparkify. You’ll create a NoSQL database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. You’ll model your data in Apache Cassandra to allow for specific queries provided by the analytics team at Sparkify.
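Modeling "to allow for specific queries" in Cassandra usually means query-first design. You start from one query and build a table whose primary key answers it directly, duplicating data across tables if needed.

The example below imitates that in plain Python, without a Cassandra cluster: a dict keyed by the partition key, with each partition's rows sorted by a clustering column. The CQL string shows roughly how such a table might be declared but is not executed. The table name, columns, query and sample events are all hypothetical.

```python
from collections import defaultdict

# Hypothetical CQL for a table built to answer one query:
# "Which songs did user U play in session S, in play order?"
# Partition key: (user_id, session_id); clustering column: item_in_session.
CREATE_USER_SESSION_SONGS = """
CREATE TABLE IF NOT EXISTS user_session_songs (
    user_id int,
    session_id int,
    item_in_session int,
    artist text,
    song text,
    PRIMARY KEY ((user_id, session_id), item_in_session)
)
"""

# Made-up event rows, as they might come from event log files.
events = [
    {"user_id": 10, "session_id": 182, "item_in_session": 1, "artist": "Artist B", "song": "Song 2"},
    {"user_id": 10, "session_id": 182, "item_in_session": 0, "artist": "Artist A", "song": "Song 1"},
    {"user_id": 11, "session_id": 90, "item_in_session": 0, "artist": "Artist C", "song": "Song 3"},
]

# Plain-Python stand-in for the table: partition key -> rows ordered by clustering column.
user_session_songs = defaultdict(list)
for e in events:
    partition_key = (e["user_id"], e["session_id"])
    user_session_songs[partition_key].append((e["item_in_session"], e["artist"], e["song"]))
for rows in user_session_songs.values():
    rows.sort()  # Cassandra keeps rows within a partition sorted by clustering columns

def songs_in_session(user_id, session_id):
    """Answer the query with a single partition lookup: no joins, no full scan."""
    rows = user_session_songs.get((user_id, session_id), [])
    return [(artist, song) for _, artist, song in rows]

print(songs_in_session(10, 182))  # [('Artist A', 'Song 1'), ('Artist B', 'Song 2')]
```

A different question, such as "which users played song X", would get its own table with a different partition key rather than a join.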

Cloud Data Warehouses

Sharpen your data warehousing skills and deepen your understanding of data infrastructure. Create cloud-based data warehouses on Amazon Web Services (AWS).

Build a Cloud Data Warehouse

In this project, you’ll build an ETL pipeline for Sparkify that extracts its data from S3, stages it in Redshift, and transforms it into a set of dimensional tables so the analytics team can keep finding insights into what songs users are listening to.
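The example below sketches the staging pattern: load raw records into a staging table as-is, then build a dimension table from it with `INSERT ... SELECT` inside the database. On Redshift, the staging load would typically be a `COPY` from S3. Here, sqlite3 and `executemany` stand in for both Redshift and `COPY`. The table names, columns and the `page = 'NextSong'` filter are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table: raw events loaded unchanged (hypothetical columns).
# On Redshift this load would typically be a COPY from S3; executemany simulates it.
cur.execute("""
    CREATE TABLE staging_events (
        user_id INTEGER, first_name TEXT, level TEXT, page TEXT, ts INTEGER
    )
""")
cur.executemany("INSERT INTO staging_events VALUES (?, ?, ?, ?, ?)", [
    (1, "Ana", "free", "NextSong", 1000),
    (1, "Ana", "free", "NextSong", 2000),
    (2, "Ben", "paid", "Home",     3000),
    (2, "Ben", "paid", "NextSong", 4000),
])

# Dimension table built from staging in SQL, without pulling rows back into Python.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("""
    INSERT INTO users (user_id, first_name, level)
    SELECT DISTINCT user_id, first_name, level
    FROM staging_events
    WHERE page = 'NextSong' AND user_id IS NOT NULL
""")
conn.commit()

users = cur.execute("SELECT * FROM users ORDER BY user_id").fetchall()
print(users)  # [(1, 'Ana', 'free'), (2, 'Ben', 'paid')]
```

Keeping the transform in SQL lets the warehouse do the heavy lifting. The staging tables also keep a copy of the raw data in case a transform needs to be rerun.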

Spark and Data Lakes

Understand the big data ecosystem and how to use Spark to work with massive datasets. Store big data in a data lake and query it with Spark.

Build a Data Lake

In this project, you'll build an ETL pipeline for a data lake. The data resides in S3, in a directory of JSON logs on user activity on the app, as well as a directory with JSON metadata on the songs in the app. You will load data from S3, process the data into analytics tables using Spark, and load them back into S3. You'll deploy this Spark process on a cluster using AWS.
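The example below sketches this pipeline using only the Python standard library. It reads JSON-lines activity logs from a local directory (standing in for S3) and keeps the song-play events. It then writes them back out partitioned by year and month, using the Hive-style `column=value` directory layout that Spark's `partitionBy` produces.

The field names (`page`, `ts`, `userId`, `song`), the `NextSong` filter and the millisecond timestamps are assumptions for illustration. A real job would use PySpark DataFrames on a cluster instead.

```python
import json
import tempfile
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

def build_songplays(log_dir: Path, out_dir: Path) -> dict:
    """Read JSON-lines logs, keep play events, write them partitioned by year/month."""
    partitions = defaultdict(list)
    for log_file in sorted(log_dir.glob("*.json")):
        for line in log_file.read_text().splitlines():
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("page") != "NextSong":  # assumed marker for a song play
                continue
            # ts is assumed to be epoch milliseconds.
            start = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
            partitions[(start.year, start.month)].append(
                {"start_time": start.isoformat(), "user_id": event["userId"], "song": event["song"]}
            )
    # Write each partition to a Hive-style directory: year=YYYY/month=M/part-0.json
    for (year, month), rows in partitions.items():
        part_dir = out_dir / f"year={year}" / f"month={month}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0.json", "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    return {key: len(rows) for key, rows in partitions.items()}

# Usage with made-up events in a temporary directory (stand-in for S3).
with tempfile.TemporaryDirectory() as tmp:
    log_dir, out_dir = Path(tmp) / "logs", Path(tmp) / "songplays"
    log_dir.mkdir()
    sample = [
        {"page": "NextSong", "ts": 1541106106796, "userId": "8", "song": "Song A"},
        {"page": "Home", "ts": 1541106106800, "userId": "8", "song": None},
        {"page": "NextSong", "ts": 1543622400000, "userId": "9", "song": "Song B"},
    ]
    (log_dir / "2018-11-01-events.json").write_text("\n".join(json.dumps(e) for e in sample))
    counts = build_songplays(log_dir, out_dir)
    written = sorted(p.relative_to(out_dir).as_posix() for p in out_dir.rglob("*.json"))
    print(counts)
    print(written)
```

Partitioning by time means later queries for a single month only have to read that month's directory.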

Data Pipelines with Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.

Data Pipelines with Airflow

In this project, you’ll continue your work on the music streaming company’s data infrastructure by creating and automating a set of data pipelines. You’ll configure and schedule data pipelines with Airflow and monitor and debug production pipelines.
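In Airflow, a data quality check is often just a task whose callable raises an exception when a condition fails; the uncaught exception marks the task as failed. The example below shows that check logic in plain Python against sqlite3, so it runs without Airflow. In a DAG, a function like this might be wrapped in a `PythonOperator` or a custom operator. The function name `check_tables`, the `DataQualityError` class, the table and the specific checks are all hypothetical.

```python
import sqlite3

class DataQualityError(Exception):
    """Hypothetical error type; in an Airflow task, raising it would fail the task."""

def check_tables(conn, checks):
    """Run (sql, expected) pairs; each sql must return a single value."""
    failures = []
    for sql, expected in checks:
        actual = conn.execute(sql).fetchone()[0]
        if actual != expected:
            failures.append(f"{sql!r}: expected {expected}, got {actual}")
    if failures:
        raise DataQualityError("; ".join(failures))
    return len(checks)  # number of checks that passed

# Made-up table with one bad row (a missing user_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, level TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "paid"), (None, "free")])

checks = [
    ("SELECT COUNT(*) > 0 FROM users", 1),                    # table is not empty
    ("SELECT COUNT(*) FROM users WHERE user_id IS NULL", 0),  # no missing keys
]

try:
    check_tables(conn, checks)
except DataQualityError as err:
    print("check failed:", err)
```

Keeping the checks as data (SQL plus expected value) lets one task validate several tables without new code for each.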

Capstone Project

Combine what you've learned throughout the program to build your own data engineering portfolio project.

Data Engineering Capstone

The purpose of the data engineering capstone project is to give you a chance to combine what you've learned throughout the program. You'll define the scope of the project and the data you'll be working with. You'll gather data from several different data sources; transform, combine, and summarize it; and create a clean database for others to analyze.

Enrollment Inclusions

Real-world projects from industry experts

With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.

Technical mentor support

Our knowledgeable mentors guide your learning and are focused on answering your questions, motivating you, and keeping you on track.

Career services

You’ll have access to resume support, GitHub portfolio review, and LinkedIn profile optimization to help you advance your career and land a high-paying role.

Flexible learning program

Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.

Additional information

Program Length

Estimated time: 5 months at 5-10 hours/week


Instructors

Amanda Moran, Ben Goldberg, Sameh El-Ansary, Olli Iivonen, David Drummond, Judit Lantos, Juno Lee

Program Format

Self-paced Online Classes

Technical or Skill Prerequisites

Intermediate Python programming knowledge, of the sort gained through the Programming for Data Science Nanodegree program, other introductory programming courses or programs, or real-world software development experience, including:
– Strings, numbers, and variables; statements, operators, and expressions
– Lists, tuples, and dictionaries; conditions and loops
– Procedures, objects, modules, and libraries
– Troubleshooting and debugging; research and documentation
– Problem-solving; algorithms and data structures


Monthly Access – Pay as you go: $399 per month
– Learn at your own pace
– Cancel anytime
3-Month Access – Pay upfront and save an extra 15%: $1,695
– Switch to the monthly price afterward if you need more time
– Cancel anytime

2 reviews for Udacity: Data Engineer

  1. isleepbad

    They recently had the free one-month offer again, and I’m doing it now. Honestly, for me it’s an extremely mixed bag. I’m a beginner data engineer just transitioning into the field, and I’ve had my own personal project going for 4 months. However, a lot of the content filled in the gaps and helped me understand things like data modeling, different architectures, and how to properly use data lakes.

    But their exercises are really garbage. A lot of stuff I’m doing on my own the “harder” way. Their IaC section? I did it using Terraform. Their free data sets I downloaded and messed around with using my own copy of the same tools. I used Docker to spin up containers and interacted with them.

    All in all, for a beginner lacking some concepts, it’s great. But you’re better off doing your own projects than using their examples and thinking you learned anything.

  2. mltut

    Pros and Cons of Udacity Data Engineering Nanodegree

    Pros:
    Provides hands-on labs to practice throughout each lesson.
    The content is well developed and intuitive.
    Provides a good explanation of SQL vs. NoSQL.
    Discusses Postgres and Apache Cassandra commands.
    Provides perfect exposure to skills required in the data engineering industry.
    Focuses on hands-on practice and on “how” to do things like ETL and data warehousing.
    Gives a good explanation of distributed file systems and cluster computing.
    Clears up the confusion between PySpark DataFrames and PySpark SQL.
    Provides technical mentor support.
    Has a great community to help.

    Cons:
    Some of the lectures are not very polished.
    Data Modeling exercises have bugs.
    The demonstration code samples are not available to students.
    After completing the Nanodegree program, you will not get lifetime access to the course material.
