
Containerizing the Jaffle Shop dbt Project with Docker



This guide is aimed at newcomers to the data build tool (dbt) who may have come across the Jaffle Shop project, a widely used example for testing and experimentation.

The Jaffle Shop is a fictional e-commerce store. Its dbt project transforms raw data from the app's database into a customers-and-orders model that is ready for analytics.

A Note on Jaffle Shop's Configuration Needs

One significant challenge with the Jaffle Shop project is its expectation that users, particularly those unfamiliar with dbt, will set up and manage a local database for the dbt models to materialize.

In this guide, I will walk you through how to establish a containerized version of the project using Docker. This approach will enable us to deploy a PostgreSQL instance and configure the dbt project to interact with that database. I will also provide a link to a GitHub repository that simplifies getting all necessary services operational quickly.

Creating the Dockerfile and docker-compose.yml

Let's start by outlining the services to run in Docker. First, we'll create a docker-compose.yml file where we will define two services: the PostgreSQL database and a custom service that we will define next with a Dockerfile.

# docker-compose.yml
version: "3.9"

services:
  postgres:
    container_name: postgres
    image: postgres:15.2-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - 5432
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  dbt:
    container_name: dbt
    build: .
    image: dbt-jaffle-shop
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

This file uses Compose file format version 3.9 and defines two services, postgres and dbt, each with its own configuration.

The postgres service uses the official PostgreSQL Docker image, version 15.2-alpine. It names the container postgres, exposes port 5432 (the default PostgreSQL port), and sets environment variables for the PostgreSQL user and password. The healthcheck section defines a command that probes the container's health, together with an interval, timeout, and retry count.

The dbt service is based on a Docker image created from the current directory (using a Dockerfile). It mounts the current directory as a volume within the container and indicates that it depends on the postgres service, which must be healthy before dbt starts.
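The healthcheck and depends_on settings together implement a simple poll-until-healthy loop: Compose probes the database up to five times, five seconds apart, and only starts dbt once a probe succeeds. A minimal Python sketch of those semantics, where `fake_pg_isready` is a made-up stand-in for the real `pg_isready` probe:

```python
import time

def wait_until_healthy(check, interval=0.01, retries=5):
    """Poll `check` up to `retries` times, sleeping `interval` seconds
    between attempts. Mirrors the Compose healthcheck semantics: the
    service is considered healthy as soon as one probe succeeds."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(interval)
    return False

# Stand-in probe: the "database" becomes ready on the third attempt
state = {"calls": 0}
def fake_pg_isready():
    state["calls"] += 1
    return state["calls"] >= 3

print(wait_until_healthy(fake_pg_isready))  # True
print(state["calls"])                       # 3
```

The real check runs inside the postgres container; the sketch only illustrates the interval/retries behaviour that gates the dbt service's startup.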

To containerize the Jaffle Shop project, we need to create a Dockerfile that installs the necessary dependencies for both Python and dbt and ensures that the container remains active once the environment is established.

# Dockerfile
FROM --platform=linux/amd64 python:3.10-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt

# Install dbt-core and the dbt Postgres adapter
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.2.0
RUN pip install pytz

# Install dbt dependencies, build the models, then keep the container alive
CMD dbt deps && dbt build --profiles-dir ./profiles && sleep infinity

Configuring Postgres with dbt

To work with dbt, we’ll utilize the dbt Command Line Interface (CLI). A directory containing a dbt_project.yml file is recognized as a dbt project by the CLI.

We will set up this file to include basic configurations, such as the project name and the profile we will create in the next step. Additionally, we will define the paths for various dbt entities and provide information on their materialization.

# dbt_project.yml
name: 'jaffle_shop'
config-version: 2
version: '0.1'

profile: 'jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  jaffle_shop:
    materialized: table
    staging:
      materialized: view

Next, the profiles.yml file will house the dbt profiles. A profile contains targets that outline the connection details and credentials for the database or data warehouse.

# profiles.yml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1

This file defines a profile named jaffle_shop that details connection attributes for a PostgreSQL database operating within a Docker container called postgres.

Profile structure:

  • jaffle_shop: The profile name, chosen by the user.
  • target: dev: The default target for the profile, named dev.
  • outputs: Lists the output configurations for the profile, with the default output being dev.
  • dev: Specifies connection details for the dev target, which uses a PostgreSQL database.
  • type: postgres: Indicates the output type, here a PostgreSQL database.
  • host: postgres: The address of the PostgreSQL database server.
  • user: postgres: The username for connecting to the PostgreSQL database.
  • password: postgres: The password for authenticating to the PostgreSQL database.
  • port: 5432: The port number for PostgreSQL connections.
  • dbname: postgres: The name of the PostgreSQL database to connect to.
  • schema: public: The schema name for executing queries against the database.
  • threads: 1: The number of threads dbt uses when executing tasks.
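For intuition, the dev target's fields combine into an ordinary libpq-style connection URI. A small sketch, where the dict simply mirrors the values from profiles.yml (dbt parses the YAML itself; this is for illustration only):

```python
# Mirrors the dev target in profiles.yml (illustrative only; dbt reads
# the YAML file directly, it does not consume this dict).
profile = {
    "type": "postgres",
    "host": "postgres",
    "user": "postgres",
    "password": "postgres",
    "port": 5432,
    "dbname": "postgres",
    "schema": "public",
}

# A libpq-style URI assembled from the same fields
dsn = "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**profile)
print(dsn)  # postgresql://postgres:postgres@postgres:5432/postgres
```

Note that host is the Compose service name, postgres: Docker Compose places both containers on the same network and resolves service names via its internal DNS, which is why the dbt container can reach the database without an IP address.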

Jaffle Shop dbt Models and Seeds

The source data for the Jaffle Shop project comprises CSV files for customers, payments, and orders. In dbt, we can import this data into our database through seeds, which we then utilize to create dbt models.
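To make the seed mechanism concrete, here is a sketch of how a seed CSV parses into rows before dbt loads it as a table; the raw_customers columns shown (id, first_name, last_name) are assumptions based on the Jaffle Shop seed files, and the values are made up:

```python
import csv
import io

# A tiny stand-in for seeds/raw_customers.csv (columns assumed, data invented)
raw_customers_csv = """id,first_name,last_name
1,Michael,P.
2,Shawn,M.
"""

rows = list(csv.DictReader(io.StringIO(raw_customers_csv)))
print(len(rows))              # 2
print(rows[0]["first_name"])  # Michael
```

Running `dbt seed` performs the equivalent of this parse and then inserts the rows into a table named after the file (here, raw_customers).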

Here’s an example of a model that computes metrics for our customers:

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select * from {{ ref('stg_payments') }}

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders
    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount
    from payments
    left join orders on payments.order_id = orders.order_id
    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value
    from customers
    left join customer_orders on customers.customer_id = customer_orders.customer_id
    left join customer_payments on customers.customer_id = customer_payments.customer_id

)

select * from final
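To clarify what the customer_orders and customer_payments CTEs compute, here is an equivalent in-memory sketch in Python. The sample rows are made up for illustration; in the real project the data comes from the stg_orders and stg_payments views built from the seeds:

```python
from collections import defaultdict

# Made-up sample rows standing in for stg_orders and stg_payments
orders = [
    {"order_id": 1, "customer_id": 1, "order_date": "2018-01-01"},
    {"order_id": 2, "customer_id": 1, "order_date": "2018-02-10"},
    {"order_id": 3, "customer_id": 2, "order_date": "2018-03-05"},
]
payments = [
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 20},
    {"order_id": 3, "amount": 15},
]

# customer_orders CTE: first/most recent order and order count per customer
customer_orders = {}
for o in orders:
    c = customer_orders.setdefault(o["customer_id"], {
        "first_order": o["order_date"],
        "most_recent_order": o["order_date"],
        "number_of_orders": 0,
    })
    c["first_order"] = min(c["first_order"], o["order_date"])
    c["most_recent_order"] = max(c["most_recent_order"], o["order_date"])
    c["number_of_orders"] += 1

# customer_payments CTE: total paid per customer (payments joined to orders)
order_to_customer = {o["order_id"]: o["customer_id"] for o in orders}
customer_payments = defaultdict(int)
for p in payments:
    customer_payments[order_to_customer[p["order_id"]]] += p["amount"]

print(customer_orders[1]["number_of_orders"])  # 2
print(customer_payments[1])                    # 30
```

The final CTE then left-joins both aggregates back onto the customers table, so customers with no orders still appear, with NULL metrics.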

Running the Services via Docker

Now we will build and launch our Docker services. To do this, simply execute the following commands:

$ docker-compose build

$ docker-compose up

The above commands will initiate a PostgreSQL instance and subsequently build the dbt resources for the Jaffle Shop as specified in the repository. These containers will remain active, allowing you to:

  • Query the PostgreSQL database and the tables created from dbt models
  • Execute further dbt commands via the dbt CLI

Executing dbt Commands via CLI

The dbt container has already built the specified models. However, you can still access the container to execute dbt commands for new or modified models. Begin by accessing the container.

Use the command below to list all active containers:

$ docker ps

Copy the ID of the dbt container and use it in the following command:

$ docker exec -it <container-id> /bin/bash

This command provides access to the container's bash shell, allowing you to run dbt commands.

# Install dbt dependencies (not required if packages.yml is empty)

dbt deps

# Build seeds

dbt seed --profiles-dir profiles

# Build data models

dbt run --profiles-dir profiles

# Build snapshots

dbt snapshot --profiles-dir profiles

# Run tests

dbt test --profiles-dir profiles

Keep in mind that since we’ve mounted the local directory to the running container, any changes made locally will be reflected within the container immediately. This means you can create new models or edit existing ones, then access the running container to build models, run tests, and so on.

Querying the dbt Models on the PostgreSQL Database

You can also query the PostgreSQL database and the dbt models or snapshots created within it. To do this, access the running PostgreSQL container.

# Get the container ID for the postgres service

$ docker ps

# Then, copy the container ID and enter the running container

$ docker exec -it <container-id> /bin/bash

Next, use psql, a terminal interface for PostgreSQL, to query the database:

$ psql -U postgres

The following commands can be used to list tables and views respectively:

postgres=# \dt
           List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | customers     | table | postgres
 public | orders        | table | postgres
 public | raw_customers | table | postgres
 public | raw_orders    | table | postgres
 public | raw_payments  | table | postgres
(5 rows)

postgres=# \dv
           List of relations
 Schema |     Name      | Type |  Owner
--------+---------------+------+----------
 public | stg_customers | view | postgres
 public | stg_orders    | view | postgres
 public | stg_payments  | view | postgres
(3 rows)

You can also query dbt models with a SELECT statement:

SELECT * FROM <table_or_view_name>;

Getting the Full Code

I’ve created a GitHub repository that you can clone to your local machine to swiftly run the containerized version of the Jaffle Shop dbt project. You can find the project and the code from this tutorial through the following link.

GitHub - gmyrianthous/jaffle_shop: A containerized version of the Jaffle Shop dbt project


Final Thoughts

The data build tool (dbt) is fast becoming an essential part of the modern data stack. If you're just starting to learn dbt, I strongly encourage you to experiment with the Jaffle Shop project: a self-contained project developed by dbt Labs for testing and experimentation.

dbt is a tool frequently employed by data analysts and analytics engineers (alongside data engineers), necessitating a connection to a database or data warehouse. However, many analysts might feel uneasy about configuring and initializing a local database.

In this guide, we showcased how to get started with dbt and operate all the necessary services for materializing dbt models on a local PostgreSQL database. I hope this tutorial assists you in getting your dbt project and database operational as smoothly as possible. Should you encounter any challenges while running the project, feel free to reach out in the comments, and I’ll do my utmost to help you troubleshoot your code and configuration.
