Implementing your mesh plan
Where should your mesh journey start?
Moving to a dbt Mesh represents a meaningful change in development and deployment architecture. Before any sufficiently complex software refactor or migration, it's important to ask, 'Why might this not work?' The two most common reasons we've seen stem from:
- Lack of buy-in that a dbt Mesh is the right long-term architecture
- Lack of alignment on a well-scoped starting point
Creating alignment on your architecture and starting point are major steps in ensuring a successful migration. Deciding on the right starting point will look different for every organization, but there are some heuristics that can help you decide where to start. In all likelihood, your organization already has logical components, and you may already be grouping, building, and deploying your project according to these interfaces. The goal is to define and formalize these organizational interfaces and use these boundaries to split your project apart by domain.
How do you find these organizational interfaces? Here are some steps to get you started:
- Talk to teams about what sort of separation naturally exists right now.
  - Are there various domains people are focused on?
  - Are there various sizes, shapes, and sources of data that get handled separately (such as click event data)?
  - Are there people focused on separate levels of transformation, such as landing and staging data or building marts?
- Is there a single team that is downstream of your current dbt project, who could more easily migrate onto dbt Mesh as a consumer?
When attempting to define your project interfaces, you should consider investigating:
- Your jobs: Which sets of models are most often built together?
- Your lineage graph: How are models connected?
- Your selectors (defined in selectors.yml): How do people already define resource groups?
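As a rough illustration of what an existing selector can tell you, here is a minimal sketch of a selectors.yml entry that already captures an informal domain boundary. The selector name, folder path, and tag are hypothetical and only stand in for whatever your project uses today.

```yaml
# selectors.yml (illustrative sketch; the name, path, and tag are assumptions)
selectors:
  - name: marketing_domain
    description: Models the marketing team already builds together
    definition:
      union:
        # everything under the marketing folder...
        - method: path
          value: models/marketing
        # ...plus anything explicitly tagged as marketing
        - method: tag
          value: marketing
```

If several jobs already run a selector like this one, that set of models is a reasonable first candidate for a group.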
Let's go through an example process of taking a monolithic project, using groups and access to define the interfaces, and then splitting it into multiple projects.
To learn more, refer to our freely available dbt Mesh learning course.
Defining project interfaces with groups and access
Once you have a sense of some initial groupings, you can first implement group and access permissions within a single project.
- First you can create a group to define the owner of a set of models.
# in models/__groups.yml
groups:
  - name: marketing
    owner:
      name: Ben Jaffleck
      email: ben.jaffleck@jaffleshop.com
- Then, we can add models to that group using the group: key in the model's YAML entry.
# in models/marketing/__models.yml
models:
  - name: fct_marketing_model
    group: marketing
  - name: stg_marketing_model
    group: marketing
- Once you've added models to the group, you can add access settings to the models based on their connections between groups, opting for the most private access that will maintain current functionality. This means that any model that only has relationships to other models in the same group should be private, and any model that has cross-group relationships, or is a terminal node in the group DAG, should be protected so that other parts of the DAG can continue to reference it.
# in models/marketing/__models.yml
models:
  - name: fct_marketing_model
    group: marketing
    access: protected
  - name: stg_marketing_model
    group: marketing
    access: private
- Validate these groups by incrementally migrating your jobs to execute these groups specifically via selection syntax. We would recommend doing this in parallel to your production jobs until you’re sure about them. This will help you feel out if you’ve drawn the lines in the right place.
- If you find yourself consistently making changes across multiple groups when you update logic, that’s a sign that you may want to rethink your groups.
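One way to run a group on its own is a YAML selector built on the group selection method. The following is a minimal sketch; the selector name is hypothetical, and it assumes a group named marketing is defined in your project.

```yaml
# selectors.yml (illustrative sketch; the selector name is an assumption)
selectors:
  - name: marketing_group
    description: Everything assigned to the marketing group
    definition:
      # selects every resource whose group is set to marketing
      method: group
      value: marketing
```

A parallel job could then run dbt build --selector marketing_group alongside your existing production job, letting you compare the two before switching over.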
Split your projects
- Move your grouped models into a subfolder. This will include any model in the selected group, its associated YAML entry, and its parent or child resources as appropriate, depending on where this group sits in your DAG.
  - Note that just like in your dbt project, circular references are not allowed! Project B cannot have parents and children in Project A, for example.
- Create a new dbt_project.yml file in the subdirectory.
- Copy any macros used by the resources you moved.
- Create a new packages.yml file in your subdirectory with the packages that are used by the resources you moved.
- Update {{ ref }} functions: For any model that has a cross-project dependency (this may be in the files you moved, or in the files that remain in your project):
  - Update the {{ ref() }} function to have two arguments, where the first is the name of the source project and the second is the name of the model: e.g. {{ ref('jaffle_shop', 'my_upstream_model') }}
  - Update the upstream, cross-project parents' access configs to public, ensuring any project can safely {{ ref() }} those models.
  - We highly recommend adding a model contract to the upstream models to ensure the data shape is consistent and reliable for your downstream consumers.
- Create a dependencies.yml file for the downstream project, declaring the upstream project as a dependency.
# in dependencies.yml
projects:
  - name: jaffle_shop
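To make the access and contract recommendations from the steps above more concrete, here is a minimal sketch of what an upstream model's YAML entry might look like once it is referenced across projects. The column names and data types are hypothetical; a real contract needs to list the model's actual columns.

```yaml
# in the upstream project's models/marketing/__models.yml
# (illustrative sketch; the columns and data types are assumptions)
models:
  - name: fct_marketing_model
    group: marketing
    access: public          # lets other projects ref() this model
    config:
      contract:
        enforced: true      # the build fails if the output shape drifts
    columns:
      - name: campaign_id
        data_type: int
      - name: spend_usd
        data_type: numeric
```

With the contract enforced, a change that renames or retypes one of these columns fails in the upstream project rather than silently breaking downstream consumers.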
Best practices
- When you’ve confirmed the right groups, it's time to split your projects.
- Do one group at a time!
- Do not refactor as you migrate, however tempting that may be. Focus on getting 1-to-1 parity and log any issues you find in doing the migration for later. Once you’ve fully migrated the project then you can start optimizing it for its new life as part of your mesh.
- Start by splitting your project within the same repository for full git tracking and easy reversion if you need to start from scratch.
Connecting existing projects
Some organizations may already be coordinating across multiple dbt projects. Most often this is via:
- Installing parent projects as dbt packages
- Using {{ source() }} functions to read the outputs of a parent project as inputs to a child project.
This has a few drawbacks:
- If using packages, each project has to include all resources from all projects in its manifest, slowing down dbt and the development cycle.
- If using sources, there are breakages in the lineage, as there's no real connection between the parent and child projects.
The migration steps here are much simpler than splitting up a monolith!
- If using the package method:
  - In the parent project:
    - Mark all models being referenced downstream as public and add a model contract.
  - In the child project:
    - Remove the package entry from packages.yml
    - Add the upstream project to your dependencies.yml
    - Update the {{ ref() }} functions to models from the upstream project to include the project name argument.
- If using the source method:
  - In the parent project:
    - Mark all models being imported downstream as public and add a model contract.
  - In the child project:
    - Add the upstream project to your dependencies.yml
    - Replace the {{ source() }} functions with cross-project {{ ref() }} functions.
    - Remove the unnecessary source definitions.
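As a sketch of where the child project ends up in either case, its dependencies.yml might look something like the following. This assumes your dbt version lets you list packages alongside projects in dependencies.yml; the third-party package and version shown are placeholders, and jaffle_shop stands in for your parent project's name.

```yaml
# dependencies.yml in the child project (illustrative sketch)
# Third-party packages can stay installed as packages; only the parent
# project's package entry is removed. The package below is a placeholder.
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

# The parent project is declared as a project dependency instead,
# so its public models can be referenced with a two-argument ref().
projects:
  - name: jaffle_shop
```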
Additional Resources
Our example projects
We've provided a set of example projects you can use to explore the topics covered here. We've split our Jaffle Shop project into 3 separate projects in a multi-repo dbt Mesh. Note that you'll need to leverage dbt Cloud to use multi-project architecture, as cross-project references are powered via dbt Cloud's APIs.
- Platform - containing our centralized staging models.
- Marketing - containing our marketing marts.
- Finance - containing our finance marts.
dbt-meshify
We recommend using the dbt-meshify command line tool to help you do this. This comes with CLI operations to automate most of the above steps.