Building a Modular Data Architecture

November 12, 2024
Steven Johnson
Developer Advocate

Picture a complex jigsaw puzzle where each piece represents a different data tool or platform. At first glance, these pieces might seem to overlap, leading many organizations to believe they must choose between them. But what if the real power lies not in choosing, but in combining these tools strategically? This was the core message of Alex Welch's presentation at the dbt/Prefect Data Event, where he demonstrated how dbt Cloud and Prefect can work together to create a robust, scalable data infrastructure.

The Evolution of Modern Data Architecture

The modern data stack emerged about a decade ago with the rise of cloud data providers and platforms. The core workflow became centered around a critical process: bringing data in from varied sources (raw databases, applications, audit logs), transforming that raw data into clean data models, and pushing it to downstream systems like AI and business intelligence tools.

As organizations grappled with this new paradigm, three key questions emerged:

  1. How can we automate workflows from data source to data warehouse?
  2. How can we gain visibility into pipeline performance and health?
  3. How can we increase data literacy across our company while maintaining trust in the data?

These challenges led to the emergence of specialized tools for orchestration, data observability, and data catalogs. While these solutions addressed immediate needs, they often created siloed approaches, each with its own interface and its own disconnected metadata. The result can be a stack that is harder to optimize and more expensive to run.

The Power of the Analytics Development Life Cycle (ADLC)

The solution to these challenges lies in what dbt's CEO Tristan Handy calls the Analytics Development Life Cycle (ADLC). This vendor-agnostic framework promotes collaboration across different stakeholders—from technical practitioners to decision-makers and analysts—through eight distinct workflows spanning planning, development, testing, and operations.

Rather than separating roles between data builders and consumers, the ADLC provides a standardized, repeatable framework for collaboration. This approach is complemented by what dbt calls the Data Control Plane, which sits across your stack and unifies capabilities while centralizing metadata across the business.

Modular Data Architecture: The Best of Both Worlds

The key to successful implementation lies in embracing modular data architecture: a design approach where infrastructure is broken down into independent, interchangeable components. Each component has a specific function and interacts with others through standardized interfaces. This approach rests on several design principles:

  • Separation of concerns
  • Loose coupling between components
  • High cohesion within components
  • Standardized interaction interfaces
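To make these principles concrete, here is a minimal sketch in plain Python. It is an illustration only, not something shown in the presentation: the `Component` interface and the `StaticExtract`, `CastAmounts`, and `DropSmallAmounts` classes are hypothetical names, and the sample rows are invented. The point is that each component does one job and is reachable only through a shared `run` method, so components can be added or swapped without touching the others.

```python
from typing import Any, Dict, List, Protocol

Rows = List[Dict[str, Any]]


class Component(Protocol):
    """The standardized interface: every component takes rows and returns rows."""

    def run(self, rows: Rows) -> Rows: ...


class StaticExtract:
    """Hypothetical extract step; stands in for reading from a source system."""

    def run(self, rows: Rows) -> Rows:
        return rows + [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}]


class CastAmounts:
    """Hypothetical transform step; converts string amounts to floats."""

    def run(self, rows: Rows) -> Rows:
        return [{**row, "amount": float(row["amount"])} for row in rows]


class DropSmallAmounts:
    """Hypothetical filter step; keeps rows at or above a minimum amount."""

    def __init__(self, minimum: float) -> None:
        self.minimum = minimum

    def run(self, rows: Rows) -> Rows:
        return [row for row in rows if row["amount"] >= self.minimum]


def run_pipeline(components: List[Component]) -> Rows:
    """Pass rows through each component in order, using only the shared interface."""
    rows: Rows = []
    for component in components:
        rows = component.run(rows)
    return rows


# The same runner works for both pipelines; only the component list changes.
basic = run_pipeline([StaticExtract(), CastAmounts()])
filtered = run_pipeline([StaticExtract(), CastAmounts(), DropSmallAmounts(5.0)])
```

Because `run_pipeline` depends only on the interface, adding the filter step required no change to the extract or transform components. That is loose coupling in miniature.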

Real-World Implementation Examples

Consider these practical applications of combining dbt Cloud and Prefect:

  1. Basic Integration Flow
    • Prefect extracts data from an on-premise database
    • dbt Cloud performs transformations
    • Transformed data is loaded back into the on-premise database
  2. Advanced ML Pipeline
    • Prefect orchestrates the entire flow
    • dbt handles baseline transformations
    • Machine learning algorithms process the transformed data
    • Additional dbt jobs utilize ML outputs
    • BI tools are refreshed automatically
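The ordering in the advanced ML pipeline above can be sketched as a small dependency graph. This is a rough illustration using only the Python standard library; it does not call Prefect or dbt Cloud. The step names and the `run_step` placeholder are assumptions made for the example. In a real deployment each step would presumably be an orchestrated task, with the dbt steps triggering dbt Cloud jobs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical step names. Each step maps to the set of steps it depends on.
ML_PIPELINE: Dict[str, Set[str]] = {
    "extract_source": set(),
    "dbt_baseline": {"extract_source"},   # baseline dbt transformations
    "ml_scoring": {"dbt_baseline"},       # ML runs on the transformed data
    "dbt_ml_models": {"ml_scoring"},      # further dbt jobs use the ML outputs
    "refresh_bi": {"dbt_ml_models"},      # BI tools refresh last
}


def run_step(name: str, log: List[str]) -> None:
    """Placeholder for real work; here it only records that the step ran."""
    log.append(name)


def run_pipeline(graph: Dict[str, Set[str]]) -> List[str]:
    """Run every step after all of its dependencies and return the run order."""
    log: List[str] = []
    for step in TopologicalSorter(graph).static_order():
        run_step(step, log)
    return log


order = run_pipeline(ML_PIPELINE)
```

Declaring dependencies rather than hard-coding a sequence means a new step, such as a second ML model, only needs to state what it depends on. The runner works out where it fits.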

Empowering Diverse Teams

One of the most powerful aspects of combining dbt Cloud and Prefect is the ability to support different working styles and skill levels:

  • Data engineers can work in their preferred environments (VS Code, Neovim)
  • Analysts can use dbt Cloud's intuitive IDE
  • Non-technical users can leverage the visual editor for drag-and-drop transformations

This flexibility allows organizations to meet practitioners where they are, enabling everyone to contribute to the data ecosystem without sacrificing governance or velocity.

The Future: Composable Customer Data Platforms

Looking ahead, the combination of dbt Cloud and Prefect enables organizations to build composable Customer Data Platforms (CDPs) that are:

  • Tailored to specific organizational needs
  • Flexible and scalable
  • Automatically documented and governed
  • Capable of handling frequent updates and changes

The power of modern data architecture isn't about choosing between tools—it's about combining them strategically. dbt Cloud and Prefect demonstrate how complementary tools can create solutions greater than the sum of their parts. By embracing this modular approach, organizations can build more robust, scalable, and accessible data infrastructures that serve the needs of all stakeholders.