Orchestrating Fashion's Data Runway: Inside Rent the Runway's Data Stack with Prefect and dbt

November 19, 2024

Vishal Kella

Staff Engineer, Rent the Runway

Steven Johnson

Developer Advocate

The Fashion Data Challenge

Rent the Runway (RTR) faces data challenges that follow directly from its business model. Beyond the customer-facing rental platform, the company manages complex warehouse operations, marketing initiatives, and financial data streams. Its data engineering team ingests and processes data for everything from inventory tracking to customer behavior analysis, which calls for a robust and flexible data stack.

The Core Data Stack

At the heart of RTR's data architecture is a combination of six tools:

Prefect for orchestration
dbt for data transformation
Snowflake as their data warehouse
AWS as their cloud provider
Docker for containerization
GitHub Actions for CI/CD

The company follows an ELT (Extract, Load, Transform) pattern: raw data is loaded into Snowflake first and then transformed inside the warehouse with dbt. Prefect and dbt form the backbone of these data operations.

The Prefect-dbt Symphony

What makes RTR's implementation particularly interesting is how they've integrated Prefect and dbt. The data engineering team has created a parameterized Prefect flow that serves as their universal dbt runner. This flow handles:

ECS metadata collection for observability
Command composition for dbt operations
Execution of dbt commands in the correct sequence
Upload of metadata files and documentation to S3
Integration with their data catalog
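The post does not show the runner's code. As a rough illustration of the "command composition" and "correct sequence" steps, here is a minimal stdlib-only sketch. `DbtRunParams`, `compose_dbt_commands`, and the fixed command order are hypothetical choices, not RTR's implementation. In a Prefect deployment, these parameters would typically become the arguments of a `@flow`-decorated function, and each command could run as its own task.

```python
from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fixed order: dependencies install first, docs are generated last,
# regardless of the order in which the caller lists the commands.
COMMAND_ORDER = ["deps", "seed", "run", "test", "build", "docs generate"]
# Commands that receive node selection and --vars in this sketch.
SELECTABLE = {"seed", "run", "test", "build"}
# Commands that accept --full-refresh.
FULL_REFRESH = {"seed", "run", "build"}


@dataclass
class DbtRunParams:
    """Hypothetical parameters for one invocation of a generic dbt runner flow."""
    project_dir: str
    commands: list[str] = field(default_factory=lambda: ["build"])
    target: Optional[str] = None
    select: Optional[str] = None
    exclude: Optional[str] = None
    dbt_vars: dict = field(default_factory=dict)
    full_refresh: bool = False


def compose_dbt_commands(params: DbtRunParams) -> list[list[str]]:
    """Turn runner parameters into dbt argv lists, in execution order."""
    for command in params.commands:
        if command not in COMMAND_ORDER:
            raise ValueError(f"unsupported dbt command: {command!r}")

    argvs = []
    for command in sorted(params.commands, key=COMMAND_ORDER.index):
        argv = ["dbt", *command.split(), "--project-dir", params.project_dir]
        if params.target and command != "deps":
            argv += ["--target", params.target]
        if command in SELECTABLE:
            if params.select:
                argv += ["--select", params.select]
            if params.exclude:
                argv += ["--exclude", params.exclude]
            if params.dbt_vars:
                # --vars takes a YAML string; JSON is valid YAML.
                argv += ["--vars", json.dumps(params.dbt_vars)]
        if params.full_refresh and command in FULL_REFRESH:
            argv.append("--full-refresh")
        argvs.append(argv)
    return argvs


def run_dbt_commands(argvs: list[list[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for argv in argvs:
        subprocess.run(argv, check=True)
```

For example, `DbtRunParams(project_dir="projects/membership", commands=["test"], select="tag:critical")` composes `dbt test --project-dir projects/membership --select tag:critical`. Keeping composition separate from execution makes the command logic easy to unit-test without invoking dbt.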

The flow runs at considerable scale: roughly 30 to 40 dbt projects containing about 1,800 models in total, all orchestrated through Prefect.

Real-World Use Cases

Google Sheets Integration

One elegant example of their Prefect-dbt integration involves ingesting data from Google Sheets. Business stakeholders can input data into spreadsheets, which Prefect automatically ingests into Snowflake. dbt then applies quality tests using the dbt-expectations package before the data flows into downstream models.
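The post does not describe how the sheet data is extracted. A minimal sketch of the load-preparation step, assuming the sheet has already been exported as CSV text: spreadsheet headers are normalized into Snowflake-friendly identifiers before loading, and the follow-up dbt command selects the loaded source plus everything downstream of it. The function names and the `google_sheets` source name are hypothetical; the `source:` selector method and the trailing `+` graph operator are standard dbt node selection syntax.

```python
from __future__ import annotations

import csv
import io
import re


def normalize_column(name: str) -> str:
    """Turn a free-form spreadsheet header into an uppercase Snowflake-style identifier."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_").upper()
    if not cleaned:
        raise ValueError(f"header {name!r} has no usable characters")
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned  # unquoted identifiers cannot start with a digit
    return cleaned


def parse_sheet_csv(csv_text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse a CSV export into normalized column names and row dicts."""
    reader = csv.reader(io.StringIO(csv_text))
    columns = [normalize_column(h) for h in next(reader)]
    if len(set(columns)) != len(columns):
        raise ValueError(f"duplicate columns after normalization: {columns}")

    rows = []
    for row_no, cells in enumerate(reader, start=2):
        if not any(cell.strip() for cell in cells):
            continue  # skip blank rows, common in hand-edited sheets
        if len(cells) > len(columns):
            raise ValueError(f"row {row_no} has more cells than headers")
        cells = cells + [""] * (len(columns) - len(cells))  # pad short rows
        rows.append(dict(zip(columns, cells)))
    return columns, rows


def downstream_build_command(source_name: str, table: str) -> list[str]:
    """dbt command selecting the loaded source, its tests, and everything downstream."""
    return ["dbt", "build", "--select", f"source:{source_name}.{table}+"]
```

Normalizing headers in the ingest step keeps the dbt source definitions stable even when stakeholders add spaces or punctuation to column names; the dbt-expectations tests then validate the values themselves.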

Critical Data Monitoring

Another notable implementation monitors their membership services data. The team tags its critical dbt tests and runs them through Prefect. When a critical test fails, the relevant teams are notified automatically via Slack and email, so data quality problems are flagged proactively instead of being discovered later.
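A minimal sketch of the alerting step, assuming the critical tests carry a dbt tag named `critical` (hypothetical) and that the flow reads the `run_results.json` artifact dbt writes to `target/` after running `dbt test --select tag:critical`. The helper names and message format are hypothetical. The Slack call is a standard incoming-webhook POST with a JSON `text` field; an email notification could reuse the same message.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical tag name; the selector syntax tag:<name> is standard dbt.
CRITICAL_TEST_COMMAND = ["dbt", "test", "--select", "tag:critical"]


def failed_tests(run_results: dict) -> list[dict]:
    """Return results whose status is 'fail' or 'error' (warnings are ignored)."""
    return [r for r in run_results.get("results", []) if r.get("status") in ("fail", "error")]


def format_alert(failures: list[dict], project: str) -> str:
    """Build a plain-text alert listing each failing test."""
    lines = [f"CRITICAL: {len(failures)} dbt test(s) failed in {project}"]
    for r in failures:
        detail = r.get("message") or f"{r.get('failures')} failing rows"
        lines.append(f"- {r['unique_id']} ({r['status']}): {detail}")
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send the alert to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(webhook_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

In a flow, the notification would only fire when `failed_tests(...)` returns a non-empty list, so healthy runs stay quiet.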

Technical Evolution

RTR is currently migrating from Prefect 1 to Prefect 2 while also upgrading dbt from version 1.4 to 1.8. The transition has brought several benefits:

Simplified deployments in Prefect 2
Model contracts, available since dbt 1.5
Model access controls, also introduced in dbt 1.5
Better cross-project dependency management

Looking Forward

The team is focused on several key areas for improvement:

Consolidation: Reducing their 30+ dbt projects to a more manageable number
Observability: Implementing better monitoring of pipeline performance and test results
Performance Optimization: Analyzing model execution times and resource utilization
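One low-cost starting point for analyzing execution times is the `execution_time` field (in seconds) that dbt records for each node in `run_results.json`. The sketch below, with hypothetical function names, averages successful model timings across several runs and ranks the slowest models. Resource utilization would need warehouse-side data, such as Snowflake's query history, which this sketch does not cover.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def mean_model_times(runs: Iterable[dict]) -> dict[str, float]:
    """Average execution_time per model across run_results.json payloads."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for run in runs:
        for r in run.get("results", []):
            uid = r.get("unique_id", "")
            # Only successful model nodes; tests and failed runs would skew timings.
            if uid.startswith("model.") and r.get("status") == "success":
                totals[uid] += float(r.get("execution_time") or 0.0)
                counts[uid] += 1
    return {uid: totals[uid] / counts[uid] for uid in totals}


def slowest(mean_times: dict[str, float], top_n: int = 5) -> list[tuple[str, float]]:
    """Return the top_n models with the highest average execution time."""
    return sorted(mean_times.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Averaging over several runs smooths out one-off spikes from warehouse contention, which makes the ranking a steadier guide to which models are worth optimizing first.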

Key Takeaways for Implementation

For teams looking to implement a similar stack, Kella offers the following advice:

Consider Maintenance Costs: Custom solutions offer flexibility, but they require ongoing maintenance. Evaluate whether an official integration, such as Prefect's prefect-dbt library, would serve your needs better.
Focus on Observability: Build systems that provide clear insights into pipeline performance and data quality.
Stay Flexible: Create parameterized flows that can handle various use cases, from running tests to generating documentation.

Rent the Runway's implementation of Prefect and dbt shows how modern data tools can be combined to handle complex business requirements. The approach demonstrates that successful data infrastructure is not just about choosing the right tools; it is about integrating them thoughtfully to create reliable, maintainable, and scalable data operations.

By balancing custom solutions with standardized tools, RTR has built a data stack that handles everything from simple spreadsheet ingestion to complex data transformations while maintaining data quality and observability. As the team continues to evolve the stack, its experience offers useful lessons for other organizations looking to scale their data operations.