Building a HIPAA-compliant self-serve data platform
Building reliable data pipelines, increasing total visibility and control over Fivetran, dbt, Snowflake, and Looker, all while prioritizing critical security requirements
The prelude...
Modern Health is a global mental health benefits platform, providing a comprehensive suite of digital 1:1 and group mental health care and well-being support to employees and their families.
We spoke with Solmaz Bagherpour, the lead data engineer on the Modern Health team, about their experience with adopting Prefect. All quotes referenced below are from Solmaz.
Separate data and infrastructure teams
Modern Health has a data team consisting of analysts, data scientists, and data engineers. This team is tasked with supporting Modern Health’s business initiatives and products by providing reliable, timely, and high-quality data and insights that help improve the quality of care and outcomes for their members. For example, the team provides quality data that Modern Health teams use to answer business questions about:
- Anonymized and standardized outcomes of patients by geography
- Anonymized engagement within individual offerings & features
Accurate and timely information is key to providing Modern Health's client-facing analytics, such as de-identified historical and usage statistics.
The data team at Modern Health is supported by the infrastructure team, whose responsibilities include providing the platform for the core Modern Health application as well as the infrastructure for the data team’s storage, processing, and movement of data.
Data stack
Modern Health’s data practice is built around a modern data stack:
- Infrastructure: Amazon Web Services
- ETL: Fivetran
- Data Transformation: dbt
- Data Warehouse: Snowflake
- Charts & Dashboards: Looker
- Data Science & Custom Analysis: Python
Given their privacy and regulatory requirements, Modern Health’s data team works with a dedicated infrastructure team, which ensures that appropriate access and environment usage are maintained across all of their tools.
Before Prefect
The Modern Health Data Engineering team prioritizes reliability, scalability, and the ability to quickly spin up new data pipelines.
Reliability and observability
The Modern Health DE team was able to schedule interconnected jobs across their stack, using EC2 Task Scheduler to do so. As Modern Health continued to grow, it became a priority to adopt an orchestration tool that would allow the team to keep providing timely, high-quality data to internal teams and clients. The growing number of pipeline dependencies and runs to manage made that need even more critical.
“Let’s take our custom ingestion scripts and our dbt jobs for example. Our dbt jobs were increasing by the day and so did the number of our custom API integrations. As we grew, we wanted to continue prioritizing observability, monitoring, and troubleshooting in the most effective way possible.”
With Prefect, they have a control panel where they can see exactly what needs to be triaged.
“Prefect gives us overall visibility into the impact on downstream systems. We have many flows that run at once or during the day, and with Prefect it is easier to oversee those, read the logs, and have increased visibility overall.”
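To make "pipeline dependencies" and "impact on downstream systems" concrete, here is a minimal sketch in plain Python. It is not Prefect's API and not Modern Health's actual setup: the job names and the dependency graph are hypothetical, chosen only to mirror the stack described above. It shows how an orchestrator might derive a valid run order and work out which jobs are affected when an upstream job fails.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph (an assumption for illustration):
# each job maps to the set of jobs it depends on.
DEPENDENCIES = {
    "fivetran_sync": set(),
    "custom_api_ingest": set(),
    "dbt_run": {"fivetran_sync", "custom_api_ingest"},
    "looker_refresh": {"dbt_run"},
}


def run_order(dependencies):
    """Return one valid order in which every job runs after its upstreams."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(failed_job, dependencies):
    """Return every job that directly or transitively depends on failed_job."""
    affected = set()
    frontier = [failed_job]
    while frontier:
        current = frontier.pop()
        for job, upstreams in dependencies.items():
            if current in upstreams and job not in affected:
                affected.add(job)
                frontier.append(job)
    return affected


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
    print(sorted(downstream_of("custom_api_ingest", DEPENDENCIES)))
```

In this sketch, a failure in the hypothetical `custom_api_ingest` job flags both `dbt_run` and `looker_refresh` as affected, which is the kind of triage information an orchestration dashboard surfaces.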
Building quickly
Without an orchestration tool, Modern Health’s data stakeholders had to rely increasingly on the Data Engineering team or become experts on the data stack themselves. Modern Health was growing, and so was the number of pipelines needed to pull in new data. For example, ingesting data from an API that Fivetran did not support required AWS, Terraform, and Python knowledge. This led to growing dependence on the Data Engineering team and long development cycles.
With Prefect in place, data analysts can focus more on the analysis itself rather than on adding and troubleshooting pipelines and Python code.
Security and privacy
As a healthcare company, Modern Health is regulated by a variety of agencies worldwide. Its obligations include maintaining HIPAA compliance as well as honoring its commitments to members around the use and security of their data.
These commitments are paramount for the Modern Health team when working with software and data, and were a key factor in choosing Prefect over other tools.
“One reason we chose Prefect is because it supported how we use infrastructure. We are very serious about security, and Prefect's architecture supports our strict controls around infrastructure and particularly protection of personally identifiable information (PII) and protected health information (PHI).”
Prefect’s deployment structure allowed Modern Health to build self-serve templates for their users, while remaining compliant with their strict infrastructure requirements. Prefect’s own architecture (our Hybrid Model) allowed the Modern Health team to keep their data and their code in their own environment.
Choosing Prefect
The Modern Health team decided to adopt an orchestration tool to automate their existing data pipelines and models, as well as the creation of new datasets. They cited two primary reasons for choosing Prefect.
“We decided to add an orchestration tool to increase robustness, visibility, and observability for our core data jobs. Our main goal was to prepare for an expected scaling and growth while empowering our data users to schedule and monitor their own data jobs. We also wanted to improve the data on-call engineer's quality of life with a centralized UI with built-in logging, alerting, and validating capabilities.”
After implementing Prefect
Since adopting Prefect, the Modern Health team has gained increased observability and reliability.
“Moving to Prefect was really a breakthrough regarding having increased control over all of our different, critical data flows. Being able to have the visibility we need, either in the dashboard or even with Slack notifications, is a gamechanger. The use of the automations feature in Prefect also provided us more control over our runtimes.”
Prefect has also allowed the team to move more quickly when building new pipelines, for users ranging from analysts to data engineers.
“Prefect provides us the ability to design and execute on a growing number of pipelines, along with being able to build complex pipelines with a variety of dependencies.”
Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free, download our open source package, join our Slack community, or talk to one of our engineers to learn more.