Workflow Orchestration

No Flow is an Island

August 20, 2023
Bill Palombi
Head of Product

Data engineers love Prefect not only because it makes workflows more robust, but also because it makes them easier to understand. In Prefect, a workflow is a process made up of discrete steps called tasks. Prefect makes the dependencies between tasks clear. Any task that consumes the result of, or waits for the completion of, a previous task is dependent on that task. Dependencies can be easily specified in code and visualized in a web browser. Seeing a graph of a flow, its tasks and subflows, and the relationships between them offers a “big picture” view of the process in a way that the code itself never could.
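To make the idea of explicit dependencies concrete, here is a minimal sketch in plain Python. It does not use Prefect's API. The task names are hypothetical, and the standard library's graphlib stands in for an orchestrator's dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},  # consumes the result of extract
    "load": {"transform"},     # consumes the result of transform
    "notify": {"load"},        # only waits for load to complete
}


def run_order(deps):
    """Return one execution order that respects every declared dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because every dependency is declared up front, the same mapping can drive both execution order and a visual graph.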

Hidden dependencies in plain sight

The dependencies between tasks in a workflow, or flow, are explicit. But when even the simplest flow runs, it has many other implicit, often invisible, dependencies on external systems.

First of all, the flow code has to actually run somewhere. Even if the execution infrastructure is abstracted away, as it is for serverless execution, a flow still depends on the infrastructure it runs on to work.

Once it’s running, if your flow does anything interesting, it connects to something in the outside world - a database, a file storage system, a web service - probably lots of them. You may not think of these as you would a dependency between tasks, but they are just as real. In fact, they're probably the greatest source of risk to your flow’s completion.

Ask yourself, do your flows fail more often from a code issue or an external system issue? For an orchestrator to be useful, it must provide observability into aspects of the data stack beyond the code it executes.

Observe external systems with events

Prefect’s new events system enables true workflow observability. Events are exactly what they sound like - a description of something that happened at a specific point in time. They’re simple and flexible. Today, there are two primary ways that Prefect uses events and associates them with flow runs.
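As a rough illustration of how simple such a record can be, the sketch below models an event as a name, a timestamp, the resource it happened to, and any related resources. The field names and identifiers are assumptions made for this example, not Prefect's actual event schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Illustrative event record; field names are assumptions, not Prefect's schema."""
    name: str            # what happened, e.g. "pod.evicted"
    occurred: datetime   # when it happened
    resource: str        # the thing it happened to
    related: tuple = ()  # other resources involved, e.g. a flow run


def events_for(events, resource_id):
    """Return the events that touch resource_id, oldest first."""
    hits = [
        e for e in events
        if e.resource == resource_id or resource_id in e.related
    ]
    return sorted(hits, key=lambda e: e.occurred)


# Hypothetical events; the identifiers are made up for illustration.
events = [
    Event("pod.evicted", datetime(2023, 8, 20, 12, 5, tzinfo=timezone.utc),
          "pod/abc", ("flow-run/1",)),
    Event("flow-run.started", datetime(2023, 8, 20, 12, 0, tzinfo=timezone.utc),
          "flow-run/1"),
    Event("flow-run.started", datetime(2023, 8, 20, 12, 1, tzinfo=timezone.utc),
          "flow-run/2"),
]

timeline = [e.name for e in events_for(events, "flow-run/1")]
print(timeline)
```

The related field is what lets an event about one system, such as a pod, appear on the timeline of a flow run.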

Infrastructure dependencies

Prefect’s deployments specify the infrastructure that a flow depends on to run. Events make this dependency visible when the infrastructure behaves as expected and, more importantly, when it doesn’t.

For example, say you’ve deployed a flow as a Kubernetes job, which creates a Kubernetes pod for each flow run. When a flow runs, it depends on its Kubernetes pod. Previously, if your flow run crashed unexpectedly, you’d have to make any associations between Kubernetes and Prefect yourself. Even if you have a good way to explore your Kubernetes logs, there’s still guesswork in trying to understand which logs are relevant to which flow run. You could easily lose hours trying to identify the root cause across numerous browser tabs and terminal windows.

With Prefect’s worker events, when Prefect sets up infrastructure resources for your workflow, it automatically sends relevant, context-aware telemetry back to Prefect. In the case of a Kubernetes job, pod events (a native Kubernetes concept) are forwarded to Prefect for display alongside the flow run’s execution timeline. There’s no more guessing game. You’ll see the pod eviction events and Kubernetes worker logs right in the Prefect UI, directly associated with the impacted flow run.

Integration dependencies

Prefect Blocks, connectors for code, are already the best way to interface with external systems from flow code. With events, they’re even more powerful. Every block exposes methods that perform actions against the system to which it connects. Now, every block method call is an event associated with the flow run.

Blocks can be used both inside and outside of tasks, so you don’t have to wrap a function that calls out to an external system in a task to observe it.

Say you have a simple flow that reads a file from an AWS S3 bucket, does a basic transformation, and writes another file back to the same bucket. Since it’s a simple flow, it’s not broken down into tasks. It runs just fine until, one day, it fails. You notice that the failure occurred just after an event indicating that the read_path() method on the AWS S3 block was called. You should probably start troubleshooting by making sure that the expected file is there.
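The scenario can be sketched in plain Python. This is not how Prefect implements block events. The emits_event decorator, the event_log list, and the FakeBucket class are hypothetical stand-ins that show why the last recorded method call is a useful clue.

```python
import functools

event_log = []  # hypothetical stand-in for an event stream


def emits_event(method):
    """Record an event just before the wrapped method runs."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        event_log.append(f"{type(self).__name__}.{method.__name__}.called")
        return method(self, *args, **kwargs)
    return wrapper


class FakeBucket:
    """In-memory stand-in for a storage block; not a real AWS or Prefect class."""

    def __init__(self, files):
        self.files = dict(files)

    @emits_event
    def read_path(self, path):
        return self.files[path]  # raises KeyError if the file is missing

    @emits_event
    def write_path(self, path, content):
        self.files[path] = content


def simple_flow(bucket):
    """Read a file, apply a basic transformation, and write the result back."""
    raw = bucket.read_path("input.csv")
    bucket.write_path("output.csv", raw.upper())


bucket = FakeBucket({})  # the expected input file is not there
try:
    simple_flow(bucket)
except KeyError:
    pass

print(event_log[-1])
```

The last event names read_path, and no write_path event follows it, which points to the missing input file rather than the transformation or the write.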

Like worker events, which describe the observed infrastructure context of execution, block events add context to a flow run. You’ll be able to see block events not only in the context of a particular flow run, but also from the perspective of a particular block. Imagine that your database credentials expired. With block events, you’ll be able to visit the page for the database credentials block and see all of the flow runs that attempted to use the expired credentials.
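Viewing events from the block's perspective amounts to indexing the same records by block instead of by flow run. A minimal sketch, assuming each event carries a block name, a flow run identifier, and a method name (all of the values below are made up):

```python
from collections import defaultdict

# Hypothetical (block, flow_run, method) records; identifiers are made up.
block_events = [
    ("db-credentials", "flow-run/1", "get_connection"),
    ("s3-bucket", "flow-run/1", "read_path"),
    ("db-credentials", "flow-run/2", "get_connection"),
    ("db-credentials", "flow-run/2", "get_connection"),
]


def flow_runs_by_block(records):
    """Map each block to the sorted, de-duplicated flow runs that called it."""
    runs = defaultdict(set)
    for block, flow_run, _method in records:
        runs[block].add(flow_run)
    return {block: sorted(ids) for block, ids in runs.items()}


index = flow_runs_by_block(block_events)
print(index["db-credentials"])
```

If the credentials behind db-credentials expire, this lookup gives the set of flow runs worth checking.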

A Bigger Big Picture

Prefect has always made workflows self-contained and understandable by making the dependencies between their tasks and subflows explicit. But no workflow is an island. Even the simplest flows have implicit, invisible dependencies on external systems. Prefect’s events make implicit dependencies visible, broadening your understanding and awareness of every flow run, and enabling observable, reactive, resilient workflows.

Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free, download our open source package, join our Slack community, or talk to one of our engineers to learn more.