Temporal and Pydantic AI: A Durable Human-in-the-Loop Research Demo

A practical walkthrough for running AI agent work inside durable Temporal workflows with Pydantic AI, FastAPI, and Next.js.

TL;DR

The full source code is available on GitHub: temporal-pydantic-ai-demo. Enjoy!

Most discussions about building agentic systems focus on a few familiar concepts:

Prompts
Tools
Model selection

These all matter, but the discussion often stops short of the questions that production-ready systems have to answer.

What Happens If…

a user provides an incomplete request?
the workflow needs clarification?
a third-party service becomes unavailable?
a worker process crashes halfway through execution?
the user’s browser gets refreshed while a long-running task is still in progress?

Anyone who’s worked on production systems knows that a considerable amount of effort goes into handling failures, retries, recovery, and all the edge cases that appear once software leaves the development environment.

None of this is insurmountable, but it’s easy to go down the rabbit hole of rolling your own solutions, and you can quickly end up spending more time and effort mitigating potential failure points than advancing the core purpose of your solution.

What is Durable Execution?

Durable Execution changes the way long-running processes are modelled, and it’s not a new idea.

Companies like Netflix employ Durable Execution to coordinate workflows and maintain reliable state across complex multi-cloud architectures, where failures, retries, and system interruptions are both inevitable and costly.

Instead of treating workflow state as something the application needs to persist and recover manually, the workflow itself becomes durable.

If a worker process crashes, the workflow doesn’t disappear
If the application is redeployed, the workflow doesn’t disappear
If the workflow needs to wait for minutes, hours, or even days for external input, it can do so without tying up application resources

The workflow becomes a long-lived unit of work rather than being intrinsically tied to the original request. In other words:

The request and the workflow don’t live and die together

Durable Execution and AI

For AI systems, this is particularly relevant.

We’ve all seen the toy AI examples that fit neatly inside a request-response cycle.

1. A user submits a prompt
2. The model responds
3. The request completes

This rosy picture soon evaporates if you spend any time contemplating AI workflows that might survive in real production environments.

If you’re implementing AI-enabled workflows, you still have to consider the age-old risks that always existed, and you likely face additional potential failure points:

More third-party APIs to rely on
Human approval as the norm, not the exception
Slower completion times

Why Temporal?

Several Durable Execution platforms exist today, but one of the best known is Temporal.

Temporal has built a strong reputation for running business-critical workflows at scale, and it has become the platform many professionals associate with Durable Execution. If you need proof of that, search for the companies that are using it.

Alongside market recognition, the thing that draws me to Temporal is how much complexity is treated as a platform concern rather than something every application has to implement independently.

Workflow history
Recovery
Retries
Waiting
Resuming
Observability

Those concerns don’t disappear, but they move into infrastructure designed specifically to solve them.

That doesn’t mean you can stop thinking about these things, though. Temporal provides systematic ways to deal with them, and a backend to perform the heavy lifting, but piecing it all together still takes deliberate effort.

Adopting Temporal requires a different way of thinking.

There’s definitely a learning curve, but personally, I think that’s a worthwhile tradeoff when you realistically consider the alternative of rolling all the code yourself and thoroughly testing it.

Of particular interest to me is how Temporal forces you to separate workflow orchestration from side-effecting work.

This can feel like extra friction early on, but I now find it helps me think about the process more clearly.
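As a rough illustration of that split, here is a toy, stdlib-only sketch. It is not Temporal’s API and not the code from the repository: `run_activity`, `FlakySearch`, and `research_workflow` are hypothetical names invented for this example. The orchestration function only decides what runs next, while anything that touches the outside world goes through a runner that owns the retry behaviour.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def run_activity(fn: Callable[..., T], *args, max_attempts: int = 3) -> T:
    """Run side-effecting work with retries (a stand-in for a platform retry policy)."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return fn(*args)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"activity failed after {max_attempts} attempts") from last_error


class FlakySearch:
    """Hypothetical search client that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.calls = 0

    def __call__(self, query: str) -> List[str]:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("search service unavailable")
        return [f"result for {query!r}"]


def research_workflow(topic: str, search: Callable[[str], List[str]]) -> str:
    """Orchestration only: no I/O, clocks, or randomness; it just sequences activities."""
    queries = [f"{topic} overview", f"{topic} risks"]
    results = [hit for q in queries for hit in run_activity(search, q)]
    return f"{len(results)} sources gathered for {topic}"


if __name__ == "__main__":
    search = FlakySearch(failures_before_success=2)
    print(research_workflow("durable execution", search))
    print("search calls made:", search.calls)
```

Note that the workflow function never sees the two failed calls. With a platform like Temporal, that retry loop lives in the infrastructure rather than in a helper you maintain yourself.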

What About DBOS?

Another framework worthy of attention is DBOS, which approaches Durable Execution from a different angle by building on PostgreSQL.

For some projects, particularly smaller systems or teams that want a lighter operational footprint, that can be a very attractive model. So much so that I’ve previously written about it in PostgreSQL: The Swiss Army Knife for Agentic Databases.

I don’t see Temporal and DBOS as direct competitors where one is universally better, as they solve similar classes of problems while making different tradeoffs.

At the moment, if somebody asked me which platform is most likely to appear in enterprise Durable Execution discussions, I’d still point to Temporal.

That said, I think DBOS is well worth considering for smaller projects.

Building a Durable Research Workflow

When I first started exploring Temporal, I found their documentation and tutorials to be extremely thorough. However, I noticed the coverage was a little light when it comes to several frameworks I’ve come to depend on.

To give myself a reference implementation I could consult later, I built a small multi-agent research application using:

Temporal
Pydantic AI
FastAPI
Next.js
Tavily
OpenAI

I achieved this by taking several of the Temporal tutorial applications, combining them with an example directly from the team at Pydantic AI, and adding my own touches where appropriate.

To be clear, the goal was never to build the ultimate research agent, but to demonstrate Durable Execution in a way that’s easy to see and easy to test.

The application allows a user to:

Start a research request
Answer clarification questions
Monitor workflow progress
Read the final report in the browser
Download the report as a PDF

The research workflow itself is intentionally simple; the focus is on what happens around it.

The workflow can pause while waiting for human input
The browser can be refreshed without losing the active workflow
The worker process can be stopped and restarted
The workflow state survives because Temporal owns the execution history rather than the application process
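The pause-for-input behaviour can be modelled in a few lines. This is a toy, in-memory sketch using `asyncio`, not the repository’s code: `ResearchSession`, `submit_answer`, and `get_status` are hypothetical names. In Temporal’s Python SDK the equivalent roles are typically played by a signal handler, a query handler, and `workflow.wait_condition`.

```python
import asyncio
from typing import Optional, Tuple


class ResearchSession:
    """Toy stand-in for a workflow that pauses until a clarification arrives."""

    def __init__(self, query: str) -> None:
        self.query = query
        self.status = "started"
        self.answer: Optional[str] = None
        self._answered = asyncio.Event()

    def submit_answer(self, answer: str) -> None:
        """Plays the role of a signal sent from the API layer."""
        self.answer = answer
        self._answered.set()

    def get_status(self) -> str:
        """Plays the role of a read-only query the UI uses to show progress."""
        return self.status

    async def run(self) -> str:
        self.status = "awaiting_clarification"
        await self._answered.wait()  # parked here, no polling loop
        self.status = "researching"
        report = f"Report on {self.query!r} (scope: {self.answer})"
        self.status = "completed"
        return report


async def demo() -> Tuple[str, str, str]:
    session = ResearchSession("battery recycling")
    task = asyncio.create_task(session.run())
    await asyncio.sleep(0)  # let the session reach its wait point
    status_while_waiting = session.get_status()
    session.submit_answer("EU regulations only")
    report = await task
    return status_while_waiting, session.get_status(), report


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The important difference is that this version keeps its state in process memory, so it would be lost on a restart. A durable workflow can sit at the same wait point across restarts because the wait is recorded in the execution history.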

Where Pydantic AI Fits

The workflow uses multiple Pydantic AI agents responsible for tasks such as:

Query triage
Clarification generation
Search planning
Search execution
Report generation

Pydantic AI provides a clean way to structure the agent behaviour while Temporal provides the durable orchestration layer around it.

In my opinion, this separation of concerns works extremely well, as the agents focus on reasoning and task execution while Temporal focuses on workflow lifecycle, recovery, state management, retries, and long-running execution.

They’re solving different problems, but they complement each other naturally.
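To make the stage boundaries concrete, here is a stdlib-only sketch of the pipeline shape. Every function below is a hypothetical stub standing in for an agent: in the real application each stage would be a Pydantic AI agent calling a model and returning a validated, structured output, and the sequencing would live in a Temporal workflow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triage:
    needs_clarification: bool
    questions: List[str] = field(default_factory=list)


@dataclass
class SearchPlan:
    queries: List[str]


def triage_agent(query: str) -> Triage:
    """Stub: treats very short queries as ambiguous. A real agent would ask a model."""
    if len(query.split()) < 3:
        return Triage(True, [f"What aspect of {query!r} should the report focus on?"])
    return Triage(False)


def planner_agent(query: str) -> SearchPlan:
    """Stub: turns the refined query into a fixed pair of searches."""
    return SearchPlan(queries=[f"{query} overview", f"{query} recent developments"])


def search_agent(search_query: str) -> str:
    """Stub: a real agent would call a search tool and summarise the hits."""
    return f"summary of results for {search_query!r}"


def writer_agent(query: str, summaries: List[str]) -> str:
    """Stub: assembles the summaries into a short report."""
    body = "\n".join(f"- {s}" for s in summaries)
    return f"# Report: {query}\n{body}"


def run_pipeline(query: str, clarification: str = "") -> str:
    """Sequence the stages; stop early if a clarification is needed but missing."""
    triage = triage_agent(query)
    if triage.needs_clarification and not clarification:
        return "CLARIFICATION NEEDED: " + triage.questions[0]
    refined = f"{query} {clarification}".strip()
    plan = planner_agent(refined)
    summaries = [search_agent(q) for q in plan.queries]
    return writer_agent(refined, summaries)


if __name__ == "__main__":
    print(run_pipeline("solar"))
    print(run_pipeline("solar", "grid storage costs"))
```

Typed hand-offs between stages are what keep the orchestration layer simple: it only needs to inspect fields such as `needs_clarification`, not parse free-form model text.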

The Durability Test

The easiest way to understand Durable Execution is to break something.

Start a workflow
Allow it to begin processing
Stop the worker process
Restart the worker

The workflow continues from its recorded history rather than starting from the beginning, and that’s the behaviour I wanted this project to demonstrate.
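The idea of continuing from recorded history can be shown with a toy model. This is a simplified sketch, not how Temporal is implemented: Temporal records events in a server-side history and replays workflow code against it, whereas this example persists step results to a JSON file. `DurableRunner`, `WorkerCrash`, and `research` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


class WorkerCrash(Exception):
    """Simulates the worker process being stopped mid-workflow."""


class DurableRunner:
    """Toy event log: completed step results are persisted and replayed on restart."""

    def __init__(self, history_path: Path) -> None:
        self.history_path = history_path
        self.executed: List[str] = []  # steps actually run by this "process"

    def _load(self) -> Dict[str, str]:
        if self.history_path.exists():
            return json.loads(self.history_path.read_text())
        return {}

    def step(self, name: str, fn: Callable[[], str]) -> str:
        history = self._load()
        if name in history:
            return history[name]  # replay: reuse the recorded result
        result = fn()
        self.executed.append(name)
        history[name] = result
        self.history_path.write_text(json.dumps(history))
        return result


def research(runner: DurableRunner, crash_after_plan: bool = False) -> str:
    plan = runner.step("plan", lambda: "q1,q2")
    if crash_after_plan:
        raise WorkerCrash("worker stopped")
    found = runner.step("search", lambda: f"results for {plan}")
    return runner.step("report", lambda: f"report based on {found}")


def demo() -> Tuple[List[str], List[str], str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "history.json"
        first = DurableRunner(path)
        try:
            research(first, crash_after_plan=True)
        except WorkerCrash:
            pass
        second = DurableRunner(path)  # the "restarted worker"
        report = research(second)
        return first.executed, second.executed, report


if __name__ == "__main__":
    print(demo())
```

The restarted runner re-enters the same function from the top, but the `plan` step is served from history instead of being executed again. Only the remaining steps do real work.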

Starting and stopping the worker is a simple way to simulate an interruption during normal operation. You can also simulate other failures, such as API errors, by amending the code directly.

The end result is that recoverability becomes a normal part of operating the software rather than an edge case.

Repository

The full source code is available on GitHub: temporal-pydantic-ai-demo

The repository README contains complete setup instructions, configuration details, API documentation, and implementation notes.

Final Thoughts

Having spent time with LangGraph, Pydantic AI, Airflow, Ray, and now Temporal in different contexts, it’s been useful to see where Durable Execution fits in the broader orchestration landscape.

Agent frameworks are important.
Workflow orchestration is important.
Durable Execution solves a different problem again.

For workflows that need to survive failures, wait for human input, expose progress, and continue running beyond the lifespan of a single request, I increasingly think Durable Execution deserves far more attention than it currently receives.

This project was my way of exploring that idea with Temporal and Pydantic AI.

Key Takeaways

AI workflows need more than a request-response loop

Temporal gives agent workflows durable state

Pydantic AI can run inside Temporal workflows

The research agent is a concrete example, not the main claim