Ian Cunningham, AI systems builder

Temporal and Pydantic AI: A Durable Human-in-the-Loop Research Demo

A practical walkthrough for running AI agent work inside durable Temporal workflows with Pydantic AI, FastAPI, and Next.js.

Key Takeaways

  1. AI workflows need more than a request-response loop

    Once an agent workflow takes multiple steps, calls external services, waits for user input, or needs to survive process restarts, normal HTTP request handling stops being enough.

  2. Temporal gives agent workflows durable state

    Temporal keeps workflow history separate from worker processes, which means a worker can stop and restart without losing where the research job was.

  3. Pydantic AI can run inside Temporal workflows

    This demo uses Pydantic AI agents through Temporal-compatible wrappers, so model calls and tool work can participate in a durable workflow execution.

  4. The research agent is a concrete example, not the main claim

    The point of this repo is durable orchestration: workflow state, human clarification, progress visibility, restart behavior, and a browser UI for interacting with the workflow.

TL;DR

The full source code is available on GitHub: temporal-pydantic-ai-demo. Enjoy!


When building agentic systems, most discussions focus on familiar concepts.

  • Prompts
  • Tools
  • Model selection

These all matter, but the discussion often stops short of the questions that production-ready systems have to answer.

What Happens If…

  • a user provides an incomplete request?
  • the workflow needs clarification?
  • a third-party service becomes unavailable?
  • a worker process crashes halfway through execution?
  • the user’s browser gets refreshed while a long-running task is still in progress?

Anyone who’s worked on production systems knows that a considerable amount of effort goes into handling failures, retries, recovery, and all the edge cases that appear once software leaves the development environment.

None of this is insurmountable, but it’s easy to go down the rabbit hole of rolling your own solutions, and you can quickly spend more time and effort mitigating potential failure points than advancing the core purpose of your solution.


What is Durable Execution?

Durable Execution changes the way long-running processes are modelled, and it’s not a new idea.

Companies like Netflix employ Durable Execution to coordinate workflows and maintain reliable state across complex multi-cloud architectures, where failures, retries, and system interruptions are both inevitable and costly.

Instead of treating workflow state as something the application needs to persist and recover manually, the workflow itself becomes durable.

  • If a worker process crashes, the workflow doesn’t disappear
  • If the application is redeployed, the workflow doesn’t disappear
  • If the workflow needs to wait for minutes, hours, or even days for external input, it can do so without tying up application resources

The workflow becomes a long-lived unit of work rather than being intrinsically tied to the original request. In other words:

The request and the workflow don’t live and die together


Durable Execution and AI

For AI systems, this is particularly relevant.

We’ve all seen the toy AI examples that fit neatly inside a request-response cycle.

  1. A user submits a prompt
  2. The model responds
  3. The request completes

This rosy picture soon evaporates if you spend any time contemplating AI workflows that might survive in real production environments.

If you’re implementing AI-enabled workflows, you still have to deal with the age-old risks, and you likely have additional failure points on top:

  • More third-party APIs to rely on
  • Human approval as the norm, not the exception
  • Slower completion times

Why Temporal?

Several Durable Execution platforms exist today, but one of the best known is Temporal.

Temporal has built a strong reputation for running business-critical workflows at scale, and it’s become the platform many professionals associate with Durable Execution. If you need proof of that, search for the companies that are using it.

Alongside market recognition, the thing that draws me to Temporal is how much complexity is treated as a platform concern rather than something every application has to implement independently.

  • Workflow history
  • Recovery
  • Retries
  • Waiting
  • Resuming
  • Observability

Those concerns don’t disappear, but they move into infrastructure designed specifically to solve them.
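To make one of those concerns concrete, here is a minimal sketch of what hand-rolled retries tend to look like in application code. It is plain Python, not Temporal's API, and `call_with_retries` and `FlakyService` are hypothetical names invented for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 0.01,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
    raise RuntimeError("unreachable")


class FlakyService:
    """Hypothetical stand-in for a third-party API that fails a few times."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return "ok"


service = FlakyService(failures_before_success=2)
result = call_with_retries(service.fetch, max_attempts=3)
print(result, service.calls)
```

A helper like this only lives as long as the process running it. If the worker dies between attempts, the attempt count and any partial progress are gone, which is the gap a durable execution platform is meant to close.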

However, that doesn’t mean you don’t have to think about these things. Temporal provides systematic ways to deal with them, and a backend to perform the heavy lifting, but it does require a different way of thinking to piece it all together.

There’s definitely a learning curve, but personally, I think that’s a worthwhile tradeoff when you realistically consider the alternative of rolling all the code yourself and thoroughly testing it.

Of particular interest to me is how Temporal forces you to separate workflow orchestration from side-effecting work.

This may feel like a steeper learning curve early on, but now I find it actually helps me think about the process more clearly.
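A toy model can show why that separation matters. The sketch below is not Temporal's API: `WorkflowContext`, `run_activity`, and the two stand-in functions are hypothetical. It assumes the orchestration function is deterministic, and that every side effect goes through a context which records its result, so a re-run can replay the recorded results instead of repeating the work.

```python
from typing import Any, Callable, Optional


class WorkflowContext:
    """Records activity results so a re-run replays them instead of re-executing."""

    def __init__(self, history: Optional[list] = None) -> None:
        self.history: list = list(history or [])
        self._cursor = 0

    def run_activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self._cursor < len(self.history):
            result = self.history[self._cursor]  # replay: reuse the recorded result
        else:
            result = fn(*args)  # first execution: perform the side effect
            self.history.append(result)
        self._cursor += 1
        return result


calls: list = []  # tracks which side effects actually ran


def search_web(query: str) -> list:
    """Stand-in for a side-effecting search call."""
    calls.append(f"search:{query}")
    return [f"result for {query}"]


def write_report(results: list) -> str:
    """Stand-in for a side-effecting model call."""
    calls.append("report")
    return "Report: " + "; ".join(results)


def research_workflow(ctx: WorkflowContext, topic: str) -> str:
    """Pure orchestration: decides what happens next, performs no I/O itself."""
    results = ctx.run_activity(search_web, topic)
    return ctx.run_activity(write_report, results)


first = WorkflowContext()
report = research_workflow(first, "durable execution")

# Simulate a restart: a fresh context is seeded with the recorded history.
replayed = WorkflowContext(history=first.history)
replayed_report = research_workflow(replayed, "durable execution")
print(replayed_report == report, calls)
```

On the second run neither stand-in function is called again. This only works because the orchestration code makes the same decisions in the same order each time, which is why non-deterministic work has to live outside it.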


What About DBOS?

Another framework worthy of attention is DBOS, which approaches Durable Execution from a different angle by building on PostgreSQL.

For some projects, particularly smaller systems or teams that want a lighter operational footprint, that can be a very attractive model. So much so that I’ve previously written about it in PostgreSQL: The Swiss Army Knife for Agentic Databases.

I don’t see Temporal and DBOS as direct competitors where one is universally better, as they solve similar classes of problems while making different tradeoffs.

At the moment, if somebody asked me which platform is most likely to appear in enterprise Durable Execution discussions, I’d still point to Temporal.

That said, I think DBOS is well worth considering for smaller projects.


Building a Durable Research Workflow

When I first started exploring Temporal, I found their documentation and tutorials to be extremely thorough. However, I noticed the coverage was a little light when it came to several frameworks I’ve come to depend on.

To give myself a reference implementation I could consult in future, I built a small multi-agent research application using:

  • Temporal
  • Pydantic AI
  • FastAPI
  • Next.js
  • Tavily
  • OpenAI

I achieved this by taking several of the Temporal tutorial applications, combining them with an example directly from the team at Pydantic AI, and adding my own touches where appropriate.

To be clear, the goal was never to build the ultimate research agent, but to demonstrate Durable Execution in a way that’s easy to see and easy to test.

The application allows a user to:

  • Start a research request
  • Answer clarification questions
  • Monitor workflow progress
  • Read the final report in the browser
  • Download the report as a PDF

The research workflow itself is intentionally simple; the focus is on what happens around it.

  • The workflow can pause while waiting for human input
  • The browser can be refreshed without losing the active workflow
  • The worker process can be stopped and restarted
  • The workflow state survives because Temporal owns the execution history rather than the application process
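The pause-for-clarification behaviour in the list above can be modelled in plain Python. This is a hand-rolled sketch, not the repo's code and not Temporal's API: `ResearchState`, `advance`, `save`, and `load` are hypothetical, and the word-count check is a crude stand-in for a triage agent. In Temporal's Python SDK, a wait like this is typically expressed with a signal handler and `workflow.wait_condition` rather than by persisting state yourself.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResearchState:
    query: str
    status: str = "triage"  # triage -> awaiting_clarification -> researching -> done
    question: Optional[str] = None
    clarification: Optional[str] = None
    report: Optional[str] = None


def advance(state: ResearchState) -> ResearchState:
    """Run the workflow as far as it can go without human input."""
    if state.status == "triage":
        if len(state.query.split()) < 3:  # stand-in for an LLM triage decision
            state.question = "Could you be more specific about what to research?"
            state.status = "awaiting_clarification"
            return state
        state.status = "researching"
    if state.status == "awaiting_clarification":
        if state.clarification is None:
            return state  # still waiting; nothing to do
        state.status = "researching"
    if state.status == "researching":
        topic = state.clarification or state.query
        state.report = f"Report on: {topic}"  # stand-in for search and report agents
        state.status = "done"
    return state


def save(state: ResearchState) -> str:
    return json.dumps(asdict(state))


def load(blob: str) -> ResearchState:
    return ResearchState(**json.loads(blob))


# The request starts the workflow, which stops at the clarification step.
stored = save(advance(ResearchState(query="AI")))

# Later (after a browser refresh or a process restart), the answer arrives.
state = load(stored)
state.clarification = "Durable execution for AI agents"
state = advance(state)
print(state.status, "-", state.report)
```

Nothing in the waiting period holds a thread, a socket, or a request open. The cost is that the persistence, the state transitions, and their edge cases are all yours to maintain, which is the work a platform like Temporal takes on.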

Where Pydantic AI Fits

The workflow uses multiple Pydantic AI agents responsible for tasks such as:

  • Query triage
  • Clarification generation
  • Search planning
  • Search execution
  • Report generation

Pydantic AI provides a clean way to structure the agent behaviour while Temporal provides the durable orchestration layer around it.

In my opinion, this separation of concerns works extremely well, as the agents focus on reasoning and task execution while Temporal focuses on workflow lifecycle, recovery, state management, retries, and long-running execution.

They’re solving different problems, but they complement each other naturally.


The Durability Test

The easiest way to understand Durable Execution is to break something.

  1. Start a workflow
  2. Allow it to begin processing
  3. Stop the worker process
  4. Restart the worker

The workflow continues from its recorded history rather than starting from the beginning, and that’s the behaviour I wanted this project to demonstrate.

Stopping and restarting the worker is a simple way to simulate an interruption during normal operation, but you can just as easily simulate other failures, such as an unavailable API, by temporarily changing the code.

The end result is that recoverability becomes a normal part of operating the software rather than an edge case.
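The stop-and-restart test can be simulated in a few lines. The sketch below is a toy model under stated assumptions, not how Temporal stores history: the step names loosely mirror the agent tasks in this post, the history is a JSON file, and `run_worker` and `WorkerCrash` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

STEPS = ["triage", "plan_searches", "run_searches", "write_report"]


class WorkerCrash(Exception):
    """Simulates the worker process being killed."""


def run_worker(history_path: Path, crash_before: Optional[str] = None) -> list:
    """Run the remaining steps, recording each completed step in the history file."""
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    executed = []
    for step in STEPS:
        if step in history:
            continue  # already recorded, so it is not run again
        if step == crash_before:
            raise WorkerCrash(step)
        executed.append(step)  # stand-in for doing the real work
        history.append(step)
        history_path.write_text(json.dumps(history))  # persist after every step
    return executed


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "history.json"

    # First worker dies just before the search step.
    try:
        run_worker(path, crash_before="run_searches")
    except WorkerCrash:
        pass
    after_crash = json.loads(path.read_text())

    # A new worker picks up from the recorded history.
    resumed = run_worker(path)
    final_history = json.loads(path.read_text())

print("recorded before crash:", after_crash)
print("executed after restart:", resumed)
```

The second worker only runs the steps that were never recorded. The history outlives the process, so the process becomes replaceable.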


Repository

The full source code is available on GitHub: temporal-pydantic-ai-demo

The repository README contains complete setup instructions, configuration details, API documentation, and implementation notes.


Final Thoughts

Having spent time with LangGraph, Pydantic AI, Airflow, Ray, and now Temporal in different contexts, it’s been useful to see where Durable Execution fits in the broader orchestration landscape.

  • Agent frameworks are important.
  • Workflow orchestration is important.
  • Durable Execution solves a different problem again.

For workflows that need to survive failures, wait for human input, expose progress, and continue running beyond the lifespan of a single request, I increasingly think Durable Execution deserves far more attention than it currently receives.

This project was my way of exploring that idea with Temporal and Pydantic AI.
