What durable execution is, and why it is moving into Postgres
In June 2026, Microsoft open-sourced pg_durable, an extension that runs durable workflows natively inside PostgreSQL, and PostgreSQL 19 hit beta. "Durable execution" sounds like a buzzword, but it names a problem every backend engineer eventually hits and usually solves badly: what happens to a multi-step process when the server crashes halfway through?
You have almost certainly written the buggy version of this. The idea behind durable execution is the principled fix, and it is simple enough to build a toy of in a few lines.
The one idea: a function that survives a crash
Picture an order workflow:
def fulfill(order):
charge_card(order) # step 1
reserve_inventory(order) # step 2
send_email(order) # step 3
The process crashes after charge_card but before reserve_inventory. Now you have charged a customer and reserved nothing. Restart the program and it... starts from the top, charging them again. Every backend reinvents a tangle of status flags and "did I already do this?" checks to cope.
Durable execution makes that workflow resumable: if it crashes after step 1, it restarts and continues at step 2, never re-running step 1. The function behaves as if it never stopped, even though the machine running it died and came back.
How it works: record each step, then replay
The trick is to treat the workflow as a sequence of steps and persist the result of each step as it completes, in a log. On restart, you replay the function from the beginning, but each step first checks the log: if its result is already recorded, return that instead of doing the work again.
Here is the whole idea, with the log in a dict (a real system puts it in a database, which is the Postgres part):
class Workflow:
def __init__(self, log):
self.log = log # persisted: step_name -> result
def step(self, name, fn):
if name in self.log:
return self.log[name] # already done: replay from the log
result = fn() # do the work for real
self.log[name] = result # persist BEFORE moving on
save(self.log) # <-- the durability point
return result
Now the workflow is written in terms of step:
def fulfill(wf, order):
wf.step("charge", lambda: charge_card(order))
wf.step("reserve", lambda: reserve_inventory(order))
wf.step("email", lambda: send_email(order))
Crash after "charge"? On restart you reload the log, call fulfill again, and step("charge", ...) sees "charge" already in the log and skips the actual charge, returning the saved result. Execution effectively resumes at "reserve". The customer is charged exactly once.
Three details that matter, and they are the whole discipline:
- Persist the result before continuing. The
save()after recording is the durability point. If you crash during a step, it isn't in the log yet, so it re-runs, which is why steps must be safe to retry. - Steps should be idempotent or guarded. Replay means a step might run again if it crashed mid-execution. "Charge the card" must be written so a retry doesn't double-charge (an idempotency key), exactly the discipline durable execution forces you into.
- The function is the source of truth, the log is the memory. You write normal top-to-bottom code; the log makes it resumable. No state machine by hand.
Why put it in the database
If the log is what makes execution durable, it needs to live somewhere that survives a crash, which means a database. pg_durable makes the obvious leap: if the durability lives in Postgres anyway, run the whole workflow engine inside Postgres. The steps, their results, and the workflow state are all rows in tables, updated transactionally.
That buys something subtle: the step result and your business data can be committed in the same transaction. "Reserve inventory" and "record that the reserve step finished" become atomic, so you can never end up having done the work but lost the record of it, the exact failure mode that makes the hand-rolled version so bug-prone. The database's transactional guarantees become your workflow's guarantees.
(That pg_durable's worker is built on Rust libraries is a sign of the times too, the trend of writing the performance-critical core of database tooling in Rust, same as we have seen across the ecosystem.)
Why this matters beyond the buzzword
Durable execution is one of those ideas that, once you see it, you recognize the hand-rolled version everywhere: the status column with values like charged, reserved, emailed; the cron job that scans for "stuck" orders; the careful "have I already sent this?" checks. All of that is a workflow engine, badly reimplemented. Naming the pattern, and pushing it into infrastructure that already has durability and transactions, replaces a recurring source of data-integrity bugs with something principled.
And the mechanism is not magic: a log of completed steps, persisted before moving on, replayed on restart. Build the toy above and the production systems, Temporal, durable functions, now pg_durable, stop being black boxes and become "that pattern, scaled up." Building the engine instead of importing it is the whole point of the data engineering track.