Liam Chung · April 20, 2026 · 3 min read

Long-Running Agents: What Breaks First

Long-running agents usually fail because the workflow around the model is under-designed, not because the model suddenly got dumber. Once work spans more time, more tools, and more intermediate artifacts, system design becomes the main variable.

What usually breaks first

Context discipline breaks first. Then state management. Then retries, timeouts, and unclear review boundaries. The model can still be strong while the workflow falls apart.

That is why recent agent platform work focuses so much on files, sandboxes, execution environments, and observability rather than only raw model output quality.

What a better architecture looks like

A durable architecture keeps artifacts outside the prompt, defines step boundaries, makes retries explicit, and logs enough to reconstruct what happened. Human review should appear where risk changes, not in a random or all-or-nothing way.

The longer the workflow, the more important it is to turn hidden state into explicit state.

How to make them less fragile

Use staged work, durable artifacts, narrow permissions, and traces. Treat every long-running workflow like a system that will eventually fail and need explanation.

If you cannot explain a failure path, you are not ready to scale the workflow.
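One way to make retries explicit is to wrap each step in its own retry boundary. The helper below is an illustrative sketch: the step is assumed to be idempotent, and the time budget is only checked after the call returns (a production version would cancel mid-flight):

```python
import time

def with_retries(step, max_attempts=3, timeout_s=30.0, backoff_s=1.0):
    """Explicit retry boundary for a single agent step (illustrative only).
    `step` must be idempotent: re-running it after a partial failure is safe."""
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = step()
            if time.monotonic() - start > timeout_s:
                # budget checked after the fact; real code would cancel mid-call
                raise TimeoutError(f"step exceeded {timeout_s}s budget")
            return result
        except Exception:
            if attempt == max_attempts:
                raise                         # fail loudly with the last error
            time.sleep(backoff_s * attempt)   # linear backoff between attempts
```

Keeping the retry policy at the step boundary, rather than buried inside tool code, is what makes a failure path explainable: you can say exactly how many attempts ran and why the last one gave up.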

Quick decision table

| Situation | Better default |
| --- | --- |
| Intermediate artifacts | Keep them in files or structured outputs |
| Retry behavior | Design idempotent steps and explicit boundaries |
| Risky side effects | Add human approval gates |
| Slow debugging | Instrument traces before scale |
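For the "risky side effects" row, a gate can be as simple as pausing for approval whenever an action falls outside an allow-list, so review appears where risk changes rather than everywhere. The `SAFE_ACTIONS` taxonomy and `approve` callback below are assumptions for illustration, not a real API:

```python
SAFE_ACTIONS = {"read_file", "search", "summarize"}   # assumed risk taxonomy

def execute(action: str, approve) -> str:
    """Run an agent action, pausing for human approval only when it
    falls outside the allow-list (illustrative policy sketch)."""
    if action not in SAFE_ACTIONS:        # risk changed: insert a review gate
        if not approve(action):           # approve() is a human-in-the-loop hook
            return "rejected"
    return "executed"
```

Safe reads flow through untouched; anything with external side effects waits on a human, which keeps review targeted instead of all-or-nothing.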

Practical checklist

- Keep intermediate artifacts in files or structured outputs, not in the prompt.
- Define step boundaries, and make retries explicit and idempotent.
- Narrow permissions to what each step actually needs.
- Add human approval where risk changes.
- Instrument traces before you scale, so failures can be reconstructed.

FAQ

Are long-running agents mostly about stronger models?

No. They are mostly about stronger execution, state, and review design.

Do I need observability this early?

Yes. Without traces, long-running failure looks random even when it is patterned.

Sources and further reading

🔗 The next evolution of the Agents SDK
Official OpenAI update on the Agents SDK, sandbox execution, and model-native agent infrastructure.
🔗 From model to agent: equipping the Responses API with a computer environment
Official engineering explanation of the Responses API computer environment and shell-based agent workflows.
🔗 OpenAI Frontier: observability and governance for agents in production
Official OpenAI enterprise platform page highlighting observability and governance for deployed agents.

Use this inside Thinkly

If you want your AI research, comparisons, and workflow decisions to stay reusable, keep them in Thinkly instead of scattering them across chats and tabs.
