OpenAI describes WebSocket optimization for agent API performance
OpenAI's Responses API now supports WebSocket connections with connection-scoped caching to reduce latency in multi-turn agent workflows.
- OpenAI published technical guidance on using WebSockets in its Responses API to speed up agentic workflows
- The approach involves connection-scoped caching to reduce API overhead and lower model latency
- The post documents a case study of the Codex agent loop and how the WebSocket optimization improved its performance
OpenAI published technical documentation on a WebSocket-based optimization for its Responses API, targeting developers who build multi-turn agent systems. The approach introduces connection-scoped caching, designed to reduce redundant API calls and improve latency in agent loops that make sequential requests to the model.
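The idea behind connection-scoped caching can be illustrated with a minimal sketch. The class and method names below are hypothetical, not OpenAI's actual API: the point is that cache state lives with the persistent connection, so each turn of a multi-turn agent loop only incurs fresh work for the new suffix of the conversation rather than reprocessing the full history.

```python
import hashlib

class AgentConnection:
    """Illustrative stand-in for a persistent WebSocket session.

    All names here are assumptions for illustration. The cache is scoped
    to the connection object, mimicking how a server could reuse work
    across turns sent over the same WebSocket.
    """

    def __init__(self):
        # Digest of an already-processed conversation prefix -> its length.
        self._prefix_cache = {}

    def send_turn(self, history):
        """Return how many messages need fresh processing this turn."""
        # Find the longest already-cached prefix of the conversation.
        cached = 0
        for i in range(len(history), 0, -1):
            digest = hashlib.sha256("\n".join(history[:i]).encode()).hexdigest()
            if digest in self._prefix_cache:
                cached = i
                break
        # Only the uncached suffix incurs fresh work.
        new_work = len(history) - cached
        full = hashlib.sha256("\n".join(history).encode()).hexdigest()
        self._prefix_cache[full] = len(history)
        return new_work

conn = AgentConnection()
history = ["user: plan a trip"]
print(conn.send_turn(history))  # first turn: all 1 message is new
history += ["assistant: sure", "user: add flights"]
print(conn.send_turn(history))  # second turn: only the 2 new messages
```

Over plain per-request HTTP, each turn would arrive on a fresh connection with no such scoped state, so the server could not safely assume the earlier prefix was already processed.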
The company used the Codex agent loop as a case study to demonstrate the optimization's practical impact. While the full technical details were not available at the time of this summary, the post indicates the optimization targets a common performance bottleneck in agentic systems: the overhead of repeated API round-trips.
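The scale of that round-trip overhead is easy to see with back-of-the-envelope arithmetic. The numbers below are purely illustrative assumptions, not OpenAI measurements: with per-request connection setup (TCP, TLS, HTTP negotiation) paid on every turn, the cost grows linearly with the number of turns, while a reused WebSocket pays it once.

```python
# Hypothetical figures for illustration only, not OpenAI measurements.
turns = 20
handshake_ms = 150  # assumed per-request connection setup cost

per_request_http = turns * handshake_ms  # fresh connection every turn
persistent_ws = handshake_ms             # one handshake, then reuse

saved_ms = per_request_http - persistent_ws
print(saved_ms)  # setup overhead avoided across the agent loop, in ms
```

Sequential agent loops feel this acutely because the turns cannot be parallelized; setup latency sits on the critical path of every step.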
This addition to the Responses API reflects a broader trend of production infrastructure refinements for agent deployment, as developers move beyond single-turn interactions into complex, iterative workflows.