OpenAI describes WebSocket optimization for agent API performance
OpenAI's Responses API now supports WebSocket connections with connection-scoped caching to reduce latency in multi-turn agent workflows.
- OpenAI published technical guidance on using WebSockets in its Responses API to speed up agentic workflows
- The approach involves connection-scoped caching to reduce API overhead and lower model latency
- The post documents a case study of the Codex agent loop and how the WebSocket optimization improved its performance
OpenAI published technical documentation on a WebSocket-based optimization for its Responses API, targeting developers who build multi-turn agent systems. The approach introduces connection-scoped caching, designed to reduce redundant API calls and improve latency in agent loops that make sequential requests to the model.
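The idea behind connection-scoped caching can be illustrated with a minimal sketch. The class and method names below are hypothetical, not OpenAI's actual API: the point is that cache state lives with the persistent connection, so each turn of a multi-turn agent loop only incurs fresh work for the new suffix of the conversation rather than reprocessing the full history.

```python
import hashlib

class AgentConnection:
    """Illustrative stand-in for a persistent WebSocket session.

    All names here are assumptions for illustration. The cache is scoped
    to the connection object, mimicking how a server could reuse work
    across turns sent over the same WebSocket.
    """

    def __init__(self):
        # Digest of an already-processed conversation prefix -> its length.
        self._prefix_cache = {}

    def send_turn(self, history):
        """Return how many messages need fresh processing this turn."""
        # Find the longest already-cached prefix of the conversation.
        cached = 0
        for i in range(len(history), 0, -1):
            digest = hashlib.sha256("\n".join(history[:i]).encode()).hexdigest()
            if digest in self._prefix_cache:
                cached = i
                break
        # Only the uncached suffix incurs fresh work.
        new_work = len(history) - cached
        full = hashlib.sha256("\n".join(history).encode()).hexdigest()
        self._prefix_cache[full] = len(history)
        return new_work

conn = AgentConnection()
history = ["user: plan a trip"]
print(conn.send_turn(history))  # first turn: all 1 message is new
history += ["assistant: sure", "user: add flights"]
print(conn.send_turn(history))  # second turn: only the 2 new messages
```

Over plain per-request HTTP, each turn would arrive on a fresh connection with no such scoped state, so the server could not safely assume the earlier prefix was already processed.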
The company used the Codex agent loop as a case study to demonstrate the optimization's practical impact. While the full technical details were not available at the time of this summary, the post indicates the optimization targets a common performance bottleneck in agentic systems: the overhead of repeated API round-trips.
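The scale of that round-trip overhead is easy to see with back-of-the-envelope arithmetic. The numbers below are purely illustrative assumptions, not OpenAI measurements: with per-request connection setup (TCP, TLS, HTTP negotiation) paid on every turn, the cost grows linearly with the number of turns, while a reused WebSocket pays it once.

```python
# Hypothetical figures for illustration only, not OpenAI measurements.
turns = 20
handshake_ms = 150  # assumed per-request connection setup cost

per_request_http = turns * handshake_ms  # fresh connection every turn
persistent_ws = handshake_ms             # one handshake, then reuse

saved_ms = per_request_http - persistent_ws
print(saved_ms)  # setup overhead avoided across the agent loop, in ms
```

Sequential agent loops feel this acutely because the turns cannot be parallelized; setup latency sits on the critical path of every step.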
This addition to the Responses API reflects a broader trend of production infrastructure refinements for agent deployment, as developers move beyond single-turn interactions into complex, iterative workflows.