Research · Apr 24, 2026

New framework enables LLMs to discover and reuse skills for long-horizon game-playing tasks

COSPLAY co-evolves decision-making and skill discovery agents, showing 25% reward improvements on single-player benchmarks with an 8B model.

Trust score: 69 · Hype: Low

1 source · cross-referenced

TL;DR
  • Researchers presented COSPLAY, a framework where an LLM decision agent retrieves skills from a learnable skill bank while a parallel agent extracts reusable skills from unlabeled rollouts.
  • Experiments across six game environments showed the 8B-parameter base model achieved a 25.1% average reward improvement versus four frontier LLM baselines on single-player games.
  • The framework addresses a core limitation of LLMs in long-horizon reasoning: the inability to discover, retain, and reuse structured skills across multiple episodes.
  • COSPLAY remained competitive on multi-player social reasoning games, suggesting broad applicability beyond single-agent scenarios.

Researchers from multiple institutions have introduced COSPLAY, a co-evolutionary framework designed to improve LLM agent performance in long-horizon interactive environments. The system operates through dual mechanisms: an LLM decision agent that selects and chains skills, and a parallel skill-discovery pipeline that automatically extracts reusable action patterns from accumulated experience.

The core technical contribution addresses a known gap in LLM agent behavior—while these models can reason about individual steps effectively, they struggle to maintain coherent multi-step policies over extended episodes, particularly under delayed reward feedback and partial observability. COSPLAY addresses this by maintaining an evolving skill bank that both agents learn from and contribute to during training.
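The co-evolution loop can be pictured as a shared skill bank sitting between two agents: one acts using retrieved skills, the other mines the resulting rollouts for reusable patterns. The sketch below is purely illustrative — the class names, keyword-match retrieval, and stub agents are assumptions, not the paper's implementation, where both roles are played by LLMs.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A reusable action pattern mined from rollouts (fields are illustrative)."""
    name: str
    steps: tuple[str, ...]
    uses: int = 0

@dataclass
class SkillBank:
    """Shared store: the decision agent retrieves, the discovery agent contributes."""
    skills: dict[str, Skill] = field(default_factory=dict)

    def retrieve(self, query: str) -> list[Skill]:
        # Keyword match stands in for learned, embedding-based retrieval.
        hits = [s for s in self.skills.values() if query in s.name]
        for s in hits:
            s.uses += 1
        return hits

    def add(self, skill: Skill) -> None:
        # Keep the first definition; a real system might merge or re-score.
        self.skills.setdefault(skill.name, skill)

def run_episode(bank: SkillBank) -> list[str]:
    """Stub decision agent: chain retrieved skills, else fall back to primitives."""
    plan = bank.retrieve("open")
    if plan:
        return [step for skill in plan for step in skill.steps]
    return ["explore", "pick_key", "open_door"]  # primitive fallback

def discover_skills(rollout: list[str]) -> list[Skill]:
    """Stub discovery agent: treat a recurring action pair as one skill."""
    if "pick_key" in rollout and "open_door" in rollout:
        return [Skill("open_locked_door", ("pick_key", "open_door"))]
    return []

bank = SkillBank()
for _ in range(3):  # co-evolution: act, then mine the rollout for new skills
    rollout = run_episode(bank)
    for skill in discover_skills(rollout):
        bank.add(skill)
```

After the first episode the discovered skill enters the bank, and later episodes reuse it directly instead of re-deriving the action sequence — the retention-and-reuse behavior the paper targets.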

The authors evaluated COSPLAY using an 8B-parameter model across six game environments. On single-player benchmarks, the framework outperformed four frontier LLM baselines by an average of 25.1% in reward accumulation. Performance remained stable on multi-player social reasoning tasks, suggesting the approach generalizes beyond isolated decision-making scenarios.

The paper specifies that skills are extracted with formal 'contracts'—likely specifications of preconditions and effects—which allows for structured composition. This stands in contrast to unstructured prompt-based skill injection, potentially explaining the consistency gains observed across multiple environment types.
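If the contracts are indeed precondition/effect specifications, composition reduces to a simple check: a skill may be chained only when the current state satisfies its preconditions, and applying it adds its effects. This toy sketch is an assumption about what such contracts look like — the `SkillContract` class and the example skills are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillContract:
    """Hypothetical contract: preconditions must hold before use; effects hold after."""
    name: str
    preconditions: frozenset[str]
    effects: frozenset[str]

def compose(state: set[str], skills: list[SkillContract]) -> set[str]:
    """Chain skills in order, rejecting any whose preconditions are unmet."""
    for skill in skills:
        if not skill.preconditions <= state:
            missing = skill.preconditions - state
            raise ValueError(f"{skill.name}: unmet preconditions {missing}")
        state = state | skill.effects
    return state

# Illustrative skills for a key-and-door scenario.
pick_key = SkillContract("pick_key", frozenset({"near_key"}), frozenset({"has_key"}))
open_door = SkillContract("open_door", frozenset({"has_key"}), frozenset({"door_open"}))

final = compose({"near_key"}, [pick_key, open_door])
```

Because each contract is checked before execution, invalid chains fail fast rather than producing incoherent trajectories — a plausible mechanism behind the consistency gains the article describes.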

Sources
  1. arXiv (cs.AI) · Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.