Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

arXiv:2604.20987 · cs.AI · April 24, 2026

TL;DR

COSPLAY is a co-evolution framework where an LLM decision agent retrieves skills from a learnable skill bank, improving long-horizon decision making in complex environments.

Contribution

It introduces a co-evolution approach to skill discovery and retrieval for LLM-based agents on long-horizon tasks.

Findings

1. Achieves over 25.1% average reward improvement on six game environments.
2. Outperforms four frontier LLM baselines on single-player game benchmarks.
3. Remains competitive on multi-player social reasoning games.

Abstract

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability. Games are a good testbed for evaluating agent skill usage in such environments. Large Language Models (LLMs) offer a promising alternative as game-playing agents, but they often struggle with consistent long-horizon decision making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form the skill bank. Our framework improves both the…
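The abstract describes the loop only at a high level. As a rough illustration of what co-evolution between a decision agent and a skill bank can mean, the sketch below wires the two together on a toy problem. It is not the authors' implementation: the names `Skill`, `SkillBank`, `mine_skills`, `toy_env_reward`, and `co_evolve` are hypothetical, frequent action n-grams stand in for the agent-managed skill pipeline, and a greedy reward-ranked lookup (trying untried skills first) stands in for the LLM decision agent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable skill, here simply a fixed sequence of primitive actions."""
    name: str
    actions: tuple[str, ...]
    uses: int = 0
    total_reward: float = 0.0

    def score(self) -> float:
        # Untried skills rank first (optimistic); otherwise use mean observed reward.
        return self.total_reward / self.uses if self.uses else float("inf")


@dataclass
class SkillBank:
    """Hypothetical skill bank: deduplicated skills plus per-skill reward statistics."""
    skills: dict[tuple[str, ...], Skill] = field(default_factory=dict)

    def add(self, actions: tuple[str, ...]) -> None:
        # Only insert sequences not already in the bank; existing stats are kept.
        if actions not in self.skills:
            self.skills[actions] = Skill(f"skill_{len(self.skills)}", actions)

    def retrieve(self, k: int = 1) -> list[Skill]:
        # Rank by score; sorted() is stable, so ties keep insertion order.
        return sorted(self.skills.values(), key=Skill.score, reverse=True)[:k]

    def update(self, skill: Skill, reward: float) -> None:
        # Feedback from the decision agent's episode refines the bank's statistics.
        skill.uses += 1
        skill.total_reward += reward


def mine_skills(rollouts: list[list[str]], length: int = 2,
                min_count: int = 2) -> list[tuple[str, ...]]:
    """Stand-in for the skill pipeline: action n-grams that recur across unlabeled rollouts."""
    counts = Counter(
        tuple(r[i:i + length]) for r in rollouts for i in range(len(r) - length + 1)
    )
    return [gram for gram, c in counts.items() if c >= min_count]


def toy_env_reward(actions: list[str]) -> float:
    """Hypothetical environment: +1 each time "open" is immediately followed by "take"."""
    return float(sum((a, b) == ("open", "take") for a, b in zip(actions, actions[1:])))


def co_evolve(rollouts: list[list[str]], episodes: int = 4) -> SkillBank:
    bank = SkillBank()
    history = [list(r) for r in rollouts]
    for _ in range(episodes):
        # Skill pipeline: (re)mine skills from every rollout collected so far.
        for gram in mine_skills(history):
            bank.add(gram)
        candidates = bank.retrieve(k=1)
        if not candidates:
            break  # nothing reusable discovered yet
        # Decision agent: execute the top-ranked skill as this episode's actions.
        skill = candidates[0]
        actions = list(skill.actions)
        bank.update(skill, toy_env_reward(actions))
        # The new rollout feeds the next round of skill mining.
        history.append(actions)
    return bank


bank = co_evolve([["look", "open", "take"], ["open", "take", "go"], ["look", "open", "go"]])
best = bank.retrieve(k=1)[0]
print(best.actions, best.uses, best.total_reward)  # prints ('open', 'take') 3 3.0
```

In this toy, each side shapes the other: the agent's choices decide which rollouts get mined, and the mined skills decide what the agent can choose. A real system would replace the n-gram miner and greedy ranking with the LLM-driven components the abstract describes.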

Code & Models

Repositories

wuxiyang1996/cos-play (GitHub)

Models

IntelligenceLab/COS-PLAY (Hugging Face model)

Datasets

IntelligenceLab/Cos-Play-Cold-Start (Hugging Face dataset)
