Continual Harness: Online Adaptation for Self-Improving Foundation Agents

Seth Karten; Joel Zhang; Tersoo Upaa Jr; Ruirong Feng; Wenzhe Li; Chengshuai Shi; Chi Jin; Kiran Vodrahalli

arXiv:2605.09998·cs.LG·May 12, 2026

TL;DR

This paper introduces Continual Harness, an online self-improving framework for embodied agents that refines prompts and strategies during a single run, enabling long-horizon decision-making without resets.

Contribution

The paper formalizes and demonstrates a reset-free, self-improving harness for embodied agents that adapts online and reduces reliance on human intervention.

Findings

01

With iterative human-in-the-loop harness refinement, Gemini Plays Pokemon (GPP) completed Pokemon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle.

02

Significantly reduced button-press costs compared to minimal baselines.

03

Enabled continuous in-game progress through online model self-improvement.

Abstract

Coding harnesses such as Claude Code and OpenHands wrap foundation models with tools, memory, and planning, but no equivalent exists for embodied agents' long-horizon partial-observability decision-making. We first report our Gemini Plays Pokemon (GPP) experiments. With iterative human-in-the-loop harness refinement, GPP became the first AI system to complete Pokemon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle. In the hardest stages, the agent itself began iterating on its strategy through long-context memory, surfacing emergent self-improvement signals alongside human-in-the-loop refinement. Continual Harness removes the human fully from this loop: a reset-free self-improving harness for embodied agents that formalizes and automates what we observed. Starting from only a minimal environment interface, the agent alternates between acting and refining its own…
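The alternation between acting and refining within a single reset-free run can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: `ToyEnv`, `Harness`, `act`, `refine`, and `run` are hypothetical names, the corridor environment is invented for illustration, and the rule-based `act` and `refine` functions stand in for foundation-model calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical minimal environment interface: a 1-D corridor."""
    position: int = 0
    goal: int = 5

    def step(self, action: str) -> int:
        # Move one cell right or left. There is no reset method on purpose.
        self.position += 1 if action == "right" else -1
        return self.position


@dataclass
class Harness:
    """Hypothetical harness state: a prompt plus notes the agent writes itself."""
    prompt: str = "Reach the goal."
    notes: list = field(default_factory=list)


def act(harness: Harness, observation: int) -> str:
    # Placeholder for a model call: follow the latest note, else default to "left".
    return harness.notes[-1] if harness.notes else "left"


def refine(harness: Harness, trajectory: list, goal: int) -> None:
    # Placeholder refinement: record which direction closes the gap to the goal
    # and fold that hint back into the prompt.
    direction = "right" if trajectory[-1] < goal else "left"
    harness.notes.append(direction)
    harness.prompt = f"Reach the goal. Hint: move {direction}."


def run(env: ToyEnv, harness: Harness, phases: int, steps_per_phase: int) -> list:
    """Alternate acting and refining; the environment state carries over."""
    history = []
    obs = env.position
    for _ in range(phases):
        trajectory = []
        for _ in range(steps_per_phase):
            obs = env.step(act(harness, obs))
            trajectory.append(obs)
            if obs == env.goal:
                return history + trajectory
        refine(harness, trajectory, env.goal)
        history.extend(trajectory)
    return history


env, harness = ToyEnv(), Harness()
history = run(env, harness, phases=5, steps_per_phase=2)
```

In this toy run the agent first walks the wrong way, then corrects course after the first refinement step and continues from where it stands rather than restarting. That carry-over of state is the only property the sketch is meant to show.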

Code & Models

Repositories

sethkarten/continual-harness (GitHub)
