TL;DR
This paper introduces Continual Harness, an online self-improving framework for embodied agents that refines prompts and strategies during a single run, enabling long-horizon decision-making without resets.
Contribution
It formalizes and demonstrates a reset-free, self-improving harness for embodied agents that adapts online and reduces reliance on human intervention.
Findings
With iterative human-in-the-loop harness refinement, Gemini Plays Pokemon (GPP) completed Pokemon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle.
Significantly reduced button-press costs compared to minimal baselines.
Enabled continuous in-game progress through online model self-improvement.
Abstract
Coding harnesses such as Claude Code and OpenHands wrap foundation models with tools, memory, and planning, but no equivalent exists for embodied agents' long-horizon partial-observability decision-making. We first report our Gemini Plays Pokemon (GPP) experiments. With iterative human-in-the-loop harness refinement, GPP became the first AI system to complete Pokemon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle. In the hardest stages, the agent itself began iterating on its strategy through long-context memory, surfacing emergent self-improvement signals alongside human-in-the-loop refinement. Continual Harness removes the human fully from this loop: a reset-free self-improving harness for embodied agents that formalizes and automates what we observed. Starting from only a minimal environment interface, the agent alternates between acting and refining its own…
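The alternation between acting and refining described in the abstract can be illustrated with a toy loop. This is only a minimal sketch, not the paper's implementation: the abstract is truncated before it says what is refined, so the sketch assumes (following the TL;DR) that the agent rewrites its own strategy and notes within a single run. `ToyCorridor`, `Harness`, `act`, `refine`, and `run` are hypothetical names, and the model call is replaced by a trivial stub.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCorridor:
    """Hypothetical stand-in environment: walk right until `pos` reaches `goal`."""
    goal: int = 5
    pos: int = 0

    def step(self, action: str) -> int:
        # Moving left at the wall does nothing. There is no reset method.
        if action == "right":
            self.pos += 1
        elif action == "left":
            self.pos = max(0, self.pos - 1)
        return self.pos


@dataclass
class Harness:
    """Mutable state the agent may rewrite during the run (assumed, not from the paper)."""
    preferred_action: str = "left"  # deliberately poor starting strategy
    notes: list = field(default_factory=list)


def act(harness: Harness, observation: int) -> str:
    # Stub for a model call conditioned on the current harness.
    return harness.preferred_action


def refine(harness: Harness, progress: int) -> None:
    # Stub for self-refinement: if the last window made no progress,
    # switch strategy and record the reason in the harness notes.
    if progress <= 0:
        old = harness.preferred_action
        harness.preferred_action = "right" if old == "left" else "left"
        harness.notes.append(
            f"'{old}' made no progress; switched to '{harness.preferred_action}'"
        )


def run(env: ToyCorridor, harness: Harness, max_steps: int = 40, refine_every: int = 4):
    """Alternate acting and refining in one continuous episode; return steps used or None."""
    obs = env.pos
    window_start = obs
    for t in range(1, max_steps + 1):
        obs = env.step(act(harness, obs))
        if obs >= env.goal:
            return t
        if t % refine_every == 0:
            refine(harness, progress=obs - window_start)
            window_start = obs
    return None


if __name__ == "__main__":
    env, harness = ToyCorridor(), Harness()
    print(run(env, harness), harness.notes)
```

In this toy run the first four steps are wasted against the wall, the refinement step flips the strategy, and the goal is reached without the environment ever being reset. A real harness would replace `act` and `refine` with model calls and a far richer state.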
