ASH: Agents that Self-Hone via Embodied Learning
Benjamin Schneider, Xavier Schneider, Victor Zhong, Sun Sun

TL;DR
ASH is a scalable agentic system that learns long-horizon embodied tasks from unlabeled internet videos without reward shaping, using self-improvement and unsupervised key moment identification.
Contribution
Introduces ASH, a novel self-improving embodied learning system that leverages unlabeled internet videos and inverse dynamics models to handle long-horizon tasks.
Findings
ASH outperforms baseline methods in Pokemon Emerald and The Legend of Zelda: The Minish Cap, reaching more milestones.
ASH maintains progress over 8 hours, unlike baselines that plateau.
Self-improvement enables scalable long-horizon embodied learning.
Abstract
Long-horizon embodied tasks remain a fundamental challenge in AI, as current methods rely on hand-engineered rewards or action-labeled demonstrations, neither of which scales. We introduce ASH, an agentic system that learns an embodied policy from unlabeled, noisy internet video, without reward shaping or expert annotation. ASH follows a self-improvement loop: when it gets stuck, it learns an Inverse Dynamics Model (IDM) from its own trajectories and uses that IDM to extract supervision from relevant internet video. ASH also uses unsupervised learning to identify key moments in large-scale internet video and retains them as long-term memory, allowing it to tackle long-horizon problems. We evaluate ASH on two complementary environments demanding multi-hour planning: Pokemon Emerald, a turn-based RPG, and The Legend of Zelda: The Minish Cap, a real-time action-adventure game. In both…
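The IDM step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D world, the majority-vote "models", and every name below (rollout, fit_idm, label_video, fit_policy, step, featurize) are hypothetical stand-ins chosen only to show the data flow. The agent's own action-labeled transitions train an inverse dynamics model, the IDM pseudo-labels action-free video, and a policy is cloned from the pseudo-labels. Stuck detection and key-moment memory are omitted.

```python
from collections import Counter, defaultdict

def rollout(step, start, actions):
    """Run the agent's own actions and record (state, action, next_state) triples."""
    triples, s = [], start
    for a in actions:
        s_next = step(s, a)
        triples.append((s, a, s_next))
        s = s_next
    return triples

def fit_idm(triples, featurize):
    """Toy inverse dynamics model: majority-vote action per transition feature."""
    votes = defaultdict(Counter)
    for s, a, s_next in triples:
        votes[featurize(s, s_next)][a] += 1
    return {feat: counts.most_common(1)[0][0] for feat, counts in votes.items()}

def label_video(idm, frames, featurize):
    """Pseudo-label consecutive, action-free video frames with IDM-inferred actions."""
    labeled = []
    for s, s_next in zip(frames, frames[1:]):
        a = idm.get(featurize(s, s_next))
        if a is not None:  # skip transitions the IDM has never observed
            labeled.append((s, a))
    return labeled

def fit_policy(labeled):
    """Toy behavior cloning: majority-vote action per state."""
    votes = defaultdict(Counter)
    for s, a in labeled:
        votes[s][a] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}

# Hypothetical 1-D world: the state is a position in [0, 10], actions are -1, 0, +1.
def step(s, a):
    return max(0, min(10, s + a))

# Hypothetical transition feature: the observed change in position.
def featurize(s, s_next):
    return s_next - s

# 1) The agent gathers its own action-labeled experience.
own = rollout(step, start=5, actions=[1, 0, -1] * 10)
# 2) It fits an IDM on that experience.
idm = fit_idm(own, featurize)
# 3) It pseudo-labels an unlabeled "video" of someone walking from 0 to 10.
video = list(range(11))
# 4) It clones a policy from the pseudo-labeled video.
policy = fit_policy(label_video(idm, video, featurize))
```

In this toy setting the cloned policy moves right from every position that appears in the video, even though the agent's own experience only covered positions 5 and 6. That is the point of the IDM step: supervision for states the agent has never reached comes from the video rather than from its own exploration.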
