Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training

Dayuan Fu; Yunze Wu; Xiaojie Cai; Lyumanshan Ye; Shijie Xia; Zhen Huang; Weiye Si; Tianze Xu; Jie Sun; Keyu Li; Mohan Jiang; Junfei Wang; Qishuo Hua; Pengrui Lu; Yang Xiao; Pengfei Liu

arXiv:2510.27630·cs.AI·November 4, 2025

Open Access

TL;DR

This paper introduces Apollo, a human-in-the-loop sampling framework for training large language model agents on long-horizon, domain-specific tasks. It combines asynchronous human guidance with action-level data filtering, and reports an improvement of over 50% on InnovatorBench with GLM-4.5.

Contribution

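Apollo's lightweight, asynchronous human guidance enables long-horizon task training with lower annotation cost and better data quality. It targets two limitations of existing methods: the expense of dense human annotation in behavior cloning, and the scarcity of valid positive trajectories in outcome-driven sampling.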

Findings

01. Apollo achieves over 50% improvement on InnovatorBench with GLM-4.5.

02. It sustains over 30 hours of human-agent interaction.

03. It outperforms variants without human interaction by 28%.

Abstract

Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them to succeed on long-horizon, domain-specialized tasks remains challenging. Current methods primarily fall into two categories. The first relies on dense human annotations through behavior cloning, which is prohibitively expensive for long-horizon tasks that can take days or months. The second depends on outcome-driven sampling, which often collapses due to the rarity of valid positive trajectories on domain-specialized tasks. We introduce Apollo, a sampling framework that integrates asynchronous human guidance with action-level data filtering. Instead of requiring annotators to shadow every step, Apollo allows them to intervene only when the agent drifts from a promising trajectory, by providing prior…
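To make the contrast in the abstract concrete, the following is a minimal sketch of how action-level filtering differs from outcome-driven (trajectory-level) filtering. It is an illustration under stated assumptions, not the paper's implementation: the `Step` type, the per-step `ok` flag, the `guidance` field, and both filter functions are hypothetical names, and the abstract does not say how individual actions are actually judged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One agent action in a rollout (hypothetical structure for illustration)."""
    action: str
    # Assumption: some judge marks each action as usable for training or not.
    ok: bool
    # Assumption: an asynchronous human note may be attached to a step.
    guidance: Optional[str] = None


def trajectory_level_filter(steps: List[Step], succeeded: bool) -> List[Step]:
    """Outcome-driven baseline: keep every step only if the whole rollout succeeded."""
    return list(steps) if succeeded else []


def action_level_filter(steps: List[Step]) -> List[Step]:
    """Keep individually acceptable steps, even when the rollout failed overall."""
    return [s for s in steps if s.ok]


# A toy rollout that failed overall but contains useful steps.
rollout = [
    Step("read task spec", ok=True),
    Step("launch training with wrong dataset path", ok=False),
    Step("fix dataset path", ok=True, guidance="check the data directory first"),
    Step("launch training", ok=True),
]

kept_by_outcome = trajectory_level_filter(rollout, succeeded=False)
kept_by_action = action_level_filter(rollout)

print(len(kept_by_outcome), len(kept_by_action))
```

In this toy case the outcome-driven filter discards the entire rollout, while the action-level filter retains the three acceptable steps. This is one way to read the abstract's point that valid positive trajectories are rare on domain-specialized tasks, so discarding whole rollouts leaves little to train on.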

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)