CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

Keyu Wang; Bingchen Miao; Wendong Bu; Yu Wu; Juncheng Li; Shengyu Zhang; Wenqiao Zhang; Siliang Tang; Jun Xiao; Yueting Zhuang

arXiv:2601.02201 · cs.LG · January 6, 2026

TL;DR

CORE introduces a novel training framework for virtual agents that combines imitation and exploration by automatically inferring reward functions and expanding behavioral strategies, leading to improved performance and generalization.

Contribution

The paper proposes CORE, a code-based inverse self-training framework with graph expansion, integrating semantic code abstraction, strategy graph expansion, and trajectory-guided extrapolation to enhance behavioral diversity without manual reward design.
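
One plausible reading of how a strategy graph can increase behavioral diversity is to merge several demonstrations into a shared state graph, so that their segments can be recombined into routes no single demonstration took. The sketch below illustrates that generic idea only. The state names and the helpers `build_graph` and `enumerate_strategies` are hypothetical, and this is not presented as CORE's graph-expansion algorithm, whose details are not given here.

```python
from collections import defaultdict
from typing import DefaultDict, Iterator, List, Optional, Set

Graph = DefaultDict[str, Set[str]]


def build_graph(trajectories: List[List[str]]) -> Graph:
    """Hypothetical: merge demonstrations (sequences of state ids) into one directed graph."""
    graph: Graph = defaultdict(set)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            graph[src].add(dst)
    return graph


def enumerate_strategies(
    graph: Graph, start: str, goal: str, path: Optional[List[str]] = None
) -> Iterator[List[str]]:
    """Yield every simple (cycle-free) path from start to goal, depth-first."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in sorted(graph[start]):  # sorted for deterministic output
        if nxt not in path:  # skip states already on this path to avoid cycles
            yield from enumerate_strategies(graph, nxt, goal, path)


# Two made-up demonstrations of the same shopping task.
demos = [
    ["home", "search", "item", "cart"],
    ["home", "menu", "item", "wishlist", "cart"],
]
g = build_graph(demos)
strategies = list(enumerate_strategies(g, "home", "cart"))
for p in strategies:
    print(" -> ".join(p))
```

The two demonstrations share the `item` state, so the merged graph yields four distinct routes to `cart`, two of which never appear in either demonstration.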

Findings

01. Significantly improves performance on Web and Android platforms.

02. Enhances behavioral diversity through strategy graph expansion.

03. Achieves better generalization in virtual agent training.

Abstract

The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challenges: Behavior Cloning is simple and effective through imitation but suffers from low behavioral diversity, while Reinforcement Learning can discover novel strategies through exploration but relies heavily on manually designed reward functions. To resolve the conflict between these two approaches, we present CORE, a Code-based Inverse Self-Training Framework with Graph Expansion that bridges imitation and exploration, offering a novel training framework that promotes behavioral diversity while eliminating the reliance on manual reward design. Specifically, we introduce Semantic Code Abstraction to automatically infer reward functions from expert demonstrations without manual design. The…
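
To make the idea of a reward expressed as code concrete, here is a minimal toy sketch. It assumes the environment state can be read as a dictionary and that a reward can be written as an executable check on the final state of an expert demonstration. The helper `abstract_reward`, the `goal_keys` parameter, and the example state are all hypothetical. This is not the paper's Semantic Code Abstraction.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
RewardFn = Callable[[State], float]


def abstract_reward(expert_final_state: State, goal_keys: List[str]) -> RewardFn:
    """Hypothetical helper: turn the goal-relevant fields of an expert's final
    state into an executable success check (a toy stand-in for reward inference)."""
    targets = {k: expert_final_state[k] for k in goal_keys}

    def reward(state: State) -> float:
        # 1.0 only if every goal-relevant field matches the expert's outcome;
        # fields outside goal_keys (e.g. scroll position) are ignored.
        return 1.0 if all(state.get(k) == v for k, v in targets.items()) else 0.0

    return reward


# Illustrative final state of one expert demonstration (made-up data).
expert = {"url": "/cart", "cart_items": ["usb cable"], "scroll_y": 420}
r = abstract_reward(expert, goal_keys=["url", "cart_items"])

print(r({"url": "/cart", "cart_items": ["usb cable"], "scroll_y": 0}))  # 1.0
print(r({"url": "/home", "cart_items": []}))                            # 0.0
```

The check scores the outcome rather than the exact action sequence. A policy that reaches the same end state by a different route is therefore still rewarded, which is what allows exploration to go beyond pure imitation. In this toy version the goal-relevant keys are chosen by hand. The abstract states that CORE infers reward functions automatically from expert demonstrations.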

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks