ICLR: In-Context Imitation Learning with Visual Reasoning

Toan Nguyen; Weiduo Yuan; Songlin Wei; Hui Li; Daniel Seita; Yue Wang

arXiv:2603.07530 · cs.RO · March 10, 2026

TL;DR

ICLR introduces a novel in-context imitation learning framework that incorporates visual reasoning traces to improve robot task adaptation, success rates, and generalization in complex scenarios.

Contribution

The paper presents a unified autoregressive transformer that jointly learns to generate visual reasoning traces and to predict low-level actions, so the model imitates both the demonstrated actions and the reasoning that leads to them.

Findings

01. Improved success rates in manipulation tasks.

02. Enhanced generalization to unseen tasks and objects.

03. Effective integration of visual reasoning in imitation learning.

Abstract

In-context imitation learning enables robots to adapt to new tasks from a small number of demonstrations without additional training. However, existing approaches typically condition only on state-action trajectories and lack explicit representations of task intent. This limitation hinders performance in complex and ambiguous task settings where the same actions may be consistent with different objectives. To address this, we present In-Context Imitation Learning with Visual Reasoning (ICLR), a novel framework that augments demonstration prompts with structured visual reasoning traces representing anticipated future robot trajectories in image space. ICLR also jointly learns to generate reasoning traces and low-level actions within a unified autoregressive transformer, enabling the model to mimic not only action prediction but also the reasoning process that leads to those actions. We…
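
The prompt structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Step` record, the `future_trace` and `build_prompt` helpers, the use of the gripper's pixel location as the trace, the fixed `horizon`, and the (observation, trace, action) ordering are all hypothetical. The sketch only shows the general idea of augmenting each demonstration step with a trace of anticipated future positions in image space, and of ending the prompt with a query observation from which a model would generate a trace first and an action second.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int]


@dataclass
class Step:
    """One demonstration timestep (hypothetical layout, not the paper's)."""
    observation: str            # stand-in for an image or its embedding
    gripper_px: Pixel           # gripper location in image space (assumed known)
    action: Tuple[float, ...]   # low-level action taken at this step


def future_trace(steps: List[Step], t: int, horizon: int) -> List[Pixel]:
    """Pixels the gripper visits over the next `horizon` steps after step t."""
    return [s.gripper_px for s in steps[t + 1 : t + 1 + horizon]]


def build_prompt(demos: List[List[Step]], query_obs: str, horizon: int = 3):
    """Interleave (obs, trace, action) entries per demo step, then the query.

    The returned list ends with the query observation, so an autoregressive
    model would continue it by emitting a trace and then an action.
    """
    tokens = []
    for demo in demos:
        for t, step in enumerate(demo):
            tokens.append(("obs", step.observation))
            tokens.append(("trace", future_trace(demo, t, horizon)))
            tokens.append(("action", step.action))
    tokens.append(("obs", query_obs))
    return tokens
```

With a single three-step demonstration, `build_prompt([demo], "query_image")` yields ten entries: three (obs, trace, action) triples followed by the query observation. Placing the trace before the action in each triple is one way to make the action prediction depend on the generated reasoning; the actual ordering and tokenization in ICLR may differ.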

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics