Mimic Intent, Not Just Trajectories

Renming Huang; Chendong Zeng; Wenjing Tang; Jintian Cai; Cewu Lu; Panpan Cai

arXiv:2602.08602 · cs.RO · March 31, 2026

TL;DR

This paper introduces MINT, a novel imitation learning approach that disentangles intent from execution using multi-scale spectral tokenization, enabling better transfer, generalization, and efficiency in manipulation tasks.

Contribution

MINT employs multi-scale frequency-space tokenization to explicitly separate intent from execution, improving transferability and generalization in imitation learning.

Findings

01 Achieves state-of-the-art success rates on manipulation benchmarks.

02 Enables effective one-shot skill transfer by injecting intent tokens.

03 Demonstrates robustness and efficiency in real robot experiments.

Abstract

While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art approaches like Vision-Language-Action (VLA) models still struggle with adaptation to environmental changes and skill transfer. We argue this stems from mimicking raw trajectories without understanding the underlying intent. To address this, we propose explicitly disentangling behavior intent from execution details in end-to-end IL: Mimic Intent, Not just Trajectories (MINT). We achieve this via multi-scale frequency-space tokenization, which enforces a spectral decomposition of the action chunk representation. We learn action tokens with a multi-scale coarse-to-fine structure, forcing the coarsest token to capture low-frequency global structure and the finer tokens to encode high-frequency details. This yields an abstract Intent token that…
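To make the idea of a spectral decomposition of an action chunk concrete, here is a minimal sketch. It is not the authors' tokenizer: MINT learns its tokens, whereas this example uses a fixed discrete cosine transform, and the function names, the band edges, and the toy trajectory are all assumptions chosen for illustration. It only shows how a chunk can be split into a coarse low-frequency component and finer high-frequency residuals that sum back to the original.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    basis = np.cos(np.pi * (t + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)   # rescale the DC row so the basis is orthonormal
    return basis


def split_into_bands(chunk: np.ndarray, edges=(0, 2, 6, None)) -> list:
    """Split an action chunk of shape (T, D) into per-band time-domain parts.

    `edges` are hypothetical band boundaries over DCT coefficient indices,
    ordered coarse to fine; they are not taken from the paper.
    """
    basis = dct_matrix(chunk.shape[0])
    coeffs = basis @ chunk  # (T, D) frequency coefficients per action dimension
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(coeffs)
        masked[lo:hi] = coeffs[lo:hi]    # keep only this frequency band
        bands.append(basis.T @ masked)   # back to the time domain
    return bands


# Toy chunk (assumed data): a smooth 2-D reach plus small jitter.
t = np.linspace(0.0, 1.0, 16)
smooth = np.stack([t, t**2], axis=1)
rng = np.random.default_rng(0)
noisy = smooth + 0.01 * rng.standard_normal(smooth.shape)

bands = split_into_bands(noisy)
coarse, finer = bands[0], bands[1:]
```

In this toy setting the coarse band carries the overall shape of the motion and the finer bands carry the jitter, which loosely mirrors the coarse-to-fine split the abstract describes. How MINT actually enforces this structure during training is not specified in the excerpt above.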

Code & Models

Repositories

renming-huang/MINT (GitHub)
