METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model

Yankai Fu; Ning Chen; Junkai Zhao; Shaozhe Shan; Guocai Yao; Pengwei Wang; Zhongyuan Wang; Shanghang Zhang

arXiv:2511.17366 · cs.RO · November 24, 2025

TL;DR

METIS is a vision-language-action model for dexterous manipulation, trained on large-scale egocentric human and robotic data from multiple sources. It achieves the highest success rate across six real-world tasks and generalizes to out-of-distribution scenarios.

Contribution

The paper introduces METIS, a vision-language-action (VLA) model for dexterous manipulation that is trained on integrated egocentric datasets with motion-aware supervision.

Findings

01 Achieves the highest success rate across six real-world tasks

02 Demonstrates strong generalization to out-of-distribution scenarios

03 Provides a unified framework for reasoning and acting in dexterous manipulation

Abstract

Building a generalist robot that can perceive, reason, and act across diverse tasks remains an open challenge, especially for dexterous manipulation. A major bottleneck is the scarcity of large-scale, action-annotated data for dexterous skills, as teleoperation is difficult and costly. Human data, with its vast scale and diverse manipulation behaviors, provides rich priors for learning robotic actions. While prior works have explored leveraging human demonstrations, they are often constrained by limited scenarios and a large visual gap between humans and robots. To address these limitations, we propose METIS, a vision-language-action (VLA) model for dexterous manipulation pretrained on multi-source egocentric datasets. We first construct EgoAtlas, which integrates large-scale human and robotic data from multiple sources, all unified under a consistent action space. We further…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics