Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA

Tutian Tang; Xingyu Ji; Wanli Xing; Ce Hao; Wenqiang Xu; Lin Shao; Cewu Lu; Qiaojun Yu; Jiangmiao Pang; Kaifeng Zhang

arXiv:2603.08122·cs.RO·March 10, 2026

Open Access

TL;DR

This paper introduces an integrated framework that combines reinforcement learning-trained atomic skills with a multimodal VLA architecture to enable human-like, contact-rich bimanual manipulation, with a reported doubling of the success rate on complex contact-rich tasks.

Contribution

The paper presents IMCopilot (In-hand Manipulation Copilot) for data-efficient skill learning and MoDE-VLA (Mixture-of-Dexterous-Experts VLA) for multimodal sensory fusion, advancing dexterous robotic manipulation.

Findings

01. Doubled success rate in complex contact-rich tasks

02. Effective multimodal sensory fusion without degrading pretrained knowledge

03. Enhanced teleoperation with reinforcement learning-based atomic skills

Abstract

While Vision-Language-Action (VLA) models have demonstrated remarkable success in robotic manipulation, their application has largely been confined to low-degree-of-freedom end-effectors performing simple, vision-guided pick-and-place tasks. Extending these models to human-like, bimanual dexterous manipulation, specifically contact-rich in-hand operations, introduces critical challenges in high-fidelity data acquisition, multi-skill learning, and multimodal sensory fusion. In this paper, we propose an integrated framework to address these bottlenecks, built upon two components. First, we introduce IMCopilot (In-hand Manipulation Copilot), a suite of reinforcement learning-trained atomic skills that plays a dual role: it acts as a shared-autonomy assistant to simplify teleoperation data collection, and it serves as a callable low-level execution primitive for the VLA. Second, we present…

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Multimodal Machine Learning Applications