DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

Yifan Han, Zhongxi Chen, Yuxuan Zhao, Congsheng Xu, Yanming Shao, Yichuan Peng, Yao Mu, and Wenzhao Lian

arXiv:2603.09121·cs.RO·March 11, 2026

TL;DR

DexHiL is a human-in-the-loop framework for post-training dexterous vision-language-action models in robotic manipulation. It supports real-time human corrections and reports a 25% increase in task success rates over baseline methods.

Contribution

It introduces an integrated arm-hand human-in-the-loop system with intervention-aware data sampling for post-training enhancement of dexterous VLA models.

Findings

01. Achieves a 25% increase in success rates over baseline methods.

02. Demonstrates effective real-robot dexterous manipulation improvements.

03. Provides a lightweight teleoperation interface for human interventions.

Abstract

While Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities in robotic manipulation, deploying them on specific and complex downstream tasks still demands effective post-training. In parallel, Human-in-the-Loop (HiL) learning has proven to be a powerful mechanism for refining robot policies. However, extending this paradigm to dexterous manipulation remains challenging: multi-finger control is high-dimensional, contact-intensive, and exhibits execution distributions that differ markedly from standard arm motions, leaving existing dexterous VLA systems limited in reliability and adaptability. We present DexHiL, the first integrated arm-hand human-in-the-loop framework for dexterous VLA models, enabling coordinated interventions over the arm and the dexterous hand within a single system. DexHiL introduces an intervention-aware data sampling strategy…
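
The abstract is cut off before it describes the intervention-aware data sampling strategy, so the following is only a generic illustration of the idea and not the authors' method. It is a minimal Python sketch that assumes each stored transition carries a boolean flag marking whether a human was correcting the robot at that step, and that flagged transitions are drawn with a higher probability during post-training. The field name `is_intervention`, the function `sample_batch`, and the weight value `3.0` are all hypothetical.

```python
import random


def sample_batch(transitions, batch_size, intervention_weight=3.0, rng=None):
    """Draw a training batch, up-weighting human-corrected transitions.

    `transitions` is a list of dicts with a boolean "is_intervention" key
    (a hypothetical field name). `intervention_weight` is an assumed
    hyperparameter, not a value taken from the paper.
    """
    rng = rng or random.Random()
    # Autonomous steps get weight 1.0; intervention steps get the larger weight.
    weights = [
        intervention_weight if t["is_intervention"] else 1.0
        for t in transitions
    ]
    # Sample with replacement, proportionally to the weights.
    return rng.choices(transitions, weights=weights, k=batch_size)


# Toy dataset: 90 autonomous steps and 10 human-corrected steps.
data = [{"id": i, "is_intervention": i >= 90} for i in range(100)]
batch = sample_batch(data, batch_size=1000, rng=random.Random(0))
share = sum(t["is_intervention"] for t in batch) / len(batch)
print(f"intervention share in batch: {share:.2f}")
```

In this toy setup the corrected steps make up 10% of the data but, in expectation, 30/120 = 25% of each batch. A fixed weight is the simplest possible choice; the actual strategy in DexHiL may differ substantially.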

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI