FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation

Ruiteng Zhao; Wenshuo Wang; Yicheng Ma; Xiaocong Li; Francis E.H. Tay; Marcelo H. Ang Jr.; Haiyue Zhu

arXiv:2602.02142 · cs.RO · March 23, 2026

TL;DR

FD-VLA brings force awareness to vision-language-action (VLA) models for contact-rich manipulation without requiring physical force sensors, which improves robustness and lowers hardware costs.

Contribution

The paper proposes a Force Distillation Module (FDM) that enables force-aware reasoning in VLA models without physical force sensors, making them more practical to deploy and more robust.

Findings

1. The distilled force token outperforms direct force-sensor measurements.
2. The approach reduces hardware costs by eliminating the need for physical force sensors.
3. Perception-action robustness improves in contact-rich tasks.

Abstract

Force sensing is a crucial modality for Vision-Language-Action (VLA) frameworks, as it enables fine-grained perception and dexterous manipulation in contact-rich tasks. We present Force-Distilled VLA (FD-VLA), a novel framework that integrates force awareness into contact-rich manipulation without relying on physical force sensors. The core of our approach is a Force Distillation Module (FDM), which distills force by mapping a learnable query token, conditioned on visual observations and robot states, into a predicted force token aligned with the latent representation of actual force signals. During inference, this distilled force token is injected into the pretrained VLM, enabling force-aware reasoning while preserving the integrity of its vision-language semantics. This design provides two key benefits: first, it allows practical deployment across a wide range of robots that lack…
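The abstract describes the FDM only at a high level, so the following is a minimal NumPy sketch of the idea under loudly stated assumptions, not the authors' implementation. A learnable query token attends over visual and robot-state tokens to produce a predicted force token. During training, that token is pulled toward the latent of a measured force signal. At inference, the token is added to the VLM's input with no sensor involved. Every specific here is hypothetical: single-head cross-attention, the 6-D wrench input, the `tanh` force encoder, mean-squared-error alignment, appending the token at the end of the sequence, the tiny token width, and all names (`ForceDistillationSketch`, `force_encoder`, `distillation_loss`, `inject_force_token`). Only the forward pass and loss are shown. There is no optimizer step.

```python
import numpy as np

D_MODEL = 16  # hypothetical token width; real VLM widths are far larger


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


class ForceDistillationSketch:
    """Hypothetical FDM stand-in: one learnable query token cross-attends
    (single head) to visual and robot-state tokens to predict a force token."""

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.d = d
        self.query = rng.normal(0.0, s, (1, d))  # learnable query token
        self.w_q = rng.normal(0.0, s, (d, d))
        self.w_k = rng.normal(0.0, s, (d, d))
        self.w_v = rng.normal(0.0, s, (d, d))
        self.w_o = rng.normal(0.0, s, (d, d))

    def __call__(self, visual_tokens, state_tokens):
        # Condition on both modalities by attending over their concatenation.
        context = np.concatenate([visual_tokens, state_tokens], axis=0)  # (N, d)
        q = self.query @ self.w_q                                        # (1, d)
        k = context @ self.w_k                                           # (N, d)
        v = context @ self.w_v                                           # (N, d)
        attn = softmax(q @ k.T / np.sqrt(self.d))                        # (1, N)
        return (attn @ v) @ self.w_o                                     # (1, d)


def force_encoder(wrench, w_enc):
    """Hypothetical training-time encoder: maps a measured 6-D wrench to the
    latent force space that the predicted token is aligned with."""
    return np.tanh(wrench @ w_enc)  # (1, d)


def distillation_loss(pred_token, target_latent):
    """Assumed alignment objective (mean squared error)."""
    return float(np.mean((pred_token - target_latent) ** 2))


def inject_force_token(vlm_tokens, force_token):
    """Assumed injection: append the force token to the VLM input sequence."""
    return np.concatenate([vlm_tokens, force_token], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fdm = ForceDistillationSketch(D_MODEL, rng)
    visual = rng.normal(size=(8, D_MODEL))  # e.g. 8 image-patch tokens
    state = rng.normal(size=(1, D_MODEL))   # one embedded robot-state token

    # Training: the force sensor is only needed to build the target latent.
    pred = fdm(visual, state)
    target = force_encoder(rng.normal(size=(1, 6)), rng.normal(0.0, 0.5, (6, D_MODEL)))
    print("alignment loss:", distillation_loss(pred, target))

    # Inference: no sensor; the predicted token joins the VLM's token sequence.
    lang = rng.normal(size=(4, D_MODEL))  # stand-in language tokens
    seq = inject_force_token(np.vstack([visual, lang]), pred)
    print(seq.shape)  # (13, 16)
```

In this sketch, the force encoder and the loss are used only during training. At inference, only the query path runs. That split is what would let such a model be deployed on robots that lack a force sensor.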

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Sensor and Energy Harvesting Materials · Multimodal Machine Learning Applications