Tactile Modality Fusion for Vision-Language-Action Models

Charlotte Morissette; Amin Abyaneh; Wei-Di Chang; Anas Houssaini; David Meger; Hsiu-Chin Lin; Jonathan Tremblay; Gregory Dudek

arXiv:2603.14604 · cs.RO · March 17, 2026

TL;DR

This paper introduces TacFiLM, a lightweight method for integrating tactile signals into vision-language-action models, enhancing robot manipulation performance without significant computational overhead.

Contribution

The paper presents TacFiLM, a post-training finetuning approach that conditions visual features on tactile data using feature-wise linear modulation (FiLM), improving performance on contact-rich manipulation tasks.
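
As a rough illustration of what FiLM conditioning means in general, the sketch below scales and shifts each channel of a set of visual feature vectors using parameters predicted from a tactile embedding. It is not the authors' implementation: the function names (`linear`, `film`), the single dense layer per parameter, the toy dimensions, and the identity initialisation are all assumptions made for this example.

```python
def linear(weights, bias, x):
    """Dense layer on plain Python lists: returns W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(visual_tokens, tactile_embedding, gamma_layer, beta_layer):
    """Feature-wise linear modulation (hypothetical helper).

    visual_tokens:      list of tokens, each a list of C channel values
    tactile_embedding:  list of D values summarising the tactile signal
    gamma_layer, beta_layer: (weights, bias) pairs mapping D -> C
    """
    gamma = linear(*gamma_layer, tactile_embedding)  # per-channel scale
    beta = linear(*beta_layer, tactile_embedding)    # per-channel shift
    # The same scale and shift are applied to every token, channel by channel.
    return [[g * v + b for v, g, b in zip(token, gamma, beta)]
            for token in visual_tokens]


# Toy inputs: two visual tokens with C = 2 channels, tactile embedding with D = 3.
visual = [[1.0, 2.0], [3.0, 4.0]]
tactile = [0.5, -1.0, 2.0]

# Assumed identity initialisation: gamma = 1 and beta = 0 regardless of touch,
# so the modulated features start out equal to the unmodified visual features.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
identity_out = film(visual, tactile, (zeros, [1.0, 1.0]), (zeros, [0.0, 0.0]))

# A tactile-dependent scale (gamma reads the first two tactile values)
# combined with a constant shift of 1.0 on both channels.
select = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
modulated = film(visual, tactile, (select, [0.0, 0.0]), (zeros, [1.0, 1.0]))
```

Because the tactile signal enters only through per-channel scale and shift parameters, the number of visual tokens stays the same, in contrast to the token-concatenation approaches mentioned in the abstract.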

Findings

1. Improved success rate in insertion tasks
2. Enhanced force stability during manipulation
3. Faster completion times across tasks

Abstract

We propose TacFiLM, a lightweight modality-fusion approach that integrates visual-tactile signals into vision-language-action (VLA) models. While recent advances in VLA models have introduced robot policies that are both generalizable and semantically grounded, these models mainly rely on vision-based perception. Vision alone, however, cannot capture the complex interaction dynamics that occur during contact-rich manipulation, including contact forces, surface friction, compliance, and shear. While recent attempts to integrate tactile signals into VLA models often increase complexity through token concatenation or large-scale pretraining, the heavy computational demands of behavioural models necessitate more lightweight fusion strategies. To address these challenges, TacFiLM outlines a post-training finetuning approach that conditions intermediate visual features on pretrained tactile…

Taxonomy

Topics: Robot Manipulation and Learning · Tactile and Sensory Interactions · Advanced Sensor and Energy Harvesting Materials