OmniVaT: Single Domain Generalization for Multimodal Visual-Tactile Learning

Liuxiang Qiu; Hui Da; Yuzhen Niu; Tiesong Zhao; Yang Cao; Zheng-Jun Zha

arXiv:2601.00352 · cs.CV · January 5, 2026

TL;DR

OmniVaT is a framework for single domain generalization in multimodal visual-tactile learning. It bridges the modality gap between visual and tactile inputs and adapts to unseen domain shifts without requiring multi-domain training data.

Contribution

It proposes the first solution to SDG-VTL, combining a multimodal fractional Fourier adapter (MFFA) for cross-modal alignment with discrete tree generation (DTG) for adaptability to unseen domains, improving the robustness of multimodal VTL.

Findings

01. Outperforms existing methods in cross-domain generalization.
02. Effectively mitigates modality discrepancies without multi-domain data.
03. Enhances adaptability to unseen domain shifts.

Abstract

Visual-tactile learning (VTL) enables embodied agents to perceive the physical world by integrating visual (VIS) and tactile (TAC) sensors. However, VTL still suffers from modality discrepancies between VIS and TAC images, as well as domain gaps caused by non-standardized tactile sensors and inconsistent data collection procedures. We formulate these challenges as a new task, termed single domain generalization for multimodal VTL (SDG-VTL). In this paper, we propose OmniVaT, the first framework to successfully address this task. On the one hand, OmniVaT integrates a multimodal fractional Fourier adapter (MFFA) to map VIS and TAC embeddings into a unified embedding-frequency space, thereby effectively mitigating the modality gap without multi-domain training data or careful cross-modal fusion strategies. On the other hand, it also incorporates a discrete tree generation…
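The abstract names a fractional Fourier adapter but does not describe its internals. As a hedged illustration of the transform it is named after, the sketch below implements a simple discrete fractional Fourier transform of order `a`, defined as a weighted sum of integer powers of the unitary DFT. This is not the authors' MFFA: the function names `dft` and `frft`, the weighted-sum definition, the shared order, and the toy VIS/TAC vectors are all assumptions for illustration. Order `a = 0` leaves an embedding unchanged, `a = 1` gives its ordinary (unitary) DFT, and intermediate orders produce a mixed representation, which is one plausible reading of an "embedding-frequency space".

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Unitary DFT: X[k] = N^(-1/2) * sum_m x[m] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def frft(x: Sequence[complex], a: float) -> List[complex]:
    """Discrete fractional Fourier transform of order `a` (weighted-sum form, an assumption).

    F^a = sum_{j=0}^{3} c_j(a) F^j, with c_j(a) = 1/4 * sum_{k=0}^{3} exp(i*pi*k*(j - a)/2).
    a = 0 is the identity, a = 1 is the unitary DFT, orders add (F^a F^b = F^(a+b)),
    and the result is unitary, so vector norms are preserved.
    """
    # Precompute x, Fx, F^2 x, F^3 x; F^4 is the identity for the unitary DFT.
    powers = [list(x)]
    for _ in range(3):
        powers.append(dft(powers[-1]))
    # Mixing weights: each eigenvalue (-i)^k of F is raised to the power a.
    coeffs = [
        sum(cmath.exp(1j * math.pi * k * (j - a) / 2) for k in range(4)) / 4
        for j in range(4)
    ]
    return [sum(coeffs[j] * powers[j][i] for j in range(4)) for i in range(len(x))]


if __name__ == "__main__":
    # Hypothetical toy stand-ins for a visual and a tactile embedding.
    vis = [0.9, 0.1, -0.4, 0.3, 0.0, 0.2]
    tac = [0.2, -0.7, 0.5, 0.1, 0.6, -0.3]
    a = 0.5  # hypothetical shared order between the embedding (a=0) and frequency (a=1) views
    print([round(abs(c), 3) for c in frft(vis, a)])
    print([round(abs(c), 3) for c in frft(tac, a)])
```

Other discrete FrFT definitions (for example, ones built on Hermite-Gaussian-like eigenvectors) give different results for non-integer orders; the weighted-sum form is used here only because it is short and exactly additive in `a`.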

Taxonomy

Topics: Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition