Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet

Xin Niu; Enyi Li; Jinchao Liu; Yan Wang; Margarita Osadchy; Yongchun Fang

arXiv:2605.16887·cs.CV·May 19, 2026

TL;DR

This paper introduces cmUNet, a neural module that learns modality-agnostic representations for cross-modality recognition while retaining identity information and improving robustness to occlusions.

Contribution

The authors propose cmUNet, a compact encoder-decoder neural module, and combine it with MarrNet for cross-modality recognition. MarrNet is reported to outperform existing methods on five challenging tasks.

Findings

01. cmUNet effectively learns modality-agnostic representations.

02. MarrNet achieves superior performance on five challenging tasks.

03. Robustness to occlusions correlates with better modality gap bridging.

Abstract

Cross-modality recognition has many important applications in science, law enforcement and entertainment. Popular methods to bridge the modality gap include reducing the distributional differences between representations of different modalities, learning indistinguishable representations, and explicit modality transfer. The first two approaches lose discriminant information when they remove modality-specific variations. The third relies heavily on successful modality transfer and can suffer a catastrophic performance drop when explicit modality transfer is impossible or difficult. To tackle this problem, we propose a compact encoder-decoder neural module (cmUNet) that learns modality-agnostic representations while retaining identity-related information. This is achieved through cross-modality transformation and in-modality reconstruction, enhanced by an…
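
The two objectives named in the abstract, in-modality reconstruction and cross-modality transformation, can be illustrated with a toy sketch. This is not the authors' architecture: the shared linear encoder `enc`, the per-modality linear decoders `dec_a` and `dec_b`, the paired inputs `x_a` and `x_b`, the mean-squared-error losses, and the function name `cross_modality_losses` are all assumptions made for illustration only.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def cross_modality_losses(x_a, x_b, enc, dec_a, dec_b):
    """Toy losses for paired samples of the same identities in two modalities.

    Hypothetical setup (not from the paper): one shared linear encoder and
    one linear decoder per modality, all represented as plain matrices.
    """
    # Encode both modalities with the same (shared) encoder.
    z_a = x_a @ enc
    z_b = x_b @ enc

    # In-modality reconstruction: decode each code back into its own modality.
    recon = mse(z_a @ dec_a, x_a) + mse(z_b @ dec_b, x_b)

    # Cross-modality transformation: decode each code into the other modality.
    cross = mse(z_a @ dec_b, x_b) + mse(z_b @ dec_a, x_a)
    return recon, cross


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_a = rng.normal(size=(4, 8))                    # modality A, 4 paired samples
    x_b = x_a + 0.1 * rng.normal(size=(4, 8))        # modality B, same identities
    enc = rng.normal(size=(8, 3))                    # shared encoder, 3-d code
    dec_a = rng.normal(size=(3, 8))                  # decoder for modality A
    dec_b = rng.normal(size=(3, 8))                  # decoder for modality B
    recon, cross = cross_modality_losses(x_a, x_b, enc, dec_a, dec_b)
    print("reconstruction loss:", recon)
    print("cross-modality loss:", cross)
```

In this sketch, driving both terms down with a shared encoder pushes the code to carry what the two modalities have in common, which is one plausible reading of "modality-agnostic while retaining identity-related information". The enhancement mentioned at the truncated end of the abstract is not modeled here.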
