Dual-Stream Cross-Modal Representation Learning via Residual Semantic Decorrelation

Xuecheng Li; Weikuan Jia; Alisher Kurbonaliev; Qurbonaliev Alisher; Khudzhamkulov Rustam; Ismoilov Shuhratjon; Eshmatov Javhariddin; Yuanjie Zheng

arXiv:2512.07568·cs.CV·December 9, 2025

Open Access

TL;DR

This paper introduces DSRSD-Net, a dual-stream framework that disentangles and decorrelates shared and private modality features to improve cross-modal learning, robustness, and interpretability.

Contribution

The paper proposes a novel dual-stream residual semantic decorrelation network that explicitly separates and aligns shared and private features across modalities, targeting cross-modal redundancy and modality dominance.

Findings

01. Improves prediction accuracy on educational benchmarks.

02. Effectively disentangles modality-specific and shared information.

03. Reduces cross-modal redundancy and enhances robustness.

Abstract

Cross-modal learning has become a fundamental paradigm for integrating heterogeneous information sources such as images, text, and structured attributes. However, multimodal representations often suffer from modality dominance, redundant information coupling, and spurious cross-modal correlations, leading to suboptimal generalization and limited interpretability. In particular, high-variance modalities tend to overshadow weaker but semantically important signals, while naïve fusion strategies entangle modality-shared and modality-specific factors in an uncontrolled manner. This makes it difficult to understand which modality actually drives a prediction and to maintain robustness when some modalities are noisy or missing. To address these challenges, we propose a Dual-Stream Residual Semantic Decorrelation Network (DSRSD-Net), a simple yet effective framework that disentangles…
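The abstract is cut off before the method is described, so the following is only a generic illustration of the idea of separating shared from private features and penalizing their correlation. It is not the DSRSD-Net architecture or loss. The function names `split_shared_private` and `decorrelation_penalty`, the fixed orthonormal basis `w_shared`, and the cross-covariance penalty are all assumptions introduced for this sketch.

```python
import numpy as np


def split_shared_private(x, w_shared):
    """Split features into a shared part and a residual private part.

    x:        (n, d) batch of features from one modality.
    w_shared: (d, k) matrix with orthonormal columns spanning a
              hypothetical shared subspace (an assumption of this sketch).
    """
    shared = x @ w_shared @ w_shared.T   # projection onto the shared subspace
    private = x - shared                 # residual left after removing it
    return shared, private


def decorrelation_penalty(a, b):
    """Squared Frobenius norm of the cross-covariance between two batches.

    The value is zero when every feature of `a` is linearly uncorrelated
    with every feature of `b` over the batch.
    """
    a_c = a - a.mean(axis=0, keepdims=True)
    b_c = b - b.mean(axis=0, keepdims=True)
    cross_cov = a_c.T @ b_c / (a.shape[0] - 1)
    return float(np.sum(cross_cov ** 2))


# Toy usage with random data standing in for one modality's features.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 6))
basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # orthonormal (6, 2) basis
shared, private = split_shared_private(features, basis)
penalty = decorrelation_penalty(shared, private)
print(f"decorrelation penalty: {penalty:.4f}")
```

In a trainable model the basis would be replaced by learned encoders and the penalty added to the task loss, so that minimizing it pushes the two streams toward carrying non-redundant information. Whether the paper uses this particular penalty cannot be determined from the text shown here.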

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques