Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences

Fan Qian; Jiqing Han; Jianchen Li; Yongjun He; Tieran Zheng; Guibin Zheng

arXiv:2409.12408·cs.CL·September 20, 2024

TL;DR

This paper introduces a mutual information-based method for disentangling and integrating unaligned multimodal language sequences, reducing redundancy and improving model generalization by jointly learning modality-agnostic representations.

Contribution

It proposes a novel disentanglement framework that minimizes mutual information to eliminate redundancy and leverages unlabeled data to enhance performance and prevent overfitting.

Findings

01

Effective disentanglement of representations demonstrated on benchmark datasets.

02

Reduced information redundancy leads to better generalization.

03

Improved performance over existing methods in multimodal sequence tasks.

Abstract

The key challenge in unaligned multimodal language sequences lies in effectively integrating information from various modalities to obtain a refined multimodal joint representation. Recently, disentangle-and-fuse methods have achieved promising performance by explicitly learning modality-agnostic and modality-specific representations and then fusing them into a multimodal joint representation. However, these methods often learn modality-agnostic representations independently for each modality and use orthogonal constraints to reduce linear correlations between modality-agnostic and modality-specific representations, neglecting to eliminate their nonlinear correlations. As a result, the obtained multimodal joint representation usually suffers from information redundancy, leading to overfitting and poor generalization. In this paper, we propose a Mutual…
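
The abstract's point that orthogonal constraints remove only linear correlations can be illustrated with a toy case. The sketch below is not the paper's method: it uses a hypothetical pair of one-dimensional variables (y = x^2 on a symmetric grid) and a simple histogram-based mutual information estimate, both chosen here only for illustration. The two variables are linearly uncorrelated, yet one fully determines the other, so their mutual information is far from zero.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def histogram_mi(a, b, bins=20):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram.

    This is a crude illustrative estimator, not the one used in the paper.
    """
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                # joint distribution over bins
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of b, shape (1, bins)
    mask = p_ab > 0                           # skip empty cells to avoid log(0)
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())


# Toy stand-ins for two representations (assumption, for illustration only).
x = np.linspace(-1.0, 1.0, 2001)   # symmetric around zero
y = x ** 2                         # nonlinear function of x

corr = pearson(x, y)
mi = histogram_mi(x, y)

print(f"Pearson correlation: {corr:.3e}")
print(f"Histogram MI estimate (nats): {mi:.3f}")
```

A constraint that only drives the correlation to zero would treat x and y as already decorrelated, while a mutual information penalty would still register the dependence. That gap is the redundancy the abstract attributes to orthogonality-based disentanglement.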

Taxonomy

Topics: Authorship Attribution and Profiling