MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition

Haoyang Zhang; Zhou Yang; Ke Sun; Yucai Pang; Guoliang Xu

arXiv:2510.24827 · cs.CV · October 30, 2025

TL;DR

This paper introduces MCIHN, a hybrid neural network leveraging multi-path cross-modal interaction and adversarial autoencoders to improve multimodal emotion recognition accuracy.

Contribution

It proposes a novel hybrid model combining adversarial autoencoders, a cross-modal gate mechanism, and feature fusion for enhanced emotion recognition across modalities.

Findings

01 MCIHN outperforms existing methods on the SIMS and MOSI datasets.

02 The model reduces the discrepancy between modalities and captures the emotional relationships between them.

03 Experimental results demonstrate superior accuracy in multimodal emotion recognition.

Abstract

Multimodal emotion recognition is crucial for future human-computer interaction. However, accurate emotion recognition remains challenging because of discrepancies between modalities and the difficulty of characterizing unimodal emotional information. To address these problems, a hybrid network model based on multi-path cross-modal interaction (MCIHN) is proposed. First, an adversarial autoencoder (AAE) is constructed separately for each modality. Each AAE learns discriminative emotion features and reconstructs them through a decoder to obtain more discriminative information about the emotion classes. Then, the latent codes from the AAEs of the different modalities are fed into a predefined Cross-modal Gate Mechanism model (CGMM) to reduce the discrepancy between modalities, establish the emotional relationship between interacting modalities, and generate the…
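
The abstract does not give the equations of the CGMM, so the following is only a minimal sketch of what a generic cross-modal gate over two latent codes could look like, not the authors' formulation. The function name `cross_modal_gate`, the sigmoid gate over the concatenated codes, the convex-combination fusion, and the random stand-in latent codes are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_modal_gate(z_a, z_b, weights, bias):
    """Fuse two latent codes with an element-wise sigmoid gate.

    Hypothetical form (not taken from the paper):
        g     = sigmoid(W [z_a; z_b] + b)
        fused = g * z_a + (1 - g) * z_b
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    joint = z_a + z_b  # list concatenation: [z_a; z_b]
    fused = []
    for i in range(len(z_a)):
        pre = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(pre)  # gate value in (0, 1) for dimension i
        fused.append(g * z_a[i] + (1.0 - g) * z_b[i])
    return fused


# Random stand-ins for the latent codes that two per-modality AAEs would emit.
rng = random.Random(0)
dim = 4
z_audio = [rng.uniform(-1, 1) for _ in range(dim)]
z_text = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = cross_modal_gate(z_audio, z_text, weights, bias)
print(fused)
```

Because the gate lies strictly between 0 and 1, each fused component stays between the corresponding components of the two inputs. In a trained model the weights and bias would be learned; here they are random placeholders.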
