TL;DR
CubeMLP is an MLP-only framework for multimodal sentiment analysis and depression estimation. It mixes features across three axes and reports state-of-the-art results on CMU-MOSI and CMU-MOSEI at lower computational cost than existing methods.
Contribution
This paper introduces CubeMLP, a purely MLP-based multimodal feature processing framework that uses a novel feature-mixing approach across three axes for improved mental state prediction.
Findings
Achieves state-of-the-art performance on CMU-MOSI and CMU-MOSEI datasets.
Demonstrates lower computational cost compared to existing methods.
Shows that mixing multimodal features across three axes improves prediction accuracy.
Abstract
Multimodal sentiment analysis and depression estimation are two important research topics that aim to predict human mental states using multimodal data. Previous research has focused on developing effective fusion strategies for exchanging and integrating mind-related information from different modalities. Some MLP-based techniques have recently achieved considerable success in a variety of computer vision tasks. Inspired by this, we explore multimodal approaches with a feature-mixing perspective in this study. To this end, we introduce CubeMLP, a multimodal feature processing framework based entirely on MLP. CubeMLP consists of three independent MLP units, each of which has two affine transformations. CubeMLP accepts all relevant modality features as input and mixes them across three axes. After extracting the characteristics using CubeMLP, the mixed multimodal features are flattened…
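The three-axis mixing described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the stacked input has shape (sequence length, modalities, channels), that each MLP unit is two affine maps with a ReLU in between, and that each unit is wrapped in a residual connection. None of these choices are stated in the abstract, and the names `init_mlp`, `mix_axis`, and `cube_mix` are hypothetical.

```python
import numpy as np


def init_mlp(rng, dim, hidden):
    """Parameters for one MLP unit: two affine maps, dim -> hidden -> dim."""
    return {
        "w1": rng.normal(0.0, 0.02, size=(dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, size=(hidden, dim)),
        "b2": np.zeros(dim),
    }


def mix_axis(x, params, axis):
    """Apply a two-affine MLP along `axis`, leaving the other axes untouched."""
    x_last = np.moveaxis(x, axis, -1)  # bring the target axis to the end
    # First affine map followed by ReLU (the activation is an assumption).
    h = np.maximum(x_last @ params["w1"] + params["b1"], 0.0)
    y = h @ params["w2"] + params["b2"]  # second affine map
    return np.moveaxis(y, -1, axis)  # restore the original axis order


def cube_mix(x, units):
    """Mix a 3-D feature tensor along each of its axes in turn."""
    for axis, params in enumerate(units):
        x = x + mix_axis(x, params, axis)  # residual connection (an assumption)
    return x


rng = np.random.default_rng(0)
seq_len, n_modalities, channels = 8, 3, 16  # illustrative sizes only
features = rng.normal(size=(seq_len, n_modalities, channels))

# One independent MLP unit per axis, sized to that axis.
units = [init_mlp(rng, d, hidden=2 * d) for d in features.shape]

mixed = cube_mix(features, units)
flat = mixed.reshape(-1)  # flattened features for a downstream prediction head
print(mixed.shape, flat.shape)
```

Because each unit acts on a single axis, its parameter count depends only on that axis's size, which is one plausible reading of why an MLP-only design could be cheaper than attention-based fusion.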
