Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module
Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

TL;DR
This paper introduces the CH-SIMS v2.0 dataset and the AV-Mixup Consistent module to enhance multimodal sentiment analysis by emphasizing the importance of acoustic and visual cues, improving model awareness of non-verbal signals.
Contribution
The work presents a new, larger dataset with rich annotations and a novel mixup-based framework to better leverage non-verbal cues in multimodal sentiment analysis.
Findings
CH-SIMS v2.0 doubles the size of the original dataset with additional annotations.
AV-MC framework improves the model's ability to utilize non-verbal cues.
The approach improves both interpretability and performance in multimodal sentiment prediction.
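The findings credit the AV-MC framework with helping the model use non-verbal cues, but this summary does not spell out how the module works. The following is therefore only a minimal sketch of generic mixup (convex interpolation with a coefficient λ drawn from Beta(α, α)) restricted to the acoustic and visual modalities. It rests on these assumptions: each sample is a dict of precomputed feature vectors plus unimodal acoustic and visual labels (CH-SIMS v2.0 does provide unimodal annotations), only the acoustic and visual parts are mixed, and the first sample's text features are kept unchanged. The function names `lerp` and `av_mixup`, the dict keys, and the default `alpha` are hypothetical, and the actual AV-MC module may differ.

```python
import random
from typing import Any, Dict, List, Optional

# Hypothetical sample layout: precomputed feature vectors per modality plus
# unimodal acoustic ("y_audio") and visual ("y_vision") sentiment labels.
Sample = Dict[str, Any]


def lerp(x: List[float], y: List[float], lam: float) -> List[float]:
    """Element-wise convex combination lam * x + (1 - lam) * y."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have equal length")
    return [lam * xi + (1.0 - lam) * yi for xi, yi in zip(x, y)]


def av_mixup(a: Sample, b: Sample, alpha: float = 0.5,
             lam: Optional[float] = None,
             rng: Optional[random.Random] = None) -> Sample:
    """Mix only the acoustic and visual parts of two samples (sketch).

    lam is drawn from Beta(alpha, alpha) unless given explicitly. The text
    features of sample `a` are copied unchanged, so the mixed sample differs
    from `a` only in its non-verbal cues.
    """
    if lam is None:
        lam = (rng or random.Random()).betavariate(alpha, alpha)
    return {
        "text": list(a["text"]),
        "audio": lerp(a["audio"], b["audio"], lam),
        "vision": lerp(a["vision"], b["vision"], lam),
        # Unimodal labels are interpolated with the same coefficient.
        "y_audio": lam * a["y_audio"] + (1.0 - lam) * b["y_audio"],
        "y_vision": lam * a["y_vision"] + (1.0 - lam) * b["y_vision"],
        "lam": lam,
    }


# Usage with toy two-dimensional features and a fixed coefficient.
a = {"text": [0.1, 0.2], "audio": [1.0, 0.0], "vision": [0.0, 2.0],
     "y_audio": 1.0, "y_vision": -1.0}
b = {"text": [0.9, 0.9], "audio": [0.0, 1.0], "vision": [2.0, 0.0],
     "y_audio": -1.0, "y_vision": 1.0}
mixed = av_mixup(a, b, lam=0.75)
print(mixed["audio"], mixed["y_audio"])  # [0.75, 0.25] 0.5
```

Sharing one λ between the features and the labels keeps each virtual sample's targets consistent with its inputs, and holding the text fixed means any change in the target can only come from the acoustic and visual channels. Whether AV-MC pairs samples, handles text, or weights the labels in exactly this way is not stated in this summary.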
Abstract
Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with the associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, a phenomenon referred to as text-predominant. Under these circumstances, this work emphasizes making non-verbal cues matter for the MSA task. First, from the resource perspective, we present the CH-SIMS v2.0 dataset, an extension and enhancement of CH-SIMS. Compared with the original dataset, CH-SIMS v2.0 doubles its size with another 2121 refined video segments carrying both unimodal and multimodal annotations, and collects 10161 unlabelled raw video segments with rich acoustic and visual emotion-bearing context to highlight…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Advanced Text Analysis Techniques
Methods: Attentive Walk-Aggregating Graph Neural Network · Mixup
