Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Yihe Liu; Ziqi Yuan; Huisheng Mao; Zhiyun Liang; Wanqiuyue Yang; Yuanzhe Qiu; Tie Cheng; Xiaoteng Li; Hua Xu; Kai Gao

arXiv:2209.02604 · cs.MM · September 7, 2022

TL;DR

This paper introduces the CH-SIMS v2.0 dataset and the AV-Mixup Consistent module to enhance multimodal sentiment analysis by emphasizing the importance of acoustic and visual cues, improving model awareness of non-verbal signals.

Contribution

The work presents a new, larger dataset with rich annotations and a novel mixup-based framework to better leverage non-verbal cues in multimodal sentiment analysis.

Findings

01 CH-SIMS v2.0 doubles the size of the original CH-SIMS dataset, adding 2121 refined video segments with both unimodal and multimodal annotations.

02 The AV-MC framework improves the model's ability to use non-verbal cues.

03 The approach yields better interpretability and performance in multimodal sentiment prediction.

Abstract

Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with the associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, a situation termed text-predominant. In this work, we therefore emphasize making non-verbal cues matter for the MSA task. First, from the resource perspective, we present the CH-SIMS v2.0 dataset, an extension and enhancement of CH-SIMS. Compared with the original dataset, CH-SIMS v2.0 doubles its size with another 2121 refined video segments with both unimodal and multimodal annotations and collects 10161 unlabelled raw video segments with rich acoustic and visual emotion-bearing context to highlight…

Code & Models

Repositories

thuiar/ch-sims-v2
PyTorch · Official

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Advanced Text Analysis Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network · Mixup
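
For readers unfamiliar with Mixup, listed under Methods, the following is a minimal sketch of the generic technique: two training examples and their labels are blended with a coefficient drawn from a Beta distribution. It is not the paper's AV-MC module, whose details are not given on this page. The function name `mixup`, the `alpha` default, and the toy feature vectors and sentiment scores are all assumptions made for illustration.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=1.0, rng=random):
    """Generic mixup (illustrative, not the paper's AV-MC module).

    Returns a convex combination of two feature vectors and of their
    regression labels, plus the mixing coefficient that was drawn.
    """
    # Mixing coefficient lam ~ Beta(alpha, alpha), always within [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Blend the features element-wise and the labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix, lam


if __name__ == "__main__":
    # Hypothetical acoustic feature vectors from two clips, with
    # hypothetical sentiment scores in [-1, 1].
    clip_a, score_a = [0.2, 0.9, 0.1], 0.8
    clip_b, score_b = [0.7, 0.1, 0.4], -0.6
    mixed, score, lam = mixup(clip_a, clip_b, score_a, score_b, alpha=0.5)
    print(f"lam={lam:.3f} mixed={mixed} score={score:.3f}")
```

Because the label is mixed with the same coefficient as the features, the blended example carries a sentiment score that lies between the two originals.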