End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin, Ying Zeng, Sijie Mai, Haifeng Hu

TL;DR
SemanticMAC is an end-to-end multimodal affective computing framework that combines semantic-centric learning with pre-trained transformers to improve emotion recognition in human-spoken videos, outperforming existing methods.
Contribution
The paper introduces a semantic-centric approach with a specialized architecture, including the Affective Perceiver and multi-task learning, to improve multimodal affective computing.
Findings
Outperforms state-of-the-art methods on 7 datasets
Effective in multiple MAC downstream tasks
Addresses semantic imbalance and mismatch issues
Abstract
On the path toward Artificial General Intelligence (AGI), understanding human affection is essential to enhancing machine cognition. To achieve more sensual human-AI interaction, Multimodal Affective Computing (MAC) in human-spoken videos has attracted increasing attention. However, previous methods mainly focus on designing multimodal fusion algorithms and suffer from two issues: semantic imbalance, caused by diverse pre-processing operations, and semantic mismatch, caused by affective content in individual modalities that is inconsistent with the multimodal ground truth. Moreover, their reliance on manual feature extractors prevents them from building an end-to-end pipeline for multiple MAC downstream tasks. To address these challenges, we propose a novel end-to-end framework named SemanticMAC to compute multimodal semantic-centric affection for human-spoken…
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
