Multimodal Channel-Mixing: Channel and Spatial Masked AutoEncoder on Facial Action Unit Detection
Xiang Zhang, Huiyuan Yang, Taoyue Wang, Xiaotian Li, Lijun Yin

TL;DR
This paper introduces Multimodal Channel-Mixing (MCM), an early-fusion masked autoencoder for facial Action Unit (AU) detection that improves multi-modal feature learning and outperforms existing methods.
Contribution
The paper proposes a new multi-modal reconstruction network with channel-mixing and masked autoencoding for robust AU detection, emphasizing early fusion and multi-modal learning.
Findings
Outperforms state-of-the-art baseline methods
Effective in learning robust multi-modal representations
Reduces channel redundancy and enhances fusion capabilities
Abstract
Recent studies have focused on utilizing multi-modal data to develop robust models for facial Action Unit (AU) detection. However, the heterogeneity of multi-modal data poses challenges in learning effective representations. One such challenge is extracting relevant features from multiple modalities using a single feature extractor. Moreover, previous studies have not fully explored the potential of multi-modal fusion strategies. In contrast to the extensive work on late fusion, there are limited investigations on early fusion for channel information exploration. This paper presents a novel multi-modal reconstruction network, named Multimodal Channel-Mixing (MCM), as a pre-trained model to learn robust representation for facilitating multi-modal fusion. The approach follows an early fusion setup, integrating a Channel-Mixing module, where two out of five channels are randomly dropped.…
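The channel-dropping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a five-channel input stored as a list of 2-D grids, and it assumes that "dropped" means zeroed out rather than removed. The abstract does not say which modalities supply the five channels, and the function name channel_mix and its parameters are hypothetical.

```python
import random


def channel_mix(channels, n_drop=2, rng=None):
    """Zero out n_drop randomly chosen channels of a multi-channel input.

    channels: list of C grids, each a list of rows of floats (assumed layout).
    Returns (mixed, keep), where keep[i] is False for a dropped channel.
    """
    rng = rng or random.Random()
    # Pick n_drop distinct channel indices to drop.
    dropped = set(rng.sample(range(len(channels)), n_drop))
    keep = [i not in dropped for i in range(len(channels))]
    # Kept channels are passed through unchanged; dropped ones become zeros.
    mixed = [
        ch if kept else [[0.0] * len(row) for row in ch]
        for ch, kept in zip(channels, keep)
    ]
    return mixed, keep


# Usage: a 5-channel, 4x4 input with every value set to 1.0.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(5)]
mixed, keep = channel_mix(x, rng=random.Random(0))
```

Zeroing instead of deleting keeps the input shape fixed, so a single encoder can consume every sample. Whether the paper does the same is not stated in the truncated abstract.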
Videos
Multimodal Channel-Mixing: Channel and Spatial Masked AutoEncoder on Facial Action Unit Detection (YouTube)
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Anomaly Detection Techniques and Applications
