Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition
Zaitian Wang, Jian He, Yu Liang, Xiyuan Hu, Tianhao Peng, Kaixin Wang, Jiakai Wang, Chenlong Zhang, Weili Zhang, Shuang Niu, Xiaoyang Xie

TL;DR
Milmer is a multimodal emotion recognition framework that combines facial expressions and EEG signals using transformer-based fusion and multiple instance learning to improve accuracy in human-computer interaction applications.
Contribution
This work introduces a novel multimodal framework for emotion recognition that combines transformer-based fusion with multiple instance learning (MIL), enhancing feature extraction and temporal dynamics modeling.
Findings
Achieved 96.72% accuracy on the DEAP dataset
Validated effectiveness of each module through ablation studies
Outperformed existing methods in multimodal emotion recognition
Abstract
Emotions play a crucial role in human behavior and decision-making, making emotion recognition a key area of interest in human-computer interaction (HCI). This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals, introducing a novel multimodal framework, Milmer. The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities. It consists of an EEG preprocessing module, a facial feature extraction and balancing module, and a cross-modal fusion module. To enhance visual feature extraction, we fine-tune a pre-trained Swin Transformer on emotion-related datasets. Additionally, a cross-attention mechanism is introduced to balance token representation across modalities, ensuring effective feature integration. A key innovation of this work is the…
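The abstract mentions a cross-attention mechanism that balances token representation across modalities. The sketch below is one plausible reading of that idea, not the authors' implementation: a small set of query tokens attends over a larger set of facial tokens, so the visual modality is condensed to the same token count as the EEG modality before fusion. The token counts, the embedding size, the random weights, and the names `cross_attention`, `balance_queries`, `face_tokens`, and `eeg_tokens` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over the context tokens."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_ctx, d)
    v = context @ w_v                          # (n_ctx, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_ctx), scaled dot product
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16                                         # assumed embedding size
face_tokens = rng.normal(size=(49, d))         # assumed: 49 visual tokens
eeg_tokens = rng.normal(size=(8, d))           # assumed: 8 EEG tokens

# Assumed: one query per EEG token, so both modalities end up with 8 tokens.
# In a trained model these queries and projections would be learned.
balance_queries = rng.normal(size=(8, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

face_balanced, attn = cross_attention(balance_queries, face_tokens, w_q, w_k, w_v)

# Token sequence that a downstream fusion transformer could consume.
fused_input = np.concatenate([face_balanced, eeg_tokens], axis=0)
```

With these assumed sizes, the 49 facial tokens are reduced to 8, so neither modality dominates the fused sequence by token count alone. How Milmer chooses the queries and the target token count is not stated in the text above.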
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Stochastic Depth · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
