Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer
Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li

TL;DR
This paper introduces a novel single-stage emotion recognition method using a Decoupled Subject-Context Transformer that jointly localizes subjects and classifies emotions, outperforming traditional two-stage approaches.
Contribution
The paper proposes a unified framework with a decoupled transformer that enhances interaction between subject and context cues for emotion recognition, reducing complexity and improving accuracy.
Findings
Achieves 3.39% higher accuracy than two-stage approaches on the CAER-S dataset.
Attains 6.46% higher average precision than two-stage approaches on the EMOTIC dataset.
Uses fewer parameters than two-stage methods.
Abstract
Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues. Current approaches typically follow a two-stage pipeline: first localize subjects by off-the-shelf detectors, then perform emotion classification through the late fusion of subject and context features. However, the complicated paradigm suffers from disjoint training stages and limited interaction between fine-grained subject-context elements. To address the challenge, we present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT), for simultaneous subject localization and emotion classification. Rather than compartmentalizing training stages, we jointly leverage box and emotion signals as supervision to enrich subject-centric feature learning. Furthermore, we introduce DSCT to facilitate…
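The joint supervision described in the abstract (box and emotion signals used together) can be pictured as a single training objective that sums a localization term and a classification term. The sketch below is only an illustration under stated assumptions: the abstract does not specify the loss, so the L1 box term, the single-label softmax cross-entropy, the weights `box_weight` and `emotion_weight`, and every function name here are hypothetical rather than taken from the paper.

```python
import math


def l1_box_loss(pred_box, gt_box):
    """Mean absolute error between two boxes given as (x1, y1, x2, y2).

    Assumption: an L1 box term is used purely for illustration.
    """
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / len(gt_box)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one single-label emotion prediction.

    Assumption: one emotion class per subject; a multi-label setup
    would need a different classification term.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(pred_box, gt_box, emotion_logits, emotion_label,
               box_weight=1.0, emotion_weight=1.0):
    """Hypothetical weighted sum of the box and emotion terms."""
    box_term = l1_box_loss(pred_box, gt_box)
    emotion_term = cross_entropy(emotion_logits, emotion_label)
    return box_weight * box_term + emotion_weight * emotion_term


if __name__ == "__main__":
    # One subject: a slightly misplaced box and logits over 7 emotion classes.
    loss = joint_loss(
        pred_box=(0.10, 0.20, 0.55, 0.90),
        gt_box=(0.10, 0.20, 0.50, 0.90),
        emotion_logits=[0.2, 1.5, -0.3, 0.0, 0.1, -1.0, 0.4],
        emotion_label=1,
    )
    print(f"joint loss: {loss:.4f}")
```

Because both terms contribute gradients to the same model in one pass, a single objective of this kind avoids the separate detector and classifier training stages that the two-stage pipeline requires.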
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing
