Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

Xinpeng Li; Teng Wang; Jian Zhao; Shuyi Mao; Jinbao Wang; Feng Zheng; Xiaojiang Peng; Xuelong Li

arXiv:2404.17205·cs.CV·April 30, 2024

TL;DR

This paper introduces a novel single-stage emotion recognition method using a Decoupled Subject-Context Transformer that jointly localizes subjects and classifies emotions, outperforming traditional two-stage approaches.

Contribution

The paper proposes a unified framework with a decoupled transformer that enhances interaction between subject and context cues for emotion recognition, reducing complexity and improving accuracy.

Findings

01. Achieves 3.39% higher accuracy than two-stage alternatives on the CAER-S dataset.

02. Attains 6.46% higher average precision than two-stage alternatives on the EMOTIC dataset.

03. Uses fewer parameters than two-stage methods.

Abstract

Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues. Current approaches typically follow a two-stage pipeline: first localize subjects by off-the-shelf detectors, then perform emotion classification through the late fusion of subject and context features. However, the complicated paradigm suffers from disjoint training stages and limited interaction between fine-grained subject-context elements. To address the challenge, we present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT), for simultaneous subject localization and emotion classification. Rather than compartmentalizing training stages, we jointly leverage box and emotion signals as supervision to enrich subject-centric feature learning. Furthermore, we introduce DSCT to facilitate…
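The joint supervision described in the abstract, where box and emotion signals train one model together, can be illustrated with a minimal sketch. This is not the paper's loss: the L1 box term, the multi-label binary cross-entropy term, the function names, and the `box_weight` and `emo_weight` parameters are all assumptions chosen for illustration, since the excerpt does not specify them.

```python
import math

def l1_box_loss(pred_box, true_box):
    """Mean absolute error between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box)) / len(true_box)

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over emotion categories, from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)

def joint_loss(pred_box, true_box, emo_logits, emo_targets,
               box_weight=1.0, emo_weight=1.0):
    """Weighted sum of a localization term and an emotion term.

    Both weights are hypothetical hyperparameters, not values from the paper.
    """
    box_term = l1_box_loss(pred_box, true_box)
    emo_term = multilabel_bce(emo_logits, emo_targets)
    return box_weight * box_term + emo_weight * emo_term

# One subject: a predicted box, its ground-truth box, and three emotion logits.
loss = joint_loss(
    pred_box=(0.48, 0.52, 0.20, 0.35),
    true_box=(0.50, 0.50, 0.22, 0.33),
    emo_logits=[2.0, -1.0, 0.5],
    emo_targets=[1, 0, 0],
)
print(f"joint loss: {loss:.4f}")
```

Because both terms are summed into a single scalar, one backward pass would update shared features with gradients from localization and classification at once, which is the contrast with a two-stage pipeline whose detector is trained separately.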

Taxonomy

Topics: Emotion and Mood Recognition

Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing