ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

Huanzhen Wang; Ziheng Zhou; Jiaqi Song; Li He; Yunshi Lan; Yan Wang; Wenqiang Zhang

arXiv:2604.12255·cs.CV·April 15, 2026


TL;DR

ARGen is an affect-reinforced generative augmentation framework that improves dynamic emotion perception in video by synthesizing realistic facial expression sequences guided by affective priors and reinforcement learning.

Contribution

The paper introduces a two-stage framework, Affective Semantic Injection (ASI) followed by Adaptive Reinforcement Diffusion (ARD), for generative data augmentation in dynamic emotion recognition.

Findings

01. ARGen significantly improves synthesis quality and recognition accuracy.

02. The framework effectively injects emotional priors into expression generation.

03. Experiments demonstrate enhanced robustness in dynamic emotion perception.

Abstract

Dynamic facial expression recognition in the wild remains challenging due to data scarcity and long-tail distributions, which hinder models from effectively learning the temporal dynamics of scarce emotions. To address these limitations, we propose ARGen, an Affect-Reinforced Generative Augmentation Framework that enables data-adaptive dynamic expression generation for robust emotion perception. ARGen operates in two stages: Affective Semantic Injection (ASI) and Adaptive Reinforcement Diffusion (ARD). The ASI stage establishes affective knowledge alignment through facial Action Units and employs a retrieval-augmented prompt generation strategy to synthesize consistent and fine-grained affective descriptions via large-scale visual-language models, thereby injecting interpretable emotional priors into the generation process. The ARD stage integrates text-conditioned image-to-video…
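
As a rough illustration of the retrieval-augmented prompt idea described for the ASI stage, the sketch below looks up a stored description whose Action Unit set best matches a target set and folds it into a text prompt. It is a toy under stated assumptions, not the authors' method: the corpus, the Jaccard-overlap retrieval, the prompt template, and every name (`Exemplar`, `CORPUS`, `retrieve`, `build_prompt`) are hypothetical. Only the Action Unit names follow the standard Facial Action Coding System.

```python
# Hypothetical sketch: none of these names or data come from the ARGen paper.
from dataclasses import dataclass

# Standard FACS names for a handful of Action Units.
AU_NAMES = {
    1: "inner brow raiser",
    4: "brow lowerer",
    6: "cheek raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
}


@dataclass(frozen=True)
class Exemplar:
    emotion: str
    aus: frozenset
    description: str


# Toy retrieval corpus, invented purely for illustration.
CORPUS = [
    Exemplar("happiness", frozenset({6, 12}),
             "cheeks lift as the lip corners pull upward into a smile"),
    Exemplar("sadness", frozenset({1, 4, 15}),
             "inner brows rise and draw together while the lip corners drop"),
    Exemplar("sadness", frozenset({1, 15}),
             "inner brows rise slightly and the mouth turns down"),
]


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(emotion, aus, corpus=CORPUS):
    """Return the same-emotion exemplar whose AU set overlaps most with `aus`."""
    candidates = [e for e in corpus if e.emotion == emotion]
    if not candidates:
        raise ValueError(f"no exemplar stored for emotion {emotion!r}")
    target = frozenset(aus)
    return max(candidates, key=lambda e: jaccard(e.aus, target))


def build_prompt(emotion, aus):
    """Compose a text prompt from the target AUs and the retrieved description."""
    exemplar = retrieve(emotion, aus)
    au_text = ", ".join(
        f"AU{a} ({AU_NAMES.get(a, 'unnamed')})" for a in sorted(aus)
    )
    return (
        f"A person showing {emotion}. Active Action Units: {au_text}. "
        f"Reference description: {exemplar.description}."
    )


if __name__ == "__main__":
    print(build_prompt("sadness", {1, 15}))
```

In a real pipeline the retrieved description would presumably be refined by a vision-language model before conditioning the generator; here it is passed through unchanged to keep the example self-contained.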
