EmoScene: A Dual-space Dataset for Controllable Affective Image Generation
Li He, Longtai Zhang, Wenqiang Zhang, Yan Wang, Lizhe Qi

TL;DR
EmoScene introduces a large-scale dataset combining affective and perceptual scene attributes to improve controllability in affective image generation using diffusion models.
Contribution
The paper presents EmoScene, a dual-space emotion dataset with annotations for affective and perceptual factors, and benchmarks affect control in diffusion-based image synthesis.
Findings
Discrete emotions map systematically within the valence-arousal-dominance (VAD) space.
Affect correlates with perceptual scene attributes.
A baseline model demonstrates controllable generation under dual-space supervision.
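To make the dual-space idea concrete, the sketch below pictures what a single image's annotation might look like when affective dimensions (valence, arousal, dominance) sit alongside perceptual attributes (color harmony, luminance contrast, texture variation, curvature, spatial layout). This is an illustration only, not the dataset's actual schema: the class names, field names, value ranges, and the `conditioning_vector` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AffectiveAnnotation:
    # Assumption: each dimension is a float score; the real scale is not stated here.
    valence: float
    arousal: float
    dominance: float


@dataclass(frozen=True)
class PerceptualAnnotation:
    # Assumption: numeric scores for four attributes, a free-form label for layout.
    color_harmony: float
    luminance_contrast: float
    texture_variation: float
    curvature: float
    spatial_layout: str


@dataclass(frozen=True)
class DualSpaceRecord:
    image_id: str
    affective: AffectiveAnnotation
    perceptual: PerceptualAnnotation

    def conditioning_vector(self) -> List[float]:
        """Flatten the numeric fields in a fixed order (hypothetical helper)."""
        a, p = self.affective, self.perceptual
        return [
            a.valence, a.arousal, a.dominance,
            p.color_harmony, p.luminance_contrast,
            p.texture_variation, p.curvature,
        ]


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    record = DualSpaceRecord(
        image_id="example-0001",
        affective=AffectiveAnnotation(valence=0.8, arousal=0.4, dominance=0.6),
        perceptual=PerceptualAnnotation(0.7, 0.3, 0.5, 0.9, spatial_layout="open"),
    )
    print(record.conditioning_vector())
```

Keeping the two spaces as separate nested types mirrors the affective/perceptual split described in the summary, while the flattened vector shows one way such annotations could be passed to a generative model as a conditioning signal.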
Abstract
Text-to-image diffusion models have achieved high visual fidelity, yet precise control over scene semantics and fine-grained affective tone remains challenging. Human visual affect arises from the rapid integration of contextual meaning, including valence, arousal, and dominance, with perceptual cues such as color harmony, luminance contrast, texture variation, curvature, and spatial layout. However, current text-to-image models rarely represent affective and perceptual factors within a unified representation, which limits their ability to synthesize scenes with coherent and nuanced emotional intent. To address this gap, we construct EmoScene, a large-scale dual-space emotion dataset that jointly encodes affective dimensions and perceptual attributes, with contextual semantics provided as supporting annotations. EmoScene contains 1.2M images across more than three hundred real-world…
