Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation
Guoli Jia, Junyao Hu, Xinwei Long, Kai Tian, Kaiyan Zhang, KaiKai Zhao, Ning Ding, and Bowen Zhou

TL;DR
Emotion-Director generates emotion-oriented images by combining visual and textual prompts with multi-agent prompt rewriting, addressing the affective shortcut in which existing methods approximate emotion by semantics.
Contribution
The paper presents a cross-modal collaboration framework with MC-Diffusion and MC-Agent modules, enabling emotion-sensitive image generation beyond semantics.
Findings
Outperforms existing methods in emotion-oriented image generation.
Enhances sensitivity to emotions under the same semantics.
Demonstrates superior qualitative and quantitative results.
Abstract
Image generation based on diffusion models has demonstrated impressive capability, motivating exploration into diverse and specialized applications. Owing to the importance of emotion in advertising, emotion-oriented image generation has attracted increasing attention. However, current emotion-oriented methods suffer from an affective shortcut, where emotions are approximated to semantics. As evidenced by two decades of research, emotion is not equivalent to semantics. To this end, we propose Emotion-Director, a cross-modal collaboration framework consisting of two modules. First, we propose a cross-Modal Collaborative diffusion model, abbreviated as MC-Diffusion. MC-Diffusion integrates visual prompts with textual prompts for guidance, enabling the generation of emotion-oriented images beyond semantics. Further, we improve the DPO optimization by a negative visual prompt, enhancing the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Multimodal Machine Learning Applications
