MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization
Yingjie Xia, Xi Wang, Jinglei Shi, Vicky Kalogeiton, Jian Yang

TL;DR
MUSE is a unified framework for synthesizing and editing emotional content in images using test-time optimization, improving emotional accuracy and diversity without additional training or datasets.
Contribution
It introduces the first unified approach for emotional image generation and editing that leverages test-time optimization, avoiding extra datasets and model updates.
Findings
Outperforms existing methods in emotional accuracy
Enhances semantic diversity in generated images
Balances content fidelity and emotional expression
Abstract
Images evoke emotions that profoundly influence perception, often prioritized over content. Current Image Emotional Synthesis (IES) approaches artificially separate generation and editing tasks, creating inefficiencies and limiting applications where these tasks naturally intertwine, such as therapeutic interventions or storytelling. In this work, we introduce MUSE, the first unified framework capable of both emotional generation and editing. By adopting a strategy conceptually aligned with Test-Time Scaling (TTS), which is widely used in both the LLM and diffusion model communities, it avoids the need to update the diffusion model or to rely on specialized emotional synthesis datasets. More specifically, MUSE addresses three key questions in emotional synthesis: (1) HOW to stably guide synthesis by leveraging an off-the-shelf emotion classifier with gradient-based optimization of…
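The general idea of guiding an output with a frozen classifier at test time can be illustrated with a toy example. The sketch below is not MUSE's method: the abstract is truncated before it says what is optimized, so everything here is an assumption made for illustration. A fixed linear softmax classifier (the hypothetical `WEIGHTS`) stands in for the off-the-shelf emotion classifier, and a two-dimensional vector `z` stands in for whatever variable is updated. Only `z` changes; the classifier weights are never touched.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(weights, z):
    """Class probabilities of a frozen linear classifier: softmax(W z)."""
    logits = [sum(w * x for w, x in zip(row, z)) for row in weights]
    return softmax(logits)

def optimize_latent(weights, z0, target, steps=50, lr=0.5):
    """Gradient descent on -log p(target | z) with respect to z only.

    The classifier weights stay fixed, mirroring the 'no model update'
    setting; step count and learning rate are arbitrary toy values.
    """
    z = list(z0)
    for _ in range(steps):
        p = class_probs(weights, z)
        # Gradient of the cross-entropy loss w.r.t. the logits is (p - onehot).
        err = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
        # Chain rule through the linear layer: grad_z = W^T (p - onehot).
        grad = [
            sum(err[k] * weights[k][d] for k in range(len(weights)))
            for d in range(len(z))
        ]
        z = [x - lr * g for x, g in zip(z, grad)]
    return z

# Hypothetical frozen classifier with three classes over a 2-D input.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

z_start = [0.0, 0.0]
z_final = optimize_latent(WEIGHTS, z_start, target=1)

p_before = class_probs(WEIGHTS, z_start)[1]
p_after = class_probs(WEIGHTS, z_final)[1]
print(f"target probability before: {p_before:.3f}, after: {p_after:.3f}")
```

In this toy setting the probability of the target class rises as `z` is updated, while the classifier itself is left unchanged. A real system would replace the linear classifier with a pretrained network and would need to keep the optimized variable consistent with the image content, which this sketch does not attempt.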
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Emotion and Mood Recognition
