MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization

Yingjie Xia; Xi Wang; Jinglei Shi; Vicky Kalogeiton; Jian Yang

arXiv:2511.21051·cs.CV·November 27, 2025

Open Access

TL;DR

MUSE is a unified framework for synthesizing and editing emotional content in images using test-time optimization, improving emotional accuracy and diversity without additional training or datasets.

Contribution

It introduces the first unified approach for emotional image generation and editing that leverages test-time optimization, avoiding extra datasets and model updates.

Findings

01 Outperforms existing methods in emotional accuracy

02 Enhances semantic diversity in generated images

03 Balances content fidelity and emotional expression

Abstract

Images evoke emotions that profoundly influence perception, and these emotions are often prioritized over content. Current Image Emotional Synthesis (IES) approaches artificially separate generation and editing tasks, creating inefficiencies and limiting applications where these tasks naturally intertwine, such as therapeutic interventions or storytelling. In this work, we introduce MUSE, the first unified framework capable of both emotional generation and editing. By adopting a strategy conceptually aligned with Test-Time Scaling (TTS), which is widely used in both the LLM and diffusion model communities, it avoids the need to update the diffusion model or to rely on specialized emotional synthesis datasets. More specifically, MUSE addresses three key questions in emotional synthesis: (1) HOW to stably guide synthesis by leveraging an off-the-shelf emotion classifier with gradient-based optimization of…
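The general idea of steering an output with a frozen classifier at test time can be illustrated with a toy sketch. This is not the MUSE method: the abstract above is truncated before it says what MUSE optimizes, so the example below assumes a generic vector `z`, a hypothetical linear softmax classifier (`W`, `b`), and a hypothetical helper `optimize_at_test_time`. It only shows that gradient ascent on the target-class log-probability changes the input while leaving the classifier weights untouched.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D vector of logits.
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def optimize_at_test_time(W, b, z0, target, steps=200, lr=0.05):
    """Gradient ascent on log p(target | z) with respect to z only.

    W and b (the hypothetical frozen classifier) are never modified.
    """
    z = z0.copy()
    for _ in range(steps):
        p = softmax(W @ z + b)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        # d/dz log softmax(Wz + b)[target] = W^T (onehot - p)
        grad = W.T @ (onehot - p)
        z += lr * grad
    return z

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # 4 hypothetical emotion classes, 8-dim input
b = np.zeros(4)
W_before = W.copy()
z0 = rng.normal(size=8)
target = 2

z_opt = optimize_at_test_time(W, b, z0, target)
p_before = softmax(W @ z0 + b)[target]
p_after = softmax(W @ z_opt + b)[target]
print(f"target probability: {p_before:.3f} -> {p_after:.3f}")
```

In a real diffusion setting the optimized quantity, the classifier, and the stabilization strategy would all differ; the sketch only conveys why no model update or extra dataset is needed when the optimization targets the input side.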

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Emotion and Mood Recognition