Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration

Yingying Deng; Xiangyu He; Fan Tang; Weiming Dong; Xucheng Yin

arXiv:2601.06605·cs.CV·January 13, 2026

Open Access

TL;DR

Sissi introduces a training-free, semantic-style integration method for zero-shot style-guided image synthesis that balances content and style fidelity using multimodal attention and dynamic reweighting.

Contribution

The paper presents a novel in-context learning framework for style-guided image synthesis that avoids retraining and improves style-content balance through dynamic attention reweighting.

Findings

01  Achieves high-fidelity stylization with balanced semantic and style adherence.

02  Outperforms prior methods in visual quality and coherence.

03  Operates without task-specific retraining or expensive inversion procedures.

Abstract

Text-guided image generation has advanced rapidly with large-scale diffusion models, yet achieving precise stylization with visual exemplars remains difficult. Existing approaches often depend on task-specific retraining or expensive inversion procedures, which can compromise content integrity, reduce style fidelity, and lead to an unsatisfactory trade-off between semantic prompt adherence and style alignment. In this work, we introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task. Guided by textual semantic prompts, our method concatenates a reference style image with a masked target image, leveraging a pretrained ReFlow-based inpainting model to seamlessly integrate semantic content with the desired style through multimodal attention fusion. We further analyze the imbalance and noise sensitivity inherent in multimodal attention…
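The input construction mentioned in the abstract (a reference style image concatenated with a masked target image) can be illustrated with a minimal sketch. The abstract does not specify the layout, so the side-by-side arrangement, the mask convention (1 marks the region to fill), and the name `build_incontext_input` are all assumptions made for illustration, not the authors' implementation. The inpainting model call and the attention reweighting are not shown.

```python
import numpy as np


def build_incontext_input(style_img, fill_value=0):
    """Hypothetical helper: build a two-panel canvas and its inpainting mask.

    Assumed layout: style reference on the left, an empty target panel of the
    same size on the right. The paper may use a different arrangement.
    """
    h, w, c = style_img.shape
    # Empty target panel that an inpainting model would later fill.
    target = np.full((h, w, c), fill_value, dtype=style_img.dtype)
    # Concatenate along the width axis: result has shape (h, 2 * w, c).
    canvas = np.concatenate([style_img, target], axis=1)
    # Assumed mask convention: 1 = region to synthesize, 0 = keep as given.
    mask = np.zeros((h, 2 * w), dtype=np.uint8)
    mask[:, w:] = 1
    return canvas, mask


if __name__ == "__main__":
    # Stand-in for a real style image: a small constant-valued RGB array.
    style = np.full((4, 6, 3), 7, dtype=np.uint8)
    canvas, mask = build_incontext_input(style)
    print(canvas.shape, mask.shape)
```

Under these assumptions, the canvas and mask would be passed, together with the text prompt, to a pretrained inpainting model, and the right-hand panel of the output would be cropped as the stylized result.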

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis