UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

Jingwei Yang; Ruoxi Wu; Wei Shen; Meng Li; Yulong Liu; Huimin She; Lunxi Yuan

arXiv:2604.17850 · cs.CV · April 21, 2026

TL;DR

UniCSG introduces a staged training framework for high-fidelity style transfer that effectively disentangles content and style, improving robustness and content preservation in diffusion-based models.

Contribution

It proposes a staged training approach that pairs latent-space semantic disentanglement with frequency-aware detail reconstruction, and adds pixel-space reward learning to improve style transfer quality.

Findings

01. Enhanced content faithfulness and style alignment demonstrated in experiments.

02. Improved robustness and stability in style-driven generation.

03. Effective disentanglement of content and style in diffusion models.

Abstract

Style transfer must match a target style while preserving content semantics. DiT-based diffusion models often suffer from content-style entanglement, leading to reference-content leakage and unstable generation. We present UniCSG, a unified framework for content-constrained, style-driven generation in both text-guided and reference-guided settings. UniCSG employs staged training: (i) a latent-space semantic disentanglement stage that combines low-frequency preprocessing with conditioning corruption to encourage content-style separation, and (ii) a latent-space frequency-aware detail reconstruction stage that refines details via multi-scale frequency supervision. We further incorporate pixel-space reward learning to align latent objectives with perceptual quality after decoding. Experiments demonstrate improved content faithfulness, style alignment, and robustness in both settings.
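The abstract describes low-frequency preprocessing, conditioning corruption, and multi-scale frequency supervision only at a high level. The Python sketch below illustrates one plausible reading of each, under stated assumptions. Low-frequency preprocessing is modeled as a box-blur low-pass filter. Conditioning corruption is modeled as either zeroing the condition vector or adding Gaussian noise to it. Multi-scale frequency supervision is modeled as an L1 loss over band-pass layers obtained by differencing blurs of increasing radius, similar in spirit to a Laplacian pyramid. All function names, radii, weights, and probabilities are hypothetical and not taken from the paper. The code works on a single-channel 2D list (for example one latent channel) rather than on real DiT latents.

```python
import random
from typing import List, Optional, Sequence

Image = List[List[float]]  # single-channel 2D array, e.g. one latent channel


def box_blur(img: Image, radius: int) -> Image:
    """Low-pass filter: mean over a (2r+1)x(2r+1) window.

    Pixels outside the image are skipped, so border windows are smaller.
    Used here as a stand-in for "low-frequency preprocessing" (assumption).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def subtract(a: Image, b: Image) -> Image:
    """Element-wise difference a - b."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def frequency_bands(img: Image, radii: Sequence[int] = (1, 2, 4)) -> List[Image]:
    """Split img into band-pass layers (fine to coarse) plus a low-pass residual.

    bands[0] = img - blur(img, r0), bands[i] = blur(img, r_{i-1}) - blur(img, r_i),
    and the last entry is blur(img, r_last); the layers sum back to img.
    """
    bands: List[Image] = []
    previous = img
    for r in radii:
        low = box_blur(img, r)
        bands.append(subtract(previous, low))
        previous = low
    bands.append(previous)  # lowest-frequency residual
    return bands


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute (L1) difference over all pixels."""
    n = sum(len(row) for row in a)
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n


def multiscale_frequency_loss(pred: Image, target: Image,
                              radii: Sequence[int] = (1, 2, 4),
                              weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical multi-scale loss: weighted sum of per-band L1 distances."""
    pred_bands = frequency_bands(pred, radii)
    target_bands = frequency_bands(target, radii)
    if weights is None:
        weights = [1.0] * len(pred_bands)
    return sum(w * mean_abs_diff(p, t)
               for w, p, t in zip(weights, pred_bands, target_bands))


def corrupt_condition(cond: List[float], p_drop: float, noise_std: float,
                      rng: random.Random) -> List[float]:
    """Hypothetical conditioning corruption: with probability p_drop replace the
    condition by zeros (a "null" condition); otherwise add Gaussian noise."""
    if rng.random() < p_drop:
        return [0.0] * len(cond)
    return [c + rng.gauss(0.0, noise_std) for c in cond]


if __name__ == "__main__":
    # Checkerboard target: contains the highest spatial frequency the grid allows.
    target = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
    blurred_pred = box_blur(target, 1)  # a prediction that lost fine detail
    print(multiscale_frequency_loss(target, target))  # 0.0
    print(multiscale_frequency_loss(blurred_pred, target) > 0)  # True
    print(corrupt_condition([0.5, -1.0, 2.0], p_drop=1.0, noise_std=0.1,
                            rng=random.Random(0)))  # [0.0, 0.0, 0.0]
```

Because each band is the difference of two successive blurs, the bands telescope and sum back to the input exactly, so the loss compares the same signal split by scale. Raising the weight on the first (finest) band emphasizes fine detail, which matches the stated goal of the detail-reconstruction stage; the paper's actual filters and weights may differ.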