CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation
ZhenQi Chen, TsaiChing Ni, YuanFu Yang

TL;DR
CritiFusion is an inference-time framework that enhances text-to-image diffusion models by integrating semantic critique and spectral refinement, significantly improving alignment, detail, and realism without additional training.
Contribution
We introduce CritiFusion, a novel plug-in method combining semantic critique and spectral alignment to improve text-to-image generation fidelity and detail without retraining models.
Findings
Improves human-aligned metrics of text-image correspondence
Enhances visual quality and realism of generated images
Achieves state-of-the-art performance on benchmark evaluations
Abstract
Recent text-to-image diffusion models have achieved remarkable visual fidelity but often struggle with semantic alignment to complex prompts. We introduce CritiFusion, a novel inference-time framework that integrates a multimodal semantic critique mechanism with frequency-domain refinement to improve text-to-image consistency and detail. The proposed CritiCore module leverages a vision-language model and multiple large language models to enrich the prompt context and produce high-level semantic feedback, guiding the diffusion process to better align generated content with the prompt's intent. Additionally, SpecFusion merges intermediate generation states in the spectral domain, injecting coarse structural information while preserving high-frequency details. No additional model training is required. CritiFusion serves as a plug-in refinement stage compatible with existing diffusion…
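To make the spectral-merging idea concrete, here is a minimal sketch of blending two arrays in the frequency domain: low frequencies (coarse structure) are taken from one input and high frequencies (fine detail) from the other. This is an illustration only, not the paper's SpecFusion implementation. The function name `spectral_merge`, the hard radial mask, and the `cutoff` parameter are all assumptions introduced for this example; the abstract does not specify how the states are merged.

```python
import numpy as np


def spectral_merge(coarse, detail, cutoff=0.25):
    """Hypothetical sketch: low frequencies from `coarse`, high from `detail`.

    `cutoff` is an assumed parameter: the mask radius as a fraction of the
    Nyquist frequency (0 keeps nothing from `coarse`, 1 keeps almost all).
    Inputs are real arrays whose last two axes are (height, width).
    """
    coarse = np.asarray(coarse, dtype=float)
    detail = np.asarray(detail, dtype=float)
    if coarse.shape != detail.shape:
        raise ValueError("coarse and detail must have the same shape")

    h, w = coarse.shape[-2:]
    # Frequency coordinates in cycles per sample, in the range [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Hard radial low-pass mask; Nyquist is 0.5 cycles per sample.
    low_mask = (radius <= cutoff * 0.5).astype(float)

    spec_coarse = np.fft.fft2(coarse, axes=(-2, -1))
    spec_detail = np.fft.fft2(detail, axes=(-2, -1))

    # Coarse structure inside the mask, fine detail outside it.
    merged = low_mask * spec_coarse + (1.0 - low_mask) * spec_detail

    # The mask is symmetric in frequency, so the result is real up to
    # floating-point error; drop the negligible imaginary part.
    return np.fft.ifft2(merged, axes=(-2, -1)).real


if __name__ == "__main__":
    flat = np.ones((8, 8))                       # pure low-frequency content
    yy, xx = np.indices((8, 8))
    checker = np.where((yy + xx) % 2 == 0, 1.0, -1.0)  # pure high-frequency content
    out = spectral_merge(flat, checker)
    print(out[0, :4])
```

A hard mask is the simplest choice for a sketch; a smooth (for example Gaussian) transition would reduce ringing, and a real pipeline would apply the merge to diffusion latents rather than to toy arrays.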
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Aesthetic Perception and Analysis
