SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation

Paul Grimal, Michaël Soumm, Hervé Le Borgne, Olivier Ferret, Akihiro Sugimoto

arXiv:2508.13866 · cs.CV · January 21, 2026

TL;DR

This paper introduces a training-free method for text-to-image generation that improves alignment with prompts by modeling signal components during denoising, enhancing fidelity and spatial accuracy.

Contribution

It presents a novel, training-free framework that explicitly models signal components for better prompt alignment in diffusion-based image generation.

Findings

1. Outperforms existing state-of-the-art methods in prompt fidelity
2. Supports additional conditioning modalities like bounding boxes
3. Seamlessly integrates with diffusion and flow matching architectures
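Bounding-box conditioning in training-free latent diffusion methods is often implemented by rasterizing each box onto the latent grid and using the resulting mask to localize a guidance term. The helper below is a hypothetical sketch of that rasterization step only. The name `bbox_to_latent_mask`, the (x_min, y_min, x_max, y_max) pixel convention, and the downsampling factor of 8 are assumptions; the source does not describe how SAGA consumes the boxes.

```python
import math

import numpy as np


def bbox_to_latent_mask(bbox, image_size, downsample=8):
    """Rasterize a pixel-space box onto a latent grid (hypothetical helper).

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    image_size: (height, width) in pixels.
    downsample: pixel-to-latent factor; 8 is an assumed VAE stride.
    Latent cells that overlap the box at all are set to 1.
    """
    height, width = image_size
    h_lat, w_lat = height // downsample, width // downsample
    x_min, y_min, x_max, y_max = bbox
    # Floor the start and ceil the end so partially covered cells are included,
    # then clip to the latent grid.
    c0 = max(0, math.floor(x_min / downsample))
    r0 = max(0, math.floor(y_min / downsample))
    c1 = min(w_lat, math.ceil(x_max / downsample))
    r1 = min(h_lat, math.ceil(y_max / downsample))
    mask = np.zeros((h_lat, w_lat), dtype=np.float32)
    mask[r0:r1, c0:c1] = 1.0
    return mask


if __name__ == "__main__":
    # Left half of a 1024 x 1024 image -> left half of a 128 x 128 latent grid.
    mask = bbox_to_latent_mask((0, 0, 512, 1024), (1024, 1024))
    print(mask.shape, int(mask.sum()))  # (128, 128) 8192
```

A mask like this could, for example, weight an attention map or a loss so that the object assigned to the box is encouraged to appear inside it; whether SAGA does so is not stated in the abstract.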

Abstract

State-of-the-art text-to-image models produce visually impressive results but often struggle with precise alignment to text prompts, leading to missing critical elements or unintended blending of distinct concepts. We propose a novel approach that learns a high-success-rate distribution conditioned on a target prompt, ensuring that generated images faithfully reflect the corresponding prompts. Our method explicitly models the signal component during the denoising process, offering fine-grained control that mitigates over-optimization and out-of-distribution artifacts. Moreover, our framework is training-free and seamlessly integrates with both existing diffusion and flow matching architectures. It also supports additional conditioning modalities, such as bounding boxes, for enhanced spatial alignment. Extensive experiments demonstrate that our approach outperforms current…
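The abstract does not spell out how the signal component is obtained. As a hedged illustration, the standard way to estimate the clean signal at an intermediate denoising step is to invert the forward noising process using the network's prediction. The sketch below shows this for an epsilon-prediction diffusion model and for a velocity-prediction (rectified flow) model. The noise schedules, the convention that t = 1 is pure noise, and the function names `x0_from_eps` and `x0_from_velocity` are assumptions, and this is not presented as SAGA's actual formulation.

```python
import numpy as np


def x0_from_eps(x_t, eps_hat, alpha_bar_t):
    """Estimate the clean signal x0 from an epsilon-prediction diffusion model.

    Assumed forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


def x0_from_velocity(x_t, v_hat, t):
    """Estimate the clean signal x0 from a velocity-prediction flow model.

    Assumed rectified-flow path: x_t = (1 - t) * x0 + t * eps (t = 1 is pure
    noise), whose target velocity is v = eps - x0, so x0 = x_t - t * v.
    """
    return x_t - t * v_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((4, 8, 8))   # stand-in for a latent image
    eps = rng.standard_normal((4, 8, 8))  # Gaussian noise

    # With a perfect noise prediction, the diffusion estimate recovers x0 exactly.
    alpha_bar = 0.3
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    print(np.allclose(x0_from_eps(x_t, eps, alpha_bar), x0))

    # Same check for the flow-matching parameterization.
    t = 0.7
    x_t = (1.0 - t) * x0 + t * eps
    print(np.allclose(x0_from_velocity(x_t, eps - x0, t), x0))
```

In practice `eps_hat` or `v_hat` comes from the network, so the estimate is only approximate, and it is least reliable at high noise levels where the prediction carries the most error.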

Videos

SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation (Underline)

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques