Hallucination Early Detection in Diffusion Models
Federico Betti, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe

TL;DR
HEaD+ is a novel early detection framework for diffusion models. It assesses a generation partway through the diffusion process, using cross-attention maps and textual cues, so that runs likely to omit objects are identified early. This improves object completeness and reduces generation time.
Contribution
The paper introduces HEaD+, a method for early hallucination detection in diffusion models. It combines cross-attention maps and textual information with a novel input, the Predicted Final Image, and is trained on a large dataset to improve image accuracy and generation efficiency.
Findings
HEaD+ increases the likelihood of complete object generation by 6-8%.
HEaD+ reduces generation times by up to 32%.
The approach improves object and relation accuracy in generated images.
Abstract
Text-to-Image generation has seen significant advancements in output realism with the advent of diffusion models. However, diffusion models encounter difficulties when tasked with generating multiple objects, frequently resulting in hallucinations where certain entities are omitted. While existing solutions typically focus on optimizing latent representations within diffusion models, the relevance of the initial generation seed is typically underestimated. While using various seeds in multiple iterations can improve results, this method also significantly increases time and energy costs. To address this challenge, we introduce HEaD+ (Hallucination Early Detection +), a novel approach designed to identify incorrect generations early in the diffusion process. The HEaD+ framework integrates cross-attention maps and textual information with a novel input, the Predicted Final Image. The…
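The early-detection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the detector is a hypothetical stub, the step counts (`total_steps`, `check_step`) are arbitrary assumptions, and restarting with the next seed after a rejection is one plausible reading of how early detection saves time. No real diffusion model is called; the loop only counts the denoising steps that would be spent.

```python
import random
from dataclasses import dataclass


@dataclass
class GenerationResult:
    seed: int
    steps_run: int    # denoising steps spent on this seed
    completed: bool   # True if the run reached the final step


def toy_detector(seed: int) -> bool:
    """Hypothetical stand-in for a hallucination detector.

    A real detector would inspect intermediate signals such as
    cross-attention maps, the prompt, and a predicted final image.
    Here it returns a seed-dependent coin flip so the sketch runs.
    """
    return random.Random(seed).random() < 0.5  # True = "object likely missing"


def generate_with_early_detection(seeds, total_steps=50, check_step=10,
                                  detector=toy_detector):
    """Try seeds in order; abandon a run at check_step if the detector fires."""
    history = []
    for seed in seeds:
        for step in range(1, total_steps + 1):
            # A real pipeline would perform one denoising step here.
            if step == check_step and detector(seed):
                # Predicted hallucination: stop early and move to the next seed.
                history.append(GenerationResult(seed, step, False))
                break
        else:
            # No early rejection: the run finished all steps.
            history.append(GenerationResult(seed, total_steps, True))
            return history
    return history  # every seed was rejected


if __name__ == "__main__":
    runs = generate_with_early_detection(seeds=range(5))
    print(runs, "total steps:", sum(r.steps_run for r in runs))
```

In this toy setting, each rejected seed costs `check_step` steps instead of `total_steps`, which is the source of the time saving compared with running every seed to completion and checking only the final image.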
