GazeFusion: Saliency-Guided Image Generation
Yunxiang Zhang, Nan Wu, Connor Z. Lin, Gordon Wetzstein, Qi Sun

TL;DR
GazeFusion introduces a saliency-guided diffusion framework that enables control over viewer attention in generated images, aligning visual focus with user-specified attention distributions.
Contribution
GazeFusion is the first framework to incorporate human visual attention priors into diffusion-based image generation for explicit attention control.
Findings
Attention-guided images match desired gaze distributions.
An eye-tracked user study confirms that viewers' gaze aligns with the user-specified attention distribution.
Saliency models accurately predict viewer attention in generated images.
Abstract
Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process. Given a user-specified viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers' attention toward the desired regions. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results…
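As an illustration of what a "user-specified viewer attention distribution" and a model-based saliency comparison could look like, here is a minimal sketch. It is not the authors' implementation. The Gaussian-blob construction, the function names (`gaussian_attention_map`, `pearson_cc`, `kl_divergence`), and the choice of Pearson correlation and KL divergence as comparison metrics are all assumptions made for this example; the abstract does not state which metrics the paper uses.

```python
import numpy as np


def gaussian_attention_map(height, width, centers, sigma):
    """Hypothetical helper: sum of isotropic Gaussian blobs at (row, col)
    centers, normalized so the map sums to 1 (a probability distribution)."""
    rows, cols = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in centers:
        density += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    return density / density.sum()


def pearson_cc(a, b):
    """Pearson correlation between two saliency maps (1.0 means identical shape)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float((a * b).mean())


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) after normalizing both maps to sum to 1.
    eps is an assumed smoothing constant that avoids log(0)."""
    p = target / target.sum()
    q = predicted / predicted.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Assumed example: the user wants attention in the upper-left region,
# while a (stand-in) predicted saliency map peaks in the lower-right.
target = gaussian_attention_map(64, 64, [(16, 16)], sigma=6.0)
mismatched = gaussian_attention_map(64, 64, [(48, 48)], sigma=6.0)

print("CC, identical maps: ", pearson_cc(target, target))
print("CC, mismatched maps:", pearson_cc(target, mismatched))
print("KL, mismatched maps:", kl_divergence(target, mismatched))
```

In this sketch a well-aligned generated image would score a correlation near 1 and a KL divergence near 0 against the target map. In practice the `mismatched` array would be replaced by the output of a saliency predictor or by aggregated eye-tracking data for the generated image.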
Taxonomy
Topics: Visual Attention and Saliency Detection · Virtual Reality Applications and Impacts
Methods: Diffusion · ALIGN
