Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
Brian Chao, Lior Yariv, Howard Xiao, Gordon Wetzstein

TL;DR
This paper introduces a foveated diffusion model that leverages human visual acuity patterns to generate high-resolution images and videos efficiently by allocating computational resources non-uniformly based on gaze location.
Contribution
It proposes a mixed-resolution token approach for diffusion models that reduces computational complexity while producing results perceptually indistinguishable from full-resolution generation.
Findings
Significant reduction in token count and generation time.
Perceptually indistinguishable results compared to full-resolution models.
Validated effectiveness through user studies and analysis.
Abstract
Diffusion and flow matching models have unlocked unprecedented capabilities for creative content creation, such as interactive image and streaming video generation. The growing demand for higher resolutions, frame rates, and context lengths, however, makes efficient generation increasingly challenging, as computational complexity grows quadratically with the number of generated tokens. Our work seeks to optimize the efficiency of the generation process in settings where the user's gaze location is known or can be estimated, for example, by using eye tracking. In these settings, we leverage the eccentricity-dependent acuity of human vision: while a user perceives very high-resolution visual information in a small region around their gaze location (the foveal region), the ability to resolve detail quickly degrades in the periphery of the visual field. Our approach starts with a mask…
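To make the quadratic-cost argument concrete, here is a minimal Python sketch of how a gaze-centered layout shrinks the token count and, with it, the self-attention cost. It is an illustration under assumptions, not the paper's method: the circular foveal region, the 2x2 merging of peripheral tokens, the grid size, and the names `foveated_token_count` and `attention_cost_ratio` are all hypothetical choices made for this example.

```python
import math


def foveated_token_count(grid_h, grid_w, gaze_rc, fovea_radius, block=2):
    """Count tokens for a two-level layout (assumed, for illustration only).

    Blocks whose centre lies within `fovea_radius` of the gaze keep all
    block*block fine tokens; every other block is merged into one token.
    """
    if grid_h % block or grid_w % block:
        raise ValueError("grid dimensions must be divisible by block")
    gaze_r, gaze_c = gaze_rc
    tokens = 0
    for r0 in range(0, grid_h, block):
        for c0 in range(0, grid_w, block):
            # Block centre in token coordinates.
            centre_r = r0 + (block - 1) / 2
            centre_c = c0 + (block - 1) / 2
            if math.hypot(centre_r - gaze_r, centre_c - gaze_c) <= fovea_radius:
                tokens += block * block  # foveal block: full resolution
            else:
                tokens += 1              # peripheral block: one coarse token
    return tokens


def attention_cost_ratio(n_foveated, n_full):
    """Relative self-attention cost, assuming cost scales as tokens squared."""
    return (n_foveated / n_full) ** 2


if __name__ == "__main__":
    n_full = 64 * 64
    n_fov = foveated_token_count(64, 64, gaze_rc=(32, 32), fovea_radius=12)
    print(f"full tokens: {n_full}, foveated tokens: {n_fov}")
    print(f"relative attention cost: {attention_cost_ratio(n_fov, n_full):.3f}")
```

Because the cost ratio is the square of the token ratio, halving the token count cuts the attention cost to roughly a quarter under this simplified model.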
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Face Recognition and Perception
