Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
Kwanyoung Kim, Sanghyun Kim

TL;DR
This paper introduces ANSE, a Bayesian active noise selection framework that leverages attention-based uncertainty to choose high-quality seeds in video diffusion, improving video quality and coherence.
Contribution
It proposes a model-aware, attention-based uncertainty measure for seed selection in video diffusion, with an efficient approximation for inference-time deployment.
Findings
ANSE improves video quality and temporal coherence in the reported experiments.
Seed selection adds only marginal inference overhead compared to existing methods.
The approach generalizes across diverse video diffusion models.
Abstract
The choice of initial noise strongly affects quality and prompt alignment in video diffusion; different seeds for the same prompt can yield drastically different results. While recent methods use externally designed priors (e.g., frequency filtering or inter-frame smoothing), they often overlook internal model signals that indicate inherently preferable seeds. To address this, we propose ANSE (Active Noise Selection for Generation), a model-aware framework that selects high-quality seeds by quantifying attention-based uncertainty. At its core is BANSA (Bayesian Active Noise Selection via Attention), an acquisition function that measures entropy disagreement across multiple stochastic attention samples to estimate model confidence and consistency. For efficient inference-time deployment, we introduce a Bernoulli-masked approximation of BANSA that estimates scores from a single diffusion…
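The abstract describes BANSA as measuring entropy disagreement across multiple stochastic attention samples. A minimal NumPy sketch of one way such a score could be computed follows. It assumes a BALD-style form (entropy of the averaged attention map minus the average per-sample entropy) and assumes the seed with the lowest score is preferred; neither detail is confirmed by the text above. The names `disagreement_score`, `stochastic_attention` and `select_seed`, the toy attention logits, and the Bernoulli masking of attention weights are all hypothetical stand-ins for a real video diffusion model, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_entropy(attn, eps=1e-12):
    """Shannon entropy of each attention row, averaged over rows."""
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())


def disagreement_score(attn_samples):
    """Assumed BALD-style score for a (K, N, N) stack of attention maps.

    Entropy of the mean map minus the mean of per-sample entropies.
    It is close to zero when all K samples agree and grows as they diverge.
    """
    entropy_of_mean = row_entropy(attn_samples.mean(axis=0))
    mean_of_entropies = float(np.mean([row_entropy(a) for a in attn_samples]))
    return entropy_of_mean - mean_of_entropies


def stochastic_attention(attn, rng, k=8, keep_prob=0.8):
    """Hypothetical stand-in for K stochastic forward passes.

    Applies independent Bernoulli masks to the attention weights and
    renormalizes each row. The diagonal is always kept so no row is empty.
    """
    n = attn.shape[-1]
    masks = rng.random((k, n, n)) < keep_prob
    idx = np.arange(n)
    masks[:, idx, idx] = True
    masked = attn[None] * masks
    return masked / masked.sum(axis=-1, keepdims=True)


def select_seed(candidate_noises, rng, k=8):
    """Score every candidate noise and return the index with the lowest score."""
    scores = []
    for noise in candidate_noises:
        # Toy self-attention over the noise tokens (assumption: a real model
        # would expose attention maps from its own layers instead).
        logits = noise @ noise.T / np.sqrt(noise.shape[-1])
        samples = stochastic_attention(softmax(logits), rng, k=k)
        scores.append(disagreement_score(samples))
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = [rng.standard_normal((16, 8)) for _ in range(4)]
    best, scores = select_seed(candidates, rng)
    print("selected candidate:", best)
    print("scores:", [round(s, 4) for s in scores])
```

Because entropy is concave, this score is non-negative up to floating-point error, which makes it easy to sanity-check: identical samples score about zero, and two maximally conflicting one-hot maps score about log 2.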
