Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan

TL;DR
Buffer Anytime introduces a zero-shot framework for estimating depth and normal maps from video using only single-image priors and temporal constraints, eliminating the need for paired video training data.
Contribution
Buffer Anytime combines single-image priors with temporal consistency constraints to achieve high-quality video buffer (depth and normal) estimation without large-scale annotated video datasets.
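Pairing image priors with temporal consistency can be written as a two-term training loss. What follows is a minimal sketch under stated assumptions: NumPy arrays for a scalar buffer such as depth, precomputed backward optical flow, and a mask of reliable (e.g. non-occluded) correspondences. The names `backward_warp`, `hybrid_loss`, and the weight `lam` are hypothetical, and the paper's exact terms, weighting, and any scale/shift alignment for affine-invariant depth are not specified here.

```python
import numpy as np


def backward_warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Resample `frame` (H, W) into a reference view using backward flow (H, W, 2).

    flow[y, x] = (dx, dy) points from reference pixel (x, y) to its match in `frame`.
    Nearest-neighbour sampling with border clamping keeps the sketch short.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]


def hybrid_loss(pred, prior, flows, valid, lam=1.0):
    """Hypothetical prior-fidelity term plus flow-based temporal smoothness term.

    pred, prior: (T, H, W) predicted buffers and frozen image-model buffers.
    flows: (T-1, H, W, 2) backward flow from frame t+1 to frame t.
    valid: (T-1, H, W) bool mask of reliable correspondences.
    """
    # Prior term: stay close to the per-frame predictions of the image model.
    prior_term = float(np.mean(np.abs(pred - prior)))

    # Smoothness term: frame t warped into frame t+1's view should match frame t+1.
    smooth = []
    for t in range(pred.shape[0] - 1):
        warped = backward_warp(pred[t], flows[t])
        diff = np.abs(pred[t + 1] - warped)[valid[t]]
        if diff.size:
            smooth.append(float(diff.mean()))
    smooth_term = float(np.mean(smooth)) if smooth else 0.0

    return prior_term + lam * smooth_term
```

Setting `lam` to zero reduces training to imitating the image model frame by frame; raising it trades fidelity to the prior for less flicker. Normal maps would need the same terms applied per channel or as an angular error, which this sketch omits.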
Findings
Outperforms purely image-based methods in temporal consistency.
Achieves results comparable to state-of-the-art video models trained on large datasets.
Maintains high accuracy while eliminating the need for paired video data.
Abstract
We present Buffer Anytime, a framework for estimating depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video-depth and video-normal training data. Instead of relying on large-scale annotated video datasets, we demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints. Our zero-shot training strategy combines state-of-the-art image estimation models with an optical-flow-based smoothness constraint through a hybrid loss function, implemented via a lightweight temporal attention architecture. Applied to leading image models like Depth Anything V2 and Marigold-E2E-FT, our approach significantly improves temporal consistency while maintaining accuracy. Experiments show that our method not only outperforms image-based approaches but also achieves results comparable to state-of-the-art…
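The abstract names a lightweight temporal attention architecture but gives no further detail here. Below is a generic sketch of temporal self-attention, assuming single-head attention over the frame axis applied independently at each spatial location, with a residual connection and a zero-initialised output projection. Zero initialisation is a common trick when adding temporal layers to a pretrained image model; it is assumed here, not taken from the paper. `TemporalAttention` and its weight names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TemporalAttention:
    """Hypothetical single-head self-attention along time, per spatial token."""

    def __init__(self, channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        std = 1.0 / np.sqrt(channels)
        self.channels = channels
        self.w_q = rng.normal(0.0, std, (channels, channels))
        self.w_k = rng.normal(0.0, std, (channels, channels))
        self.w_v = rng.normal(0.0, std, (channels, channels))
        # Zero-initialised output projection (assumed): the block starts as an
        # identity mapping, so pretrained per-frame features pass through unchanged.
        self.w_o = np.zeros((channels, channels))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x: (T, N, C) features for T frames and N = H*W spatial tokens."""
        # Project, then move the spatial axis first so attention mixes frames only.
        q, k, v = (np.transpose(x @ w, (1, 0, 2)) for w in (self.w_q, self.w_k, self.w_v))
        scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(self.channels)  # (N, T, T)
        out = softmax(scores, axis=-1) @ v  # (N, T, C)
        # Back to (T, N, C), apply the output projection, add the residual.
        return x + np.transpose(out, (1, 0, 2)) @ self.w_o
```

In a setup like the one the abstract describes, such a block would plausibly sit between layers of a frozen image backbone, with features reshaped from (T, H, W, C) to (T, H*W, C). Because each spatial token attends only across frames, the added cost grows with T squared rather than with the number of pixels squared.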
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Softmax · Attention Is All You Need
