Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
Matthew Dutson, Nathan Labiosa, Yin Li, Mohit Gupta

TL;DR
This paper presents a universal stability adapter framework for frame-based video models, improving temporal consistency and robustness against corruptions without retraining the entire network.
Contribution
Introduces a general class of stability adapters and a resource-efficient training method applicable to various architectures for stable video inference.
Findings
Enhanced temporal stability across multiple vision tasks
Improved robustness to image corruptions like noise and weather effects
Preserved or improved prediction quality with adapters
Abstract
When applied sequentially to video, frame-based networks often exhibit temporal inconsistency, for example, outputs that flicker between frames. This problem is amplified when the network inputs contain time-varying corruptions. In this work, we introduce a general approach for adapting frame-based models for stable and robust inference on video. We describe a class of stability adapters that can be inserted into virtually any architecture and a resource-efficient training process that can be performed with a frozen base network. We introduce a unified conceptual framework for describing temporal stability and corruption robustness, centered on a proposed accuracy-stability-robustness loss. By analyzing the theoretical properties of this loss, we identify the conditions where it produces well-behaved stabilizer training. Our experiments validate our approach on several vision tasks…
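To make the flicker problem and the accuracy/stability trade-off concrete, here is a minimal sketch in plain Python. It is not the paper's method: the paper's adapters are learned, and the exact form of its accuracy-stability-robustness loss is not given in this summary. The fixed exponential moving average (`ema_stabilize`), the two-term `toy_accuracy_stability_loss`, and the parameters `alpha` and `lam` are all hypothetical stand-ins chosen for illustration, and the robustness term is omitted entirely.

```python
from typing import List, Sequence


def ema_stabilize(frames: Sequence[Sequence[float]], alpha: float = 0.5) -> List[List[float]]:
    """Blend each per-frame output with the previous stabilized output.

    Hypothetical stand-in for a learned stability adapter: alpha=1.0
    reproduces the raw per-frame outputs, smaller alpha smooths more.
    """
    stabilized: List[List[float]] = []
    for frame in frames:
        if not stabilized:
            # The first frame has no history, so pass it through unchanged.
            stabilized.append(list(frame))
        else:
            prev = stabilized[-1]
            stabilized.append([alpha * x + (1.0 - alpha) * p for x, p in zip(frame, prev)])
    return stabilized


def mean_sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_accuracy_stability_loss(
    outputs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> float:
    """Assumed two-term loss: per-frame error plus lam times frame-to-frame change."""
    accuracy = sum(mean_sq(o, t) for o, t in zip(outputs, targets)) / len(outputs)
    if len(outputs) < 2:
        return accuracy
    stability = sum(
        mean_sq(outputs[i], outputs[i - 1]) for i in range(1, len(outputs))
    ) / (len(outputs) - 1)
    return accuracy + lam * stability


# A flickering one-dimensional output around a constant target of 0.5.
raw = [[0.0], [1.0], [0.0], [1.0]]
targets = [[0.5], [0.5], [0.5], [0.5]]
smoothed = ema_stabilize(raw, alpha=0.5)
print(smoothed)
print(toy_accuracy_stability_loss(raw, targets), toy_accuracy_stability_loss(smoothed, targets))
```

In this toy case the smoothed sequence scores lower on both terms, but that is an artifact of the constant target. With a moving target, heavier smoothing would lag behind it and raise the accuracy term, which is the kind of tension a combined loss has to balance.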
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image and Video Stabilization
