Swift Sampling: Selecting Temporal Surprises via Taylor Series
Dahye Kim, Bhuvan Sachdeva, Karan Uppal, Naman Gupta, Vineeth N. Balasubramanian, Deepti Ghadiyaram

TL;DR
Swift Sampling is a lightweight, training-free method that identifies and samples the most informative, surprising frames in long videos. It models how frame features evolve over time and flags frames that deviate from their predicted trajectory, which improves downstream task performance.
Contribution
The paper introduces Swift Sampling, a novel, training-free frame selection algorithm based on a Taylor series expansion that efficiently detects temporal surprises in videos.
Findings
Outperforms uniform sampling and prior baselines across multiple benchmarks.
Adds minimal computational overhead: only 0.02x more than the baseline.
Significantly improves accuracy on long videos under limited frame budgets.
Abstract
While most frames in long-form video are redundant, the critical information resides in temporal surprises: moments where the actual visual features deviate from their predicted evolution. Inspired by the human brain's predictive coding, we introduce Swift Sampling, an elegant, training-free frame selection algorithm that automatically identifies high-information moments in a video. Specifically, we model a video as a differentiable trajectory in the visual latent space and compute the velocity and acceleration of its features. Then, we apply Taylor expansion to project the expected path of subsequent frames. Frames that diverge sharply from this predicted manifold are identified as temporally surprising frames and selected for sampling. Unlike prior training-free methods that rely on auxiliary networks or video-specific hyperparameter tuning, Swift Sampling is incredibly lightweight,…
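The abstract describes the selection rule only at a high level. The sketch below is one possible reading of it, not the authors' implementation. It assumes per-frame feature vectors have already been extracted by some visual encoder (the encoder is not specified here). It estimates velocity and acceleration with finite differences (time step of one frame) and predicts each next frame with a second-order Taylor expansion, f(t+1) ≈ f(t) + f'(t) + f''(t)/2. Surprise is scored as the Euclidean distance between the actual and predicted features, and the top-k frames under a fixed budget are kept. The function names `taylor_surprise` and `select_surprising_frames`, the L2 distance, the finite-difference stencils, and top-k selection are all hypothetical choices.

```python
import numpy as np


def taylor_surprise(features: np.ndarray) -> np.ndarray:
    """Score each frame by how far its feature vector lands from a
    second-order Taylor prediction built from the three preceding frames.

    features: array of shape (T, D), one embedding per frame (hypothetical input).
    Returns an array of shape (T,). Frames 0-2 have too little history to be
    predicted and are given a score of 0.0 (a hypothetical convention).
    """
    f = np.asarray(features, dtype=np.float64)
    num_frames = f.shape[0]
    surprise = np.zeros(num_frames)
    for t in range(2, num_frames - 1):
        # Velocity at t: second-order accurate backward difference (dt = 1 frame).
        vel = (3.0 * f[t] - 4.0 * f[t - 1] + f[t - 2]) / 2.0
        # Acceleration at t: second backward difference.
        acc = f[t] - 2.0 * f[t - 1] + f[t - 2]
        # Taylor expansion one step ahead: f(t+1) ~ f(t) + f'(t) + f''(t) / 2.
        predicted = f[t] + vel + 0.5 * acc
        # Surprise of frame t+1: Euclidean distance from its predicted position.
        surprise[t + 1] = np.linalg.norm(f[t + 1] - predicted)
    return surprise


def select_surprising_frames(features: np.ndarray, budget: int) -> np.ndarray:
    """Return the indices of the `budget` most surprising frames in temporal order."""
    scores = taylor_surprise(features)
    budget = min(budget, scores.shape[0])
    # Negate so argsort ranks the highest surprise first; "stable" breaks ties
    # in favor of the earlier frame.
    top = np.argsort(-scores, kind="stable")[:budget]
    return np.sort(top)
```

With these stencils the prediction is exact for any feature trajectory with constant acceleration (for example a smooth pan in feature space), so such frames score zero up to rounding. A simple first-order velocity estimate f(t) - f(t-1) lags by half a frame and would leave a small residual. One side effect of scoring against recent history: after a step change of size d at frame k, such as a scene cut, frame k scores |d|, frame k+1 scores 2|d|, and frame k+2 scores |d|, because the changed frame becomes part of the history. How the paper handles this, and whether it selects by top-k or by a threshold, is not stated in this excerpt.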
