Inference-Time Temporal Probability Smoothing for Stable Video Segmentation with SAM2 under Weak Prompts
Dawar Jyoti Deka

TL;DR
This paper introduces a lightweight, inference-time method to enhance temporal stability in SAM2-based video segmentation under weak prompts by using optical flow and uncertainty estimates, without retraining.
Contribution
It proposes a novel, model-agnostic temporal probability smoothing technique that improves stability of video segmentation predictions during inference.
Findings
Significant improvement in temporal stability metrics across diverse videos.
Maintains spatial accuracy while enhancing temporal coherence.
The method is suitable for real-time interactive applications.
Abstract
Interactive video segmentation models such as SAM2 have demonstrated strong generalization across diverse visual domains. However, under weak user supervision, for example, when sparse point prompts are provided on a single frame, their predictions often suffer from temporal instability, including flickering boundaries, object dropout, and inconsistent object extents across frames. These issues limit their reliability in downstream video understanding and control applications. In this paper, we propose an inference-time temporal probability smoothing method that improves the temporal stability of SAM2-based video segmentation without retraining or architectural modification. Our approach operates directly on per-frame segmentation probability maps and leverages optical-flow-based motion warping together with pixel-wise uncertainty estimates derived from segmentation entropy, and…
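The abstract is cut off before the fusion rule is stated, so what follows is only an illustrative sketch of the two ingredients it names: warping the previous frame's probability map along optical flow, and weighting by pixel-wise segmentation entropy. The blending rule, the `max_weight` parameter, the nearest-neighbour warp, the flow convention, and all function names are assumptions made for illustration, not the paper's formulation. Plain Python lists are used to keep the example self-contained; a practical version would likely use an array library.

```python
import math

def binary_entropy(p, eps=1e-7):
    """Binary entropy in bits, in [0, 1]; 1.0 means maximally uncertain (p = 0.5)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def warp_backward(prev_prob, flow):
    """Warp the previous probability map into the current frame.

    Assumed convention: flow[y][x] = (dx, dy) points from a current-frame pixel
    to its source location in the previous frame. Nearest-neighbour sampling.
    Pixels whose source falls outside the image are marked None.
    """
    h, w = len(prev_prob), len(prev_prob[0])
    warped = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = int(round(x + dx)), int(round(y + dy))
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = prev_prob[sy][sx]
    return warped

def smooth_frame(cur_prob, prev_prob, flow, max_weight=0.8):
    """Blend current probabilities with the flow-warped previous ones.

    Hypothetical rule: the more uncertain the current prediction (high entropy),
    the more weight the warped previous probability receives, up to max_weight.
    Confident pixels and pixels without a valid warp keep the current value.
    """
    warped = warp_backward(prev_prob, flow)
    smoothed = []
    for y, row in enumerate(cur_prob):
        out_row = []
        for x, p in enumerate(row):
            q = warped[y][x]
            if q is None:
                out_row.append(p)
            else:
                alpha = max_weight * binary_entropy(p)
                out_row.append((1.0 - alpha) * p + alpha * q)
        smoothed.append(out_row)
    return smoothed
```

In this sketch a pixel predicted at 0.5 (maximum entropy) is pulled strongly toward the motion-compensated value from the previous frame, while a pixel predicted near 0 or 1 is left almost unchanged. The intent is to damp flicker without overriding confident predictions.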
