TL;DR
This paper introduces a benchmark for robust promptable video object segmentation (PVOS) and a method called MoGA, which improves segmentation under real-world corruptions by handling object-specific degradation and maintaining temporal consistency.
Contribution
It provides the first comprehensive benchmark for robust PVOS and proposes MoGA (Memory-object-conditioned Gated-rank Adaptation), a method that leverages object memory to improve robustness against diverse corruptions.
Findings
MoGA outperforms existing methods across various corruption types.
The benchmark's two real-world evaluation datasets comprise 351 video clips with more than 2,500 object masks captured under adverse conditions.
Synthetic training data with diverse corruptions improves model robustness.
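To make "temporally varying corruptions" concrete, here is a minimal sketch of one way such synthetic data could be produced. It is not the paper's pipeline: the noise type (Gaussian), the sinusoidal severity schedule, and the names severity_schedule and corrupt_video are all assumptions chosen for illustration.

```python
import math
import random

def severity_schedule(num_frames, low=0.0, high=0.2):
    """Hypothetical schedule: severity rises from `low` to `high` and falls back."""
    if num_frames == 1:
        return [high]
    return [
        low + (high - low) * math.sin(math.pi * t / (num_frames - 1))
        for t in range(num_frames)
    ]

def corrupt_video(frames, low=0.0, high=0.2, seed=0):
    """Add per-pixel Gaussian noise whose standard deviation changes per frame.

    frames: list of frames, each a list of rows of grayscale values in [0, 1].
    Returns a new list; the input clip is left untouched.
    """
    rng = random.Random(seed)
    sigmas = severity_schedule(len(frames), low, high)
    corrupted = []
    for frame, sigma in zip(frames, sigmas):
        # Perturb each pixel, then clamp back into the valid [0, 1] range.
        noisy = [
            [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in frame
        ]
        corrupted.append(noisy)
    return corrupted

# Usage: a 5-frame, 4x4 mid-gray clip (assumed toy input).
clip = [[[0.5] * 4 for _ in range(4)] for _ in range(5)]
noisy_clip = corrupt_video(clip)
```

In this sketch the first frame is left clean and the middle frame receives the strongest noise, so a model trained on such clips sees corruption severity change within a single video rather than stay fixed per clip.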
Abstract
The performance of promptable video object segmentation (PVOS) models substantially degrades under input corruptions, which prevents PVOS deployment in safety-critical domains. This paper offers the first comprehensive study on robust PVOS (RobustPVOS). We first construct a new, comprehensive benchmark with two real-world evaluation datasets of 351 video clips and more than 2,500 object masks under real-world adverse conditions. At the same time, we generate synthetic training data by applying diverse and temporally varying corruptions to existing VOS datasets. Moreover, we present a new RobustPVOS method, dubbed Memory-object-conditioned Gated-rank Adaptation (MoGA). The key to successfully performing RobustPVOS is two-fold: effectively handling object-specific degradation and ensuring temporal consistency in predictions. MoGA leverages object-specific representations maintained in…
