TL;DR
This paper introduces spatio-temporal data augmentations for video-agnostic background subtraction and applies them to BSUV-Net. The result, BSUV-Net 2.0, generalizes significantly better to unseen videos and datasets.
Contribution
The authors propose novel spatio-temporal data augmentations and a new cross-validation training and evaluation strategy for the CDNet-2014 dataset. Applying the augmentations to BSUV-Net yields BSUV-Net 2.0, which improves on it in both performance and generalization.
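What separates a spatio-temporal augmentation from a purely spatial one is that the transform varies across the frames of a clip instead of being applied identically to every frame. The sketch below shows two illustrative transforms of this kind: a simulated camera pan and an illumination change confined to the current frame. It assumes a training sample is a short list of grayscale frames whose last element is the current frame, with the earlier frames acting as reference frames, plus a ground-truth mask for the current frame. The functions `simulated_pan` and `illumination_shift` and their parameters are hypothetical, and they are not necessarily the augmentations used in BSUV-Net 2.0.

```python
import random
from typing import List, Tuple

# Assumed representation: a grayscale frame as rows of intensities in [0, 1].
Frame = List[List[float]]


def crop(frame: Frame, top: int, left: int, h: int, w: int) -> Frame:
    """Return the h x w window of `frame` whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in frame[top:top + h]]


def simulated_pan(frames: List[Frame], mask: Frame, h: int, w: int,
                  step: int) -> Tuple[List[Frame], Frame]:
    """Hypothetical temporal augmentation: mimic a camera panning right.

    Each frame gets its own crop window, shifted `step` pixels further right
    than the previous one. The last frame is the current frame, so the mask is
    cropped with the current frame's window to keep labels aligned with it.
    """
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    travel = step * (n - 1)  # total horizontal shift across the clip
    if step < 0 or h > height or w + travel > width:
        raise ValueError("clip too small for the requested crop and pan")
    top = random.randint(0, height - h)
    start = random.randint(0, width - w - travel)
    lefts = [start + i * step for i in range(n)]
    panned = [crop(f, top, left, h, w) for f, left in zip(frames, lefts)]
    return panned, crop(mask, top, lefts[-1], h, w)


def illumination_shift(frames: List[Frame], max_delta: float) -> List[Frame]:
    """Hypothetical temporal augmentation: relight only the current frame.

    Reference frames keep their brightness, so the model sees a global
    lighting change that should not be labelled foreground. Values are
    clamped to [0, 1]; the ground-truth mask is left unchanged.
    """
    delta = random.uniform(-max_delta, max_delta)
    current = [[min(1.0, max(0.0, p + delta)) for p in row] for row in frames[-1]]
    return frames[:-1] + [current]


if __name__ == "__main__":
    random.seed(0)
    # A 3-frame clip of 4x8 frames whose pixel value encodes its column.
    clip = [[[c / 10 for c in range(8)] for _ in range(4)] for _ in range(3)]
    mask = [[0.0] * 8 for _ in range(4)]
    panned, panned_mask = simulated_pan(clip, mask, h=2, w=4, step=1)
    print([frame[0] for frame in panned])  # each row starts one column further right
    relit = illumination_shift(clip, max_delta=0.2)
```

A spatial-only pipeline would crop every frame with the same window and relight every frame by the same amount; the differences between frames introduced here are what expose the model to camera motion and lighting changes over time.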
Findings
BSUV-Net 2.0 outperforms the state of the art on unseen videos by ~5% in F-score.
Spatio-temporal augmentations improve model generalization across datasets.
A real-time variant, Fast BSUV-Net 2.0, achieves near-state-of-the-art performance.
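The findings are reported as F-scores. For reference, the F-score (F-measure) of a predicted foreground mask is the harmonic mean of precision P = TP / (TP + FP) and recall R = TP / (TP + FN), that is F = 2PR / (P + R). The minimal sketch below computes it for flattened binary masks; it ignores benchmark-specific details such as pixels excluded from scoring and how per-video scores are averaged, so it illustrates the metric rather than reproducing the paper's evaluation code.

```python
from typing import Sequence


def f_measure(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F-measure for flattened binary masks (1 = foreground, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # correct foreground
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false alarms
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # missed foreground
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

In the example, one pixel is a true positive, one a false positive and one a false negative, so P = R = 0.5 and F = 0.5.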
Abstract
Background subtraction (BGS) is a fundamental video processing task which is a key component of many applications. Deep learning-based supervised algorithms achieve very good performance in BGS; however, most of these algorithms are optimized for either a specific video or a group of videos, and their performance decreases dramatically when applied to unseen videos. Recently, several papers addressed this problem and proposed video-agnostic supervised BGS algorithms. However, nearly all of the data augmentations used in these algorithms are limited to the spatial domain and do not account for temporal variations that naturally occur in video data. In this work, we introduce spatio-temporal data augmentations and apply them to one of the leading video-agnostic BGS algorithms, BSUV-Net. We also introduce a new cross-validation training and evaluation strategy for the CDNet-2014 dataset…
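The abstract is cut off before the cross-validation strategy is described, so the following is only a sketch of the general idea behind video-agnostic evaluation: folds are formed at the video level, so no video appears in both the training and test sets of a fold. As an additional assumption, the sketch stratifies by category, dealing each category's videos round-robin across folds so every test set covers every category with enough videos. The function `video_level_folds` and the category and video names in the test are hypothetical, and the paper's actual fold assignment is not reproduced here.

```python
from typing import Dict, List, Tuple


def video_level_folds(videos: Dict[str, List[str]],
                      k: int) -> List[Tuple[List[str], List[str]]]:
    """Split videos into k (train, test) folds with no video in both sets.

    `videos` maps a category name to its video ids. Within each category,
    videos are dealt round-robin into folds, so every fold's test set draws
    from every category that has at least k videos.
    """
    if k < 2:
        raise ValueError("need at least two folds")
    fold_tests: List[List[str]] = [[] for _ in range(k)]
    for category in sorted(videos):
        for i, vid in enumerate(sorted(videos[category])):
            fold_tests[i % k].append(f"{category}/{vid}")
    all_videos = [v for test in fold_tests for v in test]
    folds = []
    for test in fold_tests:
        test_set = set(test)
        train = [v for v in all_videos if v not in test_set]  # disjoint by construction
        folds.append((train, test))
    return folds
```

Under this scheme, a model is trained on each fold's `train` list and scored on its `test` list, so every video is evaluated exactly once while unseen during training.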
