SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations
Tanvir Mahmud, Chun-Hao Liu, Burhaneddin Yaman, Diana Marculescu

TL;DR
SSVOD introduces a semi-supervised video object detection framework that leverages motion dynamics and sparse annotations to improve detection accuracy while reducing annotation costs.
Contribution
The paper proposes a novel end-to-end semi-supervised framework that exploits temporal motion cues and introduces flow-warped predictions for robust pseudo-labeling in videos.
Findings
Significant performance improvements on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS datasets.
Effective use of flow-warped predictions for temporal consistency.
Balanced pseudo-labeling approach reduces confirmation bias and noise.
Abstract
Despite significant progress in semi-supervised learning for image object detection, several key issues are yet to be addressed for video object detection: (1) Achieving good performance for supervised video object detection greatly depends on the availability of annotated frames. (2) Despite having large inter-frame correlations in a video, collecting annotations for a large number of frames per video is expensive, time-consuming, and often redundant. (3) Existing semi-supervised techniques on static images can hardly exploit the temporal motion dynamics inherently present in videos. In this paper, we introduce SSVOD, an end-to-end semi-supervised video object detection framework that exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations. To selectively assemble robust pseudo-labels across groups of frames, we introduce flow-warped…
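The abstract breaks off before it explains how flow-warped predictions are built. As a rough illustration only, the sketch below assumes dense per-location class scores for each frame and a precomputed optical-flow field between frames (neither detail is stated here). It warps a neighbouring frame's scores into the target frame and averages them into hard pseudo-labels. The function names, nearest-neighbour sampling, plain averaging, and the 0.5 threshold are all hypothetical choices, not the authors' method.

```python
import numpy as np


def warp_scores(src_scores: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a neighbouring frame's score map into the target frame.

    Hypothetical helper. src_scores: (H, W, C) class scores from a neighbouring frame.
    flow: (H, W, 2) target-to-source flow; flow[y, x] = (dx, dy) means target pixel
    (x, y) corresponds to source pixel (x + dx, y + dy).
    Nearest-neighbour sampling; locations that map outside the frame get zero scores.
    """
    h, w, _ = src_scores.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(src_scores)
    warped[valid] = src_scores[src_y[valid], src_x[valid]]
    return warped


def aggregate_pseudo_labels(target_scores, neighbour_scores, flows, threshold=0.5):
    """Average the target's own scores with flow-warped neighbour scores (assumed scheme).

    Returns per-location hard labels (-1 = ignored, low confidence) and the averaged scores.
    """
    warped = [warp_scores(s, f) for s, f in zip(neighbour_scores, flows)]
    mean_scores = np.mean([target_scores, *warped], axis=0)
    labels = mean_scores.argmax(axis=-1)
    labels[mean_scores.max(axis=-1) < threshold] = -1
    return labels, mean_scores
```

Averaging several temporally aligned views before thresholding is one simple way to make a pseudo-label depend on more than a single frame's prediction; the paper's actual selection and aggregation rules may differ.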
Videos
SSVOD: Semi-Supervised Video Object Detection With Sparse Annotations· youtube
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
