Few-Shot Video Object Detection
Qi Fan, Chi-Keung Tang, Yu-Wing Tai

TL;DR
This paper presents a new large-scale dataset and novel network architectures for few-shot video object detection, significantly improving detection accuracy in diverse, dynamic real-world scenarios.
Contribution
Introduces the FSVOD-500 dataset, a new Tube Proposal Network (TPN), and an improved Temporal Matching Network (TMN+) for few-shot video object detection.
Findings
Significantly better detection results than image-based methods on two few-shot video object detection datasets.
Effective handling of highly dynamic video objects.
End-to-end training of the proposed network components.
Abstract
We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the real-world visual learning challenge in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity. Our TPN and TMN+ are jointly and end-to-end trained. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets compared to image-based methods and other naive…
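To make the idea of matching aggregated tube features against few-shot class examples concrete, here is a minimal sketch. It is not the authors' TPN or TMN+: it assumes per-frame feature vectors for one tube are already available, swaps in plain mean pooling for aggregation and cosine similarity for matching, and all function names (`aggregate_tube`, `cosine_similarity`, `match_tube`) are hypothetical.

```python
import math


def aggregate_tube(frame_features):
    """Mean-pool the per-frame feature vectors of one tube into a single vector.

    Assumption: plain averaging stands in for the paper's learned aggregation.
    """
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_tube(frame_features, support_prototypes):
    """Return (best_class, score) for one tube against few-shot class prototypes.

    support_prototypes maps a class name to one feature vector, e.g. the mean
    of that class's few support examples (an assumption for this sketch).
    """
    tube_vec = aggregate_tube(frame_features)
    scores = {
        name: cosine_similarity(tube_vec, proto)
        for name, proto in support_prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy usage: a two-frame tube and two hypothetical support classes.
tube = [[1.0, 0.0], [0.8, 0.2]]
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
label, score = match_tube(tube, prototypes)
```

Pooling over the whole tube before matching means a single blurred or occluded frame has less influence on the class decision than it would in per-frame matching, which is the intuition behind tube-level features for dynamic objects.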
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Temporal Pyramid Network
