Video Region Annotation with Sparse Bounding Boxes

Yuzheng Xu; Yang Wu; Nur Sabrina binti Zuraimi; Shohei Nobuhara; Ko Nishino

arXiv:2008.07049 · cs.CV · August 18, 2020

TL;DR

This paper introduces a novel method using a Volumetric Graph Convolutional Network to automatically generate detailed region boundaries in videos from sparse bounding box annotations, reducing the need for dense labeling.
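To make "sparse bounding box annotations" concrete, here is a minimal Python sketch of one plausible preprocessing step. It linearly interpolates boxes that are given only on a few frames, then seeds each frame with an initial contour on the ellipse inscribed in its box. Both steps are assumptions made for illustration. The helper names `interpolate_boxes` and `initial_contour` are hypothetical, and the paper's actual initialization may differ.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def interpolate_boxes(sparse: Dict[int, Box], num_frames: int) -> List[Box]:
    """Hypothetical helper: linearly interpolate boxes between annotated frames.

    Frames before the first or after the last annotation reuse the nearest box.
    """
    keys = sorted(sparse)
    boxes: List[Box] = []
    for t in range(num_frames):
        if t <= keys[0]:
            boxes.append(sparse[keys[0]])
        elif t >= keys[-1]:
            boxes.append(sparse[keys[-1]])
        else:
            # Find the pair of annotated frames a <= t <= b that brackets t.
            for a, b in zip(keys, keys[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    boxes.append(tuple((1 - w) * p + w * q
                                       for p, q in zip(sparse[a], sparse[b])))
                    break
    return boxes


def initial_contour(box: Box, num_points: int) -> List[Tuple[float, float]]:
    """Hypothetical helper: place num_points vertices on the ellipse inscribed in the box."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = (x1 - x0) / 2, (y1 - y0) / 2
    return [(cx + rx * math.cos(2 * math.pi * k / num_points),
             cy + ry * math.sin(2 * math.pi * k / num_points))
            for k in range(num_points)]
```

For example, with boxes annotated only on frames 0 and 4 as `(0.0, 0.0, 10.0, 10.0)` and `(4.0, 0.0, 14.0, 10.0)`, frame 2 receives the interpolated box `(2.0, 0.0, 12.0, 10.0)`. A learned model would then move these rough initial vertices onto the true region boundary.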

Contribution

The paper presents a new approach that uses a Volumetric Graph Convolutional Network (VGCN) to predict region boundaries from sparse annotations, improving accuracy and generalization over existing methods.

Findings

1. Effective boundary generation demonstrated on real and synthetic datasets
2. Outperforms existing solutions in accuracy and robustness
3. Ablation studies confirm the importance of spatio-temporal information

Abstract

Video analysis has been moving towards more detailed interpretation (e.g. segmentation), with encouraging progress. These tasks, however, increasingly rely on training data that is densely annotated in both space and time. Because such annotation is labour-intensive, few densely annotated video datasets with detailed region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparsely annotated bounding boxes of target regions. We achieve this with a Volumetric Graph Convolutional Network (VGCN), which learns to iteratively find keypoints on the region boundaries using the spatio-temporal volume of surrounding appearance and motion. The global optimization of VGCN makes it significantly stronger and lets it generalize better than existing solutions. Experimental results using two recent datasets (one real and one…
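The abstract describes VGCN as a graph network that iteratively locates boundary keypoints over a spatio-temporal volume. The Python sketch below shows one plausible reading under stated assumptions. Nodes are contour vertices in every frame; spatial edges join neighbouring vertices along each closed contour, and temporal edges join the same vertex index in adjacent frames. Propagation uses the widely used symmetric-normalised GCN rule. The function names, the graph layout, and the two-channel offset head in `refine_step` are hypothetical and are not the authors' architecture. The per-node features (appearance and motion in the paper) are left as plain input lists.

```python
from typing import List

Matrix = List[List[float]]


def volumetric_adjacency(num_frames: int, num_points: int) -> Matrix:
    """Adjacency over num_frames * num_points nodes (hypothetical layout).

    Each vertex links to its successor on the closed contour of the same frame
    and to the same vertex index in the next frame; edges are undirected.
    """
    n = num_frames * num_points
    adj = [[0.0] * n for _ in range(n)]

    def idx(t: int, k: int) -> int:
        return t * num_points + k

    for t in range(num_frames):
        for k in range(num_points):
            i = idx(t, k)
            # Spatial edge: next vertex along the closed contour.
            j = idx(t, (k + 1) % num_points)
            adj[i][j] = adj[j][i] = 1.0
            # Temporal edge: same vertex index in the following frame.
            if t + 1 < num_frames:
                j = idx(t + 1, k)
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 as in standard GCNs."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (a is n x m, b is m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def refine_step(a_norm: Matrix, points: Matrix, features: Matrix,
                w_hidden: Matrix, w_offset: Matrix) -> Matrix:
    """Hypothetical refinement: a hidden GCN layer, then a linear GCN head
    whose two output channels are read as (dx, dy) offsets for each vertex."""
    h = gcn_layer(a_norm, features, w_hidden)
    offsets = matmul(matmul(a_norm, h), w_offset)  # no ReLU, so offsets may be negative
    return [[x + dx, y + dy] for (x, y), (dx, dy) in zip(points, offsets)]
```

Calling `refine_step` a few times, recomputing features at the moved vertices in between, mirrors the iterative keypoint search the abstract describes. The temporal edges are what let information from neighbouring frames influence each vertex; in a trained model the weights would be learned rather than hand-set.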

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis

Methods: Graph Convolutional Network