Video Annotation for Visual Tracking via Selection and Refinement

Kenan Dai; Jie Zhao; Lijun Wang; Dong Wang; Jianhua Li; Huchuan Lu; Xuesheng Qian; Xiaoyun Yang

arXiv:2108.03821 · cs.CV · August 10, 2021

TL;DR

This paper introduces a novel framework that automatically improves video annotations for training visual trackers by selecting reliable results and refining them using deep networks, significantly reducing manual labeling effort.

Contribution

It proposes a selection-and-refinement strategy with a temporal assessment network and a visual-geometry refinement network to enhance automatic video annotation quality.
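The "selection" half of the strategy can be illustrated with a hand-written quality score. The sketch below is not T-Assess Net, which is a learned network; it is a heuristic stand-in, assuming boxes in `(x1, y1, x2, y2)` pixel format. It scores each tracker output by its best IoU with a neighbouring frame and keeps frames whose score reaches a threshold. The names `coherence_scores`, `select_reliable`, `window`, and `threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def coherence_scores(boxes: Sequence[Box], window: int = 1) -> List[float]:
    """Hypothetical quality score: best IoU between a box and any box
    within `window` frames of it (a heuristic, not the learned T-Assess Net)."""
    n = len(boxes)
    scores = []
    for t in range(n):
        neighbours = [boxes[s] for s in range(max(0, t - window), min(n, t + window + 1)) if s != t]
        scores.append(max((iou(boxes[t], b) for b in neighbours), default=0.0))
    return scores


def select_reliable(boxes: Sequence[Box], threshold: float = 0.5, window: int = 1) -> List[int]:
    """Indices of frames whose coherence score reaches `threshold`."""
    return [t for t, s in enumerate(coherence_scores(boxes, window)) if s >= threshold]


if __name__ == "__main__":
    # Frame 2 is a simulated tracker drift far from the target.
    track = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60), (2, 0, 12, 10), (3, 0, 13, 10)]
    print(select_reliable(track))  # [0, 1, 3, 4]
```

Taking the maximum over neighbours means a correct frame next to a single drifted frame is not penalised. The trade-off is that two consecutive frames drifting together would still agree with each other and pass, which is one reason a learned temporal model is preferable to this heuristic.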

Findings

01. Achieves highly accurate bounding box annotations

02. Reduces human labeling effort by 94%

03. Boosts tracking performance with augmented data

Abstract

Deep learning based visual trackers entail offline pre-training on large volumes of video datasets with accurate bounding box annotations that are labor-expensive to achieve. We present a new framework to facilitate bounding box annotations for video sequences, which investigates a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorithms. A temporal assessment network (T-Assess Net) is proposed which is able to capture the temporal coherence of target locations and select reliable tracking results by measuring their quality. Meanwhile, a visual-geometry refinement network (VG-Refine Net) is also designed to further enhance the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. The combination of the above two networks…
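To illustrate the idea of correcting unreliable frames with temporal geometry constraints, here is a geometry-only stand-in. It is not VG-Refine Net, which is a learned network that also uses target appearance. Assuming a list of reliable frame indices is already available and boxes are `(x1, y1, x2, y2)` tuples, each rejected frame is re-estimated by linear interpolation between the nearest reliable frames before and after it. If only one side has a reliable frame, that box is copied. `refine_by_interpolation` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def refine_by_interpolation(boxes: Sequence[Box], reliable: Sequence[int]) -> List[Box]:
    """Replace boxes of non-reliable frames using reliable neighbours.

    Geometry-only heuristic: ignores appearance, unlike the paper's VG-Refine Net.
    """
    keep = sorted(set(reliable))
    refined = list(boxes)
    if not keep:
        return refined  # nothing trustworthy to anchor on; leave boxes unchanged
    keep_set = set(keep)
    for t in range(len(boxes)):
        if t in keep_set:
            continue
        i = bisect_left(keep, t)  # position of the first reliable frame after t
        prev = keep[i - 1] if i > 0 else None
        nxt = keep[i] if i < len(keep) else None
        if prev is not None and nxt is not None:
            # Linear interpolation weight: 0 at prev, 1 at nxt.
            w = (t - prev) / (nxt - prev)
            refined[t] = tuple((1 - w) * a + w * b for a, b in zip(boxes[prev], boxes[nxt]))
        else:
            # Only one side has a reliable frame: copy its box.
            refined[t] = boxes[prev if prev is not None else nxt]
    return refined


if __name__ == "__main__":
    track = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60), (2, 0, 12, 10), (3, 0, 13, 10)]
    print(refine_by_interpolation(track, [0, 1, 3, 4])[2])  # (1.5, 0.0, 11.5, 10.0)
```

Interpolation only works when the target moves smoothly between reliable frames. Fast motion or occlusion across long gaps is where an appearance-aware refinement model is needed.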

Code & Models

Repositories

daikenan/vasr
none · Official

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Image Enhancement Techniques