Tracking Instances as Queries
Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

TL;DR
QueryTrack introduces a unified query-based framework for video instance segmentation that leverages the one-to-one correspondence between instances and queries, achieving competitive results with a simple, end-to-end model.
Contribution
The paper proposes QueryTrack, a query-based VIS framework that treats the instances being tracked as queries. With a single streamlined model, it won second place in the YouTube-VIS Challenge at CVPR 2021.
Findings
Achieved 52.7 / 52.3 AP on YouTube-VIS-2019 / 2021 datasets.
Secured second place in the YouTube-VIS Challenge at CVPR 2021.
Provided baseline results on YouTube-VIS-2021 for the VIS community.
Abstract
Recently, query-based deep networks have attracted considerable attention owing to their end-to-end pipelines and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation. However, how to establish a query-based video instance segmentation (VIS) framework with an elegant architecture and strong performance remains an open question. In this paper, we present QueryTrack (i.e., tracking instances as queries), a unified query-based VIS framework that fully leverages the intrinsic one-to-one correspondence between instances and queries in QueryInst. The proposed method obtains 52.7 / 52.3 AP on the YouTube-VIS-2019 / 2021 datasets and wins 2nd place in the YouTube-VIS Challenge at CVPR 2021 with a single online end-to-end model, single-scale testing, and a modest amount of training data. We also provide…
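The abstract does not say how instances are linked across frames, so the following is only an illustrative sketch of the general idea behind a one-to-one correspondence between instances and queries: if each detected instance carries one embedding vector, tracks can be extended by matching embeddings between frames. The function names (`cosine`, `associate`), the greedy cosine-similarity matching, and the `threshold` value are all assumptions made for this example. They are not taken from the paper and should not be read as QueryTrack's actual association procedure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def associate(tracks, frame_embeddings, threshold=0.5):
    """Hypothetical helper: greedily match per-instance embeddings to tracks.

    tracks: dict mapping track id -> last embedding seen for that track
            (updated in place).
    frame_embeddings: list of embeddings, one per instance in the new frame.
    Returns the track id assigned to each instance, in input order.
    """
    # Score every (track, instance) pair, best matches first.
    pairs = sorted(
        (
            (cosine(track_emb, emb), tid, i)
            for tid, track_emb in tracks.items()
            for i, emb in enumerate(frame_embeddings)
        ),
        reverse=True,
    )

    assigned = {}        # instance index -> track id
    used_tracks = set()  # each track may be matched at most once per frame
    for score, tid, i in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if tid in used_tracks or i in assigned:
            continue
        assigned[i] = tid
        used_tracks.add(tid)

    # Unmatched instances start new tracks; every matched track is refreshed.
    next_id = max(tracks, default=-1) + 1
    for i, emb in enumerate(frame_embeddings):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
        tracks[assigned[i]] = list(emb)

    return [assigned[i] for i in range(len(frame_embeddings))]


tracks = {}
ids_frame0 = associate(tracks, [[1.0, 0.0], [0.0, 1.0]])  # two new tracks
ids_frame1 = associate(tracks, [[0.1, 0.9], [0.9, 0.1]])  # same objects, swapped order
ids_frame2 = associate(tracks, [[-1.0, 0.0]])             # dissimilar: new track
```

In this toy run the two instances in the second frame are assigned to the tracks created in the first frame even though their order is swapped, and the dissimilar embedding in the third frame opens a new track. A stronger system would typically replace the greedy loop with an optimal assignment and use learned embeddings.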
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
