Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Lei Ke; Xia Li; Martin Danelljan; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu

arXiv:2106.11958 · cs.CV · December 2, 2021

TL;DR

Prototypical Cross-Attention Networks (PCAN) leverage spatio-temporal information through prototypes and cross-attention to improve online multiple object tracking and segmentation.

Contribution

The paper introduces PCAN, which distills a space-time memory into a set of prototypes and applies cross-attention over them to retrieve information from past frames for online multiple object tracking and segmentation.

Findings

01

Outperforms prior state-of-the-art methods on the YouTube-VIS and BDD100K datasets.

02

Effective for both one-stage and two-stage segmentation frameworks.

03

Retrieves spatio-temporal information from past frames for the segmentation mask itself, rather than relying on single-frame mask predictions.

Abstract

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation…
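
The two steps named in the abstract, distilling a memory into prototypes and then reading from them with cross-attention, can be illustrated with a small sketch. This is not the authors' implementation (the official code is in the SysCV/pcan repository). The abstract does not say how prototypes are computed, so the soft-clustering loop, the function names `distill_prototypes` and `cross_attend`, the feature sizes, and the random toy data are all assumptions made here for illustration. The contrastive foreground and background prototypes are not modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def distill_prototypes(memory, num_prototypes, iters=5, seed=0):
    """Compress memory features into a few prototypes.

    ASSUMPTION: soft clustering (EM-like) is used here as a stand-in;
    the abstract does not specify the distillation procedure.
    """
    rng = random.Random(seed)
    dim = len(memory[0])
    prototypes = [list(v) for v in rng.sample(memory, num_prototypes)]
    for _ in range(iters):
        # Soft-assign every memory feature to every prototype.
        resp = [softmax([dot(f, p) for p in prototypes]) for f in memory]
        # Update each prototype as the responsibility-weighted mean.
        for k in range(num_prototypes):
            weight = sum(r[k] for r in resp)
            prototypes[k] = [
                sum(r[k] * f[d] for r, f in zip(resp, memory)) / weight
                for d in range(dim)
            ]
    return prototypes


def cross_attend(queries, prototypes):
    """Scaled dot-product cross-attention: current-frame features are the
    queries, prototypes serve as both keys and values."""
    scale = math.sqrt(len(prototypes[0]))
    readout = []
    for q in queries:
        weights = softmax([dot(q, p) / scale for p in prototypes])
        readout.append([
            sum(w * p[d] for w, p in zip(weights, prototypes))
            for d in range(len(q))
        ])
    return readout


# Toy data (hypothetical): 20 memory features and 5 current-frame
# features, each of dimension 4.
rng = random.Random(0)
memory = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(20)]
current = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]

prototypes = distill_prototypes(memory, num_prototypes=3)
readout = cross_attend(current, prototypes)
print(len(prototypes), len(readout), len(readout[0]))  # prints: 3 5 4
```

In this sketch, attention cost scales with the number of prototypes rather than with the full memory size, which is the practical appeal of attending over a condensed memory in an online setting.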

Code & Models

Repositories

SysCV/pcan
PyTorch · Official

Videos

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation · SlidesLive