Temporal RoI Align for Video Object Recognition

Tao Gong; Kai Chen; Xinjiang Wang; Qi Chu; Feng Zhu; Dahua Lin; Nenghai Yu; Huamin Feng

arXiv:2109.03495·cs.CV·September 14, 2021

TL;DR

This paper introduces Temporal RoI Align, a novel operator that incorporates temporal information from multiple video frames into object detection and segmentation, significantly improving performance.

Contribution

It proposes the Temporal RoI Align operator that leverages feature similarity across frames to enhance video object detection and segmentation.

Findings

01 Consistently improves detection accuracy across multiple benchmarks.

02 Enhances video instance segmentation performance.

03 Can be integrated into existing video detectors with significant gains.

Abstract

Video object detection is challenging in the presence of appearance deterioration in certain video frames. Therefore, it is a natural choice to aggregate temporal information from other frames of the same video into the current frame. However, RoI Align, one of the core procedures of video detectors, still extracts features for proposals from a single-frame feature map, so the extracted RoI features lack temporal information from the video. In this work, considering that the features of the same object instance are highly similar across frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity. The proposed Temporal RoI Align operator can extract temporal information from the entire video for proposals. We integrate it into single-frame video detectors and…
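The core idea in the abstract, gathering features for a current-frame proposal from other frames by feature similarity, can be illustrated with a small pure-Python sketch. The abstract does not state the similarity measure, how many locations are matched, or how frames are combined, so the cosine similarity, top-k selection, softmax weighting, and plain averaging used here are assumptions for illustration only, and the function names are hypothetical. This is not the mmtracking implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def most_similar_feature(query, support_map, k=2):
    """Blend the k support-frame features most similar to `query`.

    support_map is a flattened feature map: one C-dim vector per spatial location.
    The top-k choice and softmax weighting are assumptions, not taken from the abstract.
    """
    sims = [cosine(query, f) for f in support_map]
    top = sorted(range(len(support_map)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax over the selected similarities (shifted by the max for stability).
    peak = max(sims[i] for i in top)
    exps = [math.exp(sims[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * support_map[i][d] for w, i in zip(weights, top))
        for d in range(len(query))
    ]


def temporal_roi_features(roi_feats, support_maps, k=2):
    """For each RoI bin feature of the current frame, average it with the
    similarity-matched feature from every support frame (plain mean, an assumption)."""
    out = []
    for q in roi_feats:
        gathered = [q] + [most_similar_feature(q, m, k) for m in support_maps]
        out.append([sum(col) / len(gathered) for col in zip(*gathered)])
    return out


# One RoI bin with a 2-dim feature, one support frame with two locations.
current = [[1.0, 0.0]]
support = [[[2.0, 0.0], [0.0, 3.0]]]
result = temporal_roi_features(current, support, k=1)
```

With `k=1` the bin is matched only to the support location pointing in the same direction, so the unrelated location contributes nothing to the aggregated feature.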


Code & Models

Repositories

open-mmlab/mmtracking
PyTorch · Official

Videos

Temporal ROI Align for Video Object Recognition · Underline

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection

Methods: ALIGN · Temporal ROIAlign