Temporal RoI Align for Video Object Recognition
Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

TL;DR
This paper introduces Temporal RoI Align, a novel operator that incorporates temporal information from multiple video frames into object detection and segmentation, significantly improving performance.
Contribution
It proposes the Temporal RoI Align operator that leverages feature similarity across frames to enhance video object detection and segmentation.
Findings
Consistently improves detection accuracy across multiple benchmarks.
Enhances video instance segmentation performance.
Can be integrated into existing video detectors with significant gains.
Abstract
Video object detection is challenging in the presence of appearance deterioration in certain video frames. Therefore, it is a natural choice to aggregate temporal information from other frames of the same video into the current frame. However, RoI Align, one of the core procedures of video detectors, still extracts features for proposals from a single-frame feature map, so the extracted RoI features lack temporal information from the video. In this work, considering that the features of the same object instance are highly similar among frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity. The proposed Temporal RoI Align operator can extract temporal information from the entire video for proposals. We integrate it into single-frame video detectors and…
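A minimal sketch of the similarity-based idea described in the abstract, not the authors' implementation. It assumes cosine similarity between each location of a current-frame RoI feature and every location of another frame's feature map, keeps the `k` most similar locations, and blends them with softmax weights. The function names, the parameter `k`, and the plain mean used to combine frames are all hypothetical choices made for illustration; the abstract specifies only that feature similarity is used.

```python
import numpy as np

def most_similar_features(roi_feat, support_map, k=4):
    """Build an RoI-shaped feature from another frame using feature similarity.

    roi_feat:    (C, h, w) RoI feature extracted from the current frame
    support_map: (C, H, W) feature map of another frame in the same video
    returns:     (C, h, w) feature assembled from the support frame
    """
    C, h, w = roi_feat.shape
    q = roi_feat.reshape(C, -1).T        # (h*w, C) query vectors
    s = support_map.reshape(C, -1).T     # (H*W, C) candidate vectors

    # Cosine similarity between every RoI location and every support location.
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    sn = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    sim = qn @ sn.T                      # (h*w, H*W)

    # Indices of the k most similar support locations for each RoI location.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]   # (h*w, k)
    top_sim = np.take_along_axis(sim, idx, axis=1)      # (h*w, k)

    # Softmax over the k similarities (an assumed weighting scheme).
    wts = np.exp(top_sim - top_sim.max(axis=1, keepdims=True))
    wts /= wts.sum(axis=1, keepdims=True)

    gathered = s[idx]                                   # (h*w, k, C)
    out = (wts[..., None] * gathered).sum(axis=1)       # (h*w, C)
    return out.T.reshape(C, h, w)

def aggregate_over_frames(roi_feat, support_maps, k=4):
    """Combine the current RoI feature with features gathered from other frames.

    A plain mean is used here as a stand-in for whatever temporal
    aggregation the actual operator applies.
    """
    feats = [roi_feat] + [most_similar_features(roi_feat, m, k) for m in support_maps]
    return np.mean(feats, axis=0)
```

For example, calling `aggregate_over_frames(roi_feat, [map_a, map_b])` with a `(C, 7, 7)` RoI feature and two `(C, H, W)` support maps returns a `(C, 7, 7)` array that can replace the single-frame RoI feature in a detection head.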
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection
Methods: ALIGN · Temporal ROIAlign
