Progressive Sparse Local Attention for Video Object Detection

Chaoxu Guo; Bin Fan; Jie Gu; Qian Zhang; Shiming Xiang; Veronique Prinet; Chunhong Pan

arXiv:1903.09126 · cs.CV · August 19, 2019 · 25 cites

TL;DR

This paper introduces PSLA, a local attention module that establishes spatial correspondence between features across frames without optical flow, improving video object detection accuracy while avoiding the model-size cost of a separate flow network.

Contribution

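The paper proposes three modules: Progressive Sparse Local Attention (PSLA), and, built on it, Recursive Feature Updating (RFU) and Dense Feature Transforming (DenseFT). Together they enhance feature propagation and representation in video object detection without relying on optical flow.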

Findings

01. Achieves state-of-the-art accuracy on ImageNet VID

02. Uses a smaller model than flow-based methods

03. Maintains acceptable runtime speed

Abstract

Transferring image-based object detectors to the domain of videos remains a challenging problem. Previous efforts mostly exploit optical flow to propagate features across frames, aiming to achieve a good trade-off between accuracy and efficiency. However, introducing an extra model to estimate optical flow can significantly increase the overall model size. The gap between optical flow and high-level features can also hinder it from establishing spatial correspondence accurately. Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features. Based on PSLA, Recursive Feature Updating (RFU) and Dense Feature Transforming (DenseFT) are proposed to model…
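
To make the idea in the abstract concrete, here is a minimal sketch of local attention over a progressively sparser neighbourhood. It is an illustration under stated assumptions, not the authors' implementation: the function names `sparse_offsets` and `psla_propagate`, the offset pattern (a dense 3x3 core plus rings at strides 2 and 4), the dot-product similarity, and the softmax normalisation are all choices made for this example. Feature maps are plain nested lists indexed as `[row][col][channel]`.

```python
import math


def sparse_offsets(levels=3):
    """Hypothetical offset pattern: 3x3 rings whose stride doubles per level.

    Offsets are dense near the centre and sparser further away.
    """
    offsets = {(0, 0)}
    stride = 1
    for _ in range(levels):
        for dy in (-stride, 0, stride):
            for dx in (-stride, 0, stride):
                offsets.add((dy, dx))
        stride *= 2
    return sorted(offsets)


def psla_propagate(src, query, offsets):
    """Propagate `src` features to the positions of `query`.

    For each location in `query`, compare its feature vector with the
    `src` features at the given offsets (dot product, an assumption),
    normalise the scores with a softmax, and return the weighted sum
    of the `src` features.
    """
    h, w, c = len(src), len(src[0]), len(src[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            q = query[y][x]
            neighbours, scores = [], []
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:  # skip out-of-bounds offsets
                    f = src[ny][nx]
                    neighbours.append(f)
                    scores.append(sum(a * b for a, b in zip(q, f)))
            # Numerically stable softmax over the in-bounds neighbours.
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            for weight, f in zip(exps, neighbours):
                for k in range(c):
                    out[y][x][k] += (weight / z) * f[k]
    return out


def make_map(h, w, fill):
    """Build an h x w feature map with every location set to a copy of `fill`."""
    return [[list(fill) for _ in range(w)] for _ in range(h)]


# Toy example: an "object" feature moves one pixel to the right between frames.
src = make_map(5, 5, [0.0, 0.0])
src[2][2] = [10.0, 0.0]
query = make_map(5, 5, [0.0, 0.0])
query[2][3] = [10.0, 0.0]
aligned = psla_propagate(src, query, sparse_offsets())
```

In this toy case the attention weight at the object's new location concentrates on its old location, so the propagated feature follows the motion without any flow estimate. The sparse pattern keeps the number of comparisons per location small (25 offsets here) while still covering displacements of up to 4 pixels.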

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques