Feature Flow: In-network Feature Flow Estimation for Video Object Detection
Ruibing Jin, Guosheng Lin, Changyun Wen, Jianliang Wang, Fayao Liu

TL;DR
This paper introduces IFF-Net, a novel network with an in-network feature flow estimation module that directly predicts feature displacement for video object detection, eliminating the need for pre-trained optical flow models and achieving state-of-the-art results.
Contribution
The paper proposes a new in-network feature flow estimation module within IFF-Net that directly predicts feature displacement without pre-training, improving detection accuracy and speed.
Findings
Outperforms existing methods on ImageNet VID
Achieves state-of-the-art detection accuracy
Maintains fast inference speed
Abstract
Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches address problems directly at the feature level. Because the displacement of a feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors that encode feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object…
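The abstract contrasts pixel-level optical flow with displacement measured directly on feature maps. As a rough illustration of what a feature-level displacement field does once it is available, the sketch below warps a feature map with bilinear sampling, the kind of alignment step that flow-based video detectors use to bring a neighboring frame's features into register. This is not the IFF module: the summary does not describe how the paper estimates or applies feature flow. The function name `warp_features`, the (C, H, W) layout, the (dx, dy) channel order, and zero padding outside the map are all assumptions made for illustration.

```python
import numpy as np

def warp_features(feat, flow):
    """Hypothetical helper: warp a feature map by a feature-level displacement field.

    feat: array of shape (C, H, W).
    flow: array of shape (2, H, W); flow[0] is the horizontal (x) and flow[1]
          the vertical (y) displacement, measured in feature-map cells (assumed).
    The output at (y, x) bilinearly samples feat at (y + flow[1], x + flow[0]);
    samples that fall outside the map contribute zero (assumed padding).
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = xs + flow[0]  # sampling x-coordinates
    sy = ys + flow[1]  # sampling y-coordinates
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0  # fractional offsets for bilinear weights

    out = np.zeros(feat.shape, dtype=float)
    # Accumulate the four bilinear neighbours, masking out-of-bounds ones.
    for yy, xx, wgt in (
        (y0, x0, (1 - wy) * (1 - wx)),
        (y0, x1, (1 - wy) * wx),
        (y1, x0, wy * (1 - wx)),
        (y1, x1, wy * wx),
    ):
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        yc = np.clip(yy, 0, h - 1)
        xc = np.clip(xx, 0, w - 1)
        out += feat[:, yc, xc] * (wgt * valid)  # (C, H, W) * (H, W) broadcast
    return out

# Usage: a uniform shift of one cell to the right pulls each feature
# vector from its right-hand neighbour; the last column falls off the map.
feat = np.arange(16, dtype=float).reshape(1, 4, 4)
flow = np.zeros((2, 4, 4))
flow[0] = 1.0
shifted = warp_features(feat, flow)
```

Bilinear sampling keeps the warp differentiable with respect to the displacement field, which is why a displacement predicted by a network can be trained end to end through such a step.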
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
