Real-time Object Detection for Streaming Perception
Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

TL;DR
This paper introduces a real-time streaming perception framework with a novel DualFlow Perception module and trend-aware loss, enabling better future prediction and achieving improved accuracy in autonomous driving scenarios.
Contribution
It proposes a new framework with a DualFlow Perception module and trend-aware loss for enhanced streaming perception in autonomous driving.
Findings
Achieves a 4.9% AP improvement on the Argoverse-HD dataset.
Effectively captures moving trends with DualFlow Perception.
Demonstrates the importance of future prediction in real-time perception.
Abstract
Autonomous driving requires the model to perceive the environment and (re)act within a low latency for safety. While past works ignore the inevitable changes in the environment after processing, streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception. In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem. We build a simple and effective framework for streaming perception. It equips a novel DualFlow Perception module (DFP), which includes dynamic and static flows to capture the moving trend and basic detection feature for streaming prediction. Further, we introduce a Trend-Aware Loss (TAL) combined with a trend factor to generate adaptive weights for…
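The abstract's point that the environment keeps changing while a frame is being processed can be made concrete with a small sketch. It assumes the usual streaming evaluation convention, in which each ground-truth frame is compared against the most recent detector output that had already finished at that moment. The function name, the timestamps, and the 40 ms latency are hypothetical values chosen for illustration, not figures from the paper.

```python
from bisect import bisect_right


def pair_streaming_outputs(frame_times, output_done_times):
    """Pair each ground-truth frame with the latest finished output.

    frame_times: sorted arrival times of the frames (ms).
    output_done_times: sorted completion times of detector outputs (ms).
    Returns, per frame, the index of the output to evaluate against,
    or None when no output was ready yet.
    """
    pairs = []
    for t in frame_times:
        # Number of outputs finished at or before time t, minus one,
        # is the index of the most recent available output.
        k = bisect_right(output_done_times, t) - 1
        pairs.append(k if k >= 0 else None)
    return pairs


# Hypothetical stream: frames arrive every 33 ms, and output k is computed
# from frame k with a 40 ms processing time (assumed, for illustration).
frame_times = [0, 33, 66, 99, 132]
output_done_times = [40, 80, 120]

print(pair_streaming_outputs(frame_times, output_done_times))
# [None, None, 0, 1, 2]
```

In this toy setup the frame arriving at 66 ms is scored against an output computed from the frame at 0 ms, so the detections describe a scene two frames old. That gap is what a model that predicts the future state directly is meant to close.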
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
