Real-time Online Multi-Object Tracking in Compressed Domain
Qiankun Liu, Bin Liu, Yue Wu, Weihai Li, Nenghai Yu

TL;DR
This paper introduces a real-time online multi-object tracking method that operates in the compressed domain. It divides frames into key and non-key frames, handles each with a specialized CNN, and runs about 6x faster than state-of-the-art online MOT methods while maintaining comparable accuracy.
Contribution
The paper proposes a framework that tracks objects efficiently in compressed video, combining detection and data association on key frames with motion-based propagation on non-key frames, achieving about a 6x speedup over state-of-the-art online MOT methods.
Findings
About 6 times faster than state-of-the-art methods
Maintains comparable tracking accuracy
Effective in real-time online multi-object tracking
Abstract
Recent online Multi-Object Tracking (MOT) methods have achieved desirable tracking performance. However, the tracking speed of most existing methods is rather slow. Inspired by the fact that adjacent frames are highly relevant and redundant, we divide the frames into key and non-key frames and track objects in the compressed domain. For the key frames, the RGB images are restored for detection and data association. To make data association more reliable, an appearance Convolutional Neural Network (CNN) which can be jointly trained with the detector is proposed. For the non-key frames, the objects are directly propagated by a tracking CNN based on the motion information provided in the compressed domain. Compared with the state-of-the-art online MOT methods, our tracker is about 6x faster while maintaining a comparable tracking performance.
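The key/non-key split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed `key_interval` schedule, the `detect_and_associate` and `get_motion` callbacks, the 16-pixel block size, and all function names are hypothetical, and the paper's tracking CNN is replaced here by a much simpler stand-in that shifts each box by the mean motion vector of the blocks it overlaps. The sketch also assumes each motion vector gives the displacement from the previous frame to the current one; motion vectors read from a real codec usually point toward the reference frame, so the sign may need flipping.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2) in pixels
MotionField = List[List[Tuple[float, float]]]    # per-block (dx, dy), indexed [row][col]


def is_key_frame(frame_idx: int, key_interval: int) -> bool:
    """Hypothetical schedule: every key_interval-th frame is a key frame."""
    return frame_idx % key_interval == 0


def propagate_box(box: Box, motion: MotionField, block: int = 16) -> Box:
    """Shift a box by the mean motion vector of the blocks it overlaps.

    Simplified stand-in for the paper's tracking CNN (an assumption,
    not the published model).
    """
    x1, y1, x2, y2 = box
    rows, cols = len(motion), len(motion[0])

    def clamp(value: float, upper: int) -> int:
        # Map a pixel coordinate to a valid block index.
        return min(max(int(value // block), 0), upper - 1)

    c0, c1 = clamp(x1, cols), clamp(x2, cols)
    r0, r1 = clamp(y1, rows), clamp(y2, rows)
    vectors = [motion[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    dx = sum(v[0] for v in vectors) / len(vectors)
    dy = sum(v[1] for v in vectors) / len(vectors)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def track(
    num_frames: int,
    key_interval: int,
    detect_and_associate: Callable[[int, Dict[int, Box]], Dict[int, Box]],
    get_motion: Callable[[int], MotionField],
    block: int = 16,
) -> List[Dict[int, Box]]:
    """Run the key/non-key loop and return the tracks after every frame."""
    tracks: Dict[int, Box] = {}
    history: List[Dict[int, Box]] = []
    for t in range(num_frames):
        if is_key_frame(t, key_interval):
            # Key frame: decode RGB, detect, and associate with existing tracks.
            tracks = detect_and_associate(t, tracks)
        else:
            # Non-key frame: no RGB decoding, only compressed-domain motion.
            motion = get_motion(t)
            tracks = {tid: propagate_box(b, motion, block) for tid, b in tracks.items()}
        history.append(dict(tracks))
    return history
```

In this sketch the expensive detector runs only on key frames, so the cost per non-key frame is limited to reading the motion field and shifting boxes, which is the source of the speedup the abstract describes.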
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Chemical Sensor Technologies · Infrared Target Detection Methodologies
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
