Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing
Athindran Ramesh Kumar, Balaraman Ravindran, Anand Raghunathan

TL;DR
The paper introduces Pack and Detect (PaD), a method that cuts the computational cost of video object detection by packing regions of interest into smaller frames, increasing processing speed with minimal accuracy loss.
Contribution
PaD leverages temporal correlation and ROI packing to reduce computation in video object detection, enabling faster processing with minimal accuracy loss.
Findings
Achieves a 4x reduction in FLOPs per frame.
Increases throughput by 1.25x on GPU.
Maintains 98.9% detection accuracy.
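A per-frame saving only applies to the frames that are actually shrunk, so the average saving over a video depends on how often a full-size frame is processed. The sketch below is a rough back-of-the-envelope model, not the paper's analysis: `anchor_interval` (one full-size frame every k frames) and the flat `reduced_cost` of 0.25 (taken from the 4x figure above) are illustrative assumptions, and the model ignores packing overhead and any non-FLOP costs such as memory traffic.

```python
def average_relative_cost(anchor_interval: int, reduced_cost: float = 0.25) -> float:
    """Average per-frame cost relative to processing every frame at full size.

    anchor_interval: hypothetical number of frames per anchor (1 = every frame is full size).
    reduced_cost: assumed relative cost of a reduced-size frame (0.25 corresponds to 4x fewer FLOPs).
    """
    if anchor_interval < 1:
        raise ValueError("anchor_interval must be >= 1")
    inter_anchor_frames = anchor_interval - 1
    # One full-cost frame plus (k - 1) reduced-cost frames, averaged over k frames.
    return (1.0 + inter_anchor_frames * reduced_cost) / anchor_interval


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        cost = average_relative_cost(k)
        print(f"interval={k:2d}  relative cost={cost:.3f}  idealized speedup={1 / cost:.2f}x")
```

Under these assumptions the idealized speedup approaches 4x only as anchors become rare; measured throughput can be lower still because FLOPs are not the only cost on a GPU.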
Abstract
Object detection in videos is an important task in computer vision for various applications such as object tracking, video summarization and video search. Although great progress has been made in improving the accuracy of object detection in recent years due to the rise of deep neural networks, the state-of-the-art algorithms are highly computationally intensive. In order to address this challenge, we make two important observations in the context of videos: (i) Objects often occupy only a small fraction of the area in each video frame, and (ii) There is a high likelihood of strong temporal correlation between consecutive frames. Based on these observations, we propose Pack and Detect (PaD), an approach to reduce the computational requirements of object detection in videos. In PaD, only selected video frames called anchor frames are processed at full size. In the frames that lie between…
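The idea in the abstract (full-size anchor frames, with regions of interest packed into a smaller frame in between) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fixed `anchor_interval`, the `margin` added around each previous detection, and the simple shelf-packing heuristic are stand-ins, not the authors' algorithm, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1/y1 exclusive


def is_anchor(frame_index: int, anchor_interval: int) -> bool:
    """Assumed schedule: every anchor_interval-th frame is processed at full size."""
    return frame_index % anchor_interval == 0


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a previous-frame detection by a margin, clamped to the frame bounds."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))


def shelf_pack(sizes: List[Tuple[int, int]], max_width: int):
    """Place rectangles left to right on horizontal shelves (a simple heuristic).

    Returns (offsets, canvas_width, canvas_height); offsets follow the input order.
    """
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i][1])  # tallest first
    offsets = [(0, 0)] * len(sizes)
    x = y = shelf_height = used_width = 0
    for i in order:
        w, h = sizes[i]
        if x > 0 and x + w > max_width:  # start a new shelf below the current one
            y += shelf_height
            x = 0
            shelf_height = 0
        offsets[i] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
        used_width = max(used_width, x)
    return offsets, used_width, y + shelf_height


def plan_packed_frame(prev_detections: List[Box], margin: int,
                      width: int, height: int, max_width: int):
    """Build ROIs from the previous frame's detections and lay them out compactly."""
    rois = [expand_box(b, margin, width, height) for b in prev_detections]
    sizes = [(x1 - x0, y1 - y0) for x0, y0, x1, y1 in rois]
    offsets, canvas_w, canvas_h = shelf_pack(sizes, max_width)
    return rois, offsets, (canvas_w, canvas_h)


if __name__ == "__main__":
    detections = [(100, 100, 140, 130), (300, 200, 350, 260)]
    rois, offsets, canvas = plan_packed_frame(detections, margin=10,
                                              width=640, height=480, max_width=200)
    print("ROIs:", rois)
    print("offsets in packed frame:", offsets)
    print("packed frame size:", canvas, "vs full frame (640, 480)")
```

A detection found at packed coordinates can be mapped back to the original frame by subtracting the ROI's offset and adding the ROI's top-left corner; that step is omitted here for brevity.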
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
