Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing

Athindran Ramesh Kumar; Balaraman Ravindran; Anand Raghunathan

arXiv:1809.01701·cs.CV·July 18, 2024·1 cites

Open Access

TL;DR

The paper introduces Pack and Detect (PaD), a method that cuts the computational cost of video object detection by packing regions of interest into reduced-size frames, raising throughput with little loss in accuracy.

Contribution

PaD leverages temporal correlation and ROI packing to reduce computation in video object detection, enabling faster processing with minimal accuracy loss.

Findings

01 Achieves a 4x reduction in FLOPs per frame.

02 Increases throughput by 1.25x on GPU.

03 Maintains 98.9% detection accuracy.

Abstract

Object detection in videos is an important task in computer vision for various applications such as object tracking, video summarization and video search. Although great progress has been made in improving the accuracy of object detection in recent years due to the rise of deep neural networks, the state-of-the-art algorithms are highly computationally intensive. In order to address this challenge, we make two important observations in the context of videos: (i) Objects often occupy only a small fraction of the area in each video frame, and (ii) There is a high likelihood of strong temporal correlation between consecutive frames. Based on these observations, we propose Pack and Detect (PaD), an approach to reduce the computational requirements of object detection in videos. In PaD, only selected video frames called anchor frames are processed at full size. In the frames that lie between…
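The packing idea can be illustrated with a minimal sketch. It assumes ROIs arrive as pixel boxes `(x0, y0, x1, y1)` taken from the previous frame's detections, and it uses a simple shelf-packing heuristic as a stand-in; this is not the authors' packing algorithm, and the names `pack_rois` and `unpack_detection` are hypothetical. If the ROIs do not fit in the reduced-size canvas, the function returns `None`, and a caller could then fall back to processing the frame at full size (also an assumption).

```python
import numpy as np

def pack_rois(frame, boxes, canvas_hw):
    """Shelf-pack ROI crops from `frame` into a smaller canvas.

    boxes: list of (x0, y0, x1, y1) pixel boxes (assumed to come from
    the previous frame's detections). Returns (canvas, placements),
    or None if the ROIs do not fit.
    """
    H, W = canvas_hw
    canvas = np.zeros((H, W) + frame.shape[2:], dtype=frame.dtype)
    placements = []          # (original box, (canvas_x, canvas_y))
    x = y = shelf_h = 0
    # Place the tallest ROIs first so each shelf wastes less height.
    for box in sorted(boxes, key=lambda b: b[3] - b[1], reverse=True):
        x0, y0, x1, y1 = box
        w, h = x1 - x0, y1 - y0
        if w > W:
            return None
        if x + w > W:        # current shelf is full: start a new one below
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > H:
            return None
        canvas[y:y + h, x:x + w] = frame[y0:y1, x0:x1]
        placements.append((box, (x, y)))
        x += w
        shelf_h = max(shelf_h, h)
    return canvas, placements

def unpack_detection(det_box, placements):
    """Map a detection from canvas coordinates back to frame coordinates.

    The detection is assigned to the ROI that contains its centre.
    """
    dx0, dy0, dx1, dy1 = det_box
    cx, cy = (dx0 + dx1) / 2, (dy0 + dy1) / 2
    for (x0, y0, x1, y1), (px, py) in placements:
        if px <= cx < px + (x1 - x0) and py <= cy < py + (y1 - y0):
            ox, oy = x0 - px, y0 - py   # offset from canvas to frame
            return (dx0 + ox, dy0 + oy, dx1 + ox, dy1 + oy)
    return None

# Usage: two ROIs from a 100x200 frame packed into a 64x128 canvas.
frame = np.arange(100 * 200).reshape(100, 200)
boxes = [(10, 10, 50, 40), (100, 20, 160, 80)]
packed = pack_rois(frame, boxes, (64, 128))
```

In this toy case the detector would see 64 x 128 = 8192 pixels instead of 100 x 200 = 20000, which is where the compute saving in such a scheme would come from. A real pipeline would also need padding around each ROI and handling for detections that straddle two packed regions.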

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection