Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

Bingqing Zhang; Sen Wang; Yifan Liu; Brano Kusy; Xue Li; Jiajun Liu

arXiv:2308.11327 · cs.CV · August 23, 2023

TL;DR

This paper introduces an Object Detection Difficulty metric to improve video object detection by selecting better reference frames and reducing unnecessary frame aggregation, resulting in faster and more accurate detection.

Contribution

The paper proposes a novel image-level ODD metric and an ODD Scheduler to enhance VOD accuracy and speed by mitigating over-aggregation and selecting optimal reference frames.

Findings

01 Improves VOD accuracy by selecting better global reference frames.

02 Increases FPS by an average of 73.3% without accuracy loss.

03 Achieves state-of-the-art performance in both speed and accuracy.

Abstract

Current video object detection (VOD) models often encounter issues with over-aggregation due to redundant aggregation strategies, which perform feature aggregation on every frame. This results in suboptimal performance and increased computational complexity. In this work, we propose an image-level Object Detection Difficulty (ODD) metric to quantify the difficulty of detecting objects in a given image. The derived ODD scores can be used in the VOD process to mitigate over-aggregation. Specifically, we train an ODD predictor as an auxiliary head of a still-image object detector to compute the ODD score for each image based on the discrepancies between detection results and ground-truth bounding boxes. The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an…
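
As a rough illustration of the first use described in the abstract, the sketch below scores each image by how poorly its detections match the ground truth and then picks the easiest frames as global references. The abstract does not give the paper's actual ODD formula or selection rule, so everything here is an assumption: `difficulty_target` (one minus the mean best IoU per ground-truth box), the convention that a lower score means an easier frame, and the function names themselves are all hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def difficulty_target(detections: Sequence[Box], ground_truth: Sequence[Box]) -> float:
    """Hypothetical difficulty score in [0, 1]; NOT the paper's formula.

    Assumption: difficulty is 1 minus the mean, over ground-truth boxes,
    of the best IoU achieved by any detection. 0 = easy, 1 = hard.
    """
    if not ground_truth:
        return 0.0  # assumption: nothing to detect counts as easy
    best = [max((iou(g, d) for d in detections), default=0.0) for g in ground_truth]
    return 1.0 - sum(best) / len(best)


def select_reference_frames(odd_scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k lowest-scoring (assumed easiest) frames, in temporal order."""
    order = sorted(range(len(odd_scores)), key=lambda i: odd_scores[i])
    return sorted(order[:k])
```

For example, `select_reference_frames([0.9, 0.1, 0.5, 0.2], k=2)` picks frames 1 and 3 under this convention. In the paper the score comes from a trained predictor head at inference time, where ground truth is unavailable; a target like the one above would only make sense as a training label.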

Code & Models

Repositories

bingqingzhang/odd-vod (PyTorch, official)
