Modality-Buffet for Real-Time Object Detection

Nicolai Dorka; Johannes Meyer; Wolfram Burgard

arXiv:2011.08726·cs.LG·November 18, 2020


TL;DR

This paper introduces a reinforcement learning-based method that dynamically selects, for each prediction, the most suitable object detector from a portfolio for real-time video analysis, trading off detector accuracy against computational cost.

Contribution

It formulates detector selection as a sequential decision problem and employs RL to improve real-time object detection performance.

Findings

01 Outperforms individual detectors on the Waymo dataset

02 Adapts detector choice based on scene complexity

03 Enhances accuracy without increasing computational load

Abstract

Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors using different modalities and with varying computational complexities offer different trade-offs. One option is to have a very lightweight model that can predict from all modalities at once for each frame. However, in some situations (e.g., in static scenes) it might be better to have a more complex but more accurate model and to extrapolate from previous predictions for the frames that arrive while it is processing. We formulate this task as a sequential decision-making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to use for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach…
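
The trade-off described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's method: the `Detector` fields, the two-entry `PORTFOLIO`, the scalar per-frame `motion` signal (a stand-in for the RGB input), the staleness formula, and the hand-written `threshold_policy` (a stand-in for a learned RL policy) are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Detector:
    name: str
    latency_frames: int  # hypothetical: frames that arrive during one inference pass
    fresh_score: float   # hypothetical: accuracy of a prediction on its own frame


def run_episode(
    motion: Sequence[float],
    portfolio: Sequence[Detector],
    policy: Callable[[float], int],
) -> Tuple[float, List[str]]:
    """Return the mean per-frame score and the sequence of chosen detectors."""
    n = len(motion)
    if n == 0:
        return 0.0, []
    scores: List[float] = []
    choices: List[str] = []
    t = 0
    while t < n:
        # The policy picks a detector index from the current observation.
        det = portfolio[policy(motion[t])]
        choices.append(det.name)
        step = max(1, det.latency_frames)
        for i in range(min(step, n - t)):
            # Assumed staleness model: a prediction reused i frames later
            # loses accuracy faster the more the scene moves (motion in [0, 1]).
            scores.append(det.fresh_score * (1.0 - motion[t + i]) ** i)
        t += step
    return sum(scores) / n, choices


# Hypothetical portfolio: a fast, weaker detector and a slow, stronger one.
PORTFOLIO = (Detector("light", 1, 0.6), Detector("heavy", 4, 0.9))


def always(index: int) -> Callable[[float], int]:
    """Baseline policy that ignores the observation."""
    return lambda _motion: index


def threshold_policy(motion_now: float) -> int:
    """Hand-written stand-in for a learned policy: heavy model when the scene is calm."""
    return 1 if motion_now < 0.2 else 0


# A scene that is static for four frames and then becomes dynamic.
scene = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
light_only, _ = run_episode(scene, PORTFOLIO, always(0))
heavy_only, _ = run_episode(scene, PORTFOLIO, always(1))
adaptive, adaptive_choices = run_episode(scene, PORTFOLIO, threshold_policy)
print(light_only, heavy_only, adaptive, adaptive_choices)
```

In this toy setting the adaptive policy runs the heavy detector once over the static frames and switches to the light detector when the scene starts to move, so its mean per-frame score is higher than that of either fixed choice. An RL agent would replace `threshold_policy` with a policy learned from a reward based on per-image accuracy.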
