Modality-Buffet for Real-Time Object Detection
Nicolai Dorka, Johannes Meyer, Wolfram Burgard

TL;DR
This paper introduces a reinforcement learning-based method to dynamically select the most suitable object detector from a portfolio for real-time video analysis, optimizing accuracy and computational efficiency.
Contribution
It formulates detector selection as a sequential decision problem and employs RL to improve real-time object detection performance.
Findings
Outperforms individual detectors on the Waymo dataset
Adapts detector choice based on scene complexity
Enhances accuracy without increasing computational load
Abstract
Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors using different modalities and with varying computational complexities offer different trade-offs. One option is to have a very lightweight model that can predict from all modalities at once for each frame. However, in some situations (e.g., in static scenes) it might be better to have a more complex but more accurate model and to extrapolate from previous predictions for the frames coming in at processing time. We formulate this task as a sequential decision-making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to use for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach…
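The trade-off described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Detector` fields, the scores, the latencies, the `decay` factor, and the scalar `motion` signal (a stand-in for the RGB input) are all hypothetical, and a hand-written threshold rule takes the place of the learned RL policy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Detector:
    name: str
    latency_frames: int  # hypothetical: frames that arrive while the detector runs
    fresh_score: float   # hypothetical: per-frame score of a non-extrapolated prediction


def run_episode(
    policy: Callable[[float], int],
    portfolio: Sequence[Detector],
    motion: Sequence[float],
    decay: float = 0.2,
) -> float:
    """Return the mean per-frame score of one pass over the video.

    The policy sees the observation at the current frame and picks a detector.
    That detector's prediction is reused (extrapolated) for every frame that
    arrives while it is running; in this toy model the score of a reused
    prediction drops with its age and with how much the scene moves.
    """
    total, t = 0.0, 0
    while t < len(motion):
        detector = portfolio[policy(motion[t])]
        for age in range(detector.latency_frames):
            if t + age >= len(motion):
                break
            penalty = decay * motion[t + age] * age
            total += max(0.0, detector.fresh_score - penalty)
        t += detector.latency_frames
    return total / len(motion)


# Hypothetical portfolio: a fast, weaker detector and a slow, stronger one.
PORTFOLIO = (Detector("light", 1, 0.6), Detector("heavy", 5, 0.9))

# Hypothetical video: 50 static frames followed by 50 dynamic frames.
MOTION = [0.0] * 50 + [1.0] * 50

always_light = run_episode(lambda obs: 0, PORTFOLIO, MOTION)
always_heavy = run_episode(lambda obs: 1, PORTFOLIO, MOTION)
# Stand-in for the learned policy: heavy detector in static scenes, light otherwise.
adaptive = run_episode(lambda obs: 1 if obs < 0.5 else 0, PORTFOLIO, MOTION)

print(f"light={always_light:.2f} heavy={always_heavy:.2f} adaptive={adaptive:.2f}")
```

In this toy setting the adaptive rule scores higher than either fixed choice, because the heavy detector's extrapolated predictions stay valid only while the scene is static. An RL agent would have to learn such a rule from the reward signal instead of having it written by hand.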
