TL;DR
This paper introduces an adaptive multimodal fusion method for robotic object detection that dynamically weights sensor modalities such as appearance, depth, and motion to stay robust in changing environments.
Contribution
It presents a novel online learning approach for sensor modality fusion using CNN experts, enhancing detection accuracy amid environmental variability.
Findings
Adapts to harsh lighting changes and severe camera motion blur.
Improves detection performance in a combined indoor and outdoor scenario.
Provides a new RGB-D dataset for people detection.
Abstract
Object detection is an essential task for autonomous robots operating in dynamic and changing environments. A robot should be able to detect objects in the presence of sensor noise that can be induced by changing lighting conditions for cameras and false depth readings for range sensors, especially RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion approach for object detection that learns to weight the predictions of different sensor modalities in an online manner. Our approach is based on a mixture of convolutional neural network (CNN) experts and incorporates multiple modalities including appearance, depth and motion. We test our method in extensive robot experiments, in which we detect people in a combined indoor and outdoor scenario from RGB-D data, and we demonstrate that our method can adapt to harsh lighting changes and severe camera motion blur.…
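The weighting idea in the abstract can be illustrated with a minimal sketch of mixture-of-experts fusion. It assumes each modality expert (appearance, depth, motion) outputs class probabilities and that some gating function yields one unnormalized score per expert. The function names, the softmax gating, and all numbers below are hypothetical illustrations, not the authors' implementation; in particular, how the gating scores are computed from the sensor data is not described in this summary.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_expert_predictions(expert_probs, gating_scores):
    """Fuse per-modality class probabilities with gating weights.

    expert_probs:  one list of class probabilities per modality expert.
    gating_scores: one unnormalized score per expert (assumed to come
                   from a learned gating function, not modeled here).
    Returns (fused_probs, weights).
    """
    weights = softmax(gating_scores)
    num_classes = len(expert_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(num_classes)
    ]
    return fused, weights

# Hypothetical expert outputs for classes [background, person].
expert_probs = [
    [0.2, 0.8],  # appearance (RGB) expert
    [0.1, 0.9],  # depth expert
    [0.6, 0.4],  # motion expert
]

# Hypothetical gating scores: the depth expert is trusted most,
# e.g. when lighting is poor and the RGB image is unreliable.
gating_scores = [0.0, 2.0, 0.0]

fused, weights = fuse_expert_predictions(expert_probs, gating_scores)
print("weights:", weights)
print("fused:  ", fused)
```

Because the weights sum to one, the fused output remains a valid probability distribution, and shifting the gating scores moves the result toward whichever modality is currently most reliable.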
