TL;DR
This paper investigates the domain shift problem in multi-modal RGB-D scene recognition across different cameras, and proposes an adaptive self-supervised translation method to improve cross-domain generalization.
Contribution
It identifies the domain shift issue in multi-modal scene datasets and introduces a novel self-supervised translation approach to enhance model robustness across camera types.
Findings
The domain shift significantly reduces recognition accuracy across different cameras.
The proposed self-supervised translation improves cross-camera scene recognition performance.
Experimental results validate the effectiveness of the adaptive approach.
Abstract
Scene recognition is one of the basic problems in computer vision research with extensive applications in robotics. When available, depth images provide helpful geometric cues that complement the RGB texture information and help to identify discriminative scene image features. Depth sensing technology developed fast in the last years and a great variety of 3D cameras have been introduced, each with different acquisition properties. However, those properties are often neglected when targeting big data collections, so multi-modal images are gathered disregarding their original nature. In this work, we put under the spotlight the existence of a possibly severe domain shift issue within multi-modality scene recognition datasets. As a consequence, a scene classification model trained on one camera may not generalize on data from a different camera, only providing a low recognition…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
