A recurrent multi-scale approach to RGB-D Object Recognition

Mirco Planamente; Mohammad Reza Loghmani; Barbara Caputo

arXiv:1808.01357 · cs.CV · September 6, 2018

Open Access

TL;DR

This paper introduces RCFusion, a new end-to-end architecture that effectively combines RGB and depth data to improve object recognition accuracy, achieving state-of-the-art results on standard datasets.

Contribution

The paper presents a novel multi-scale fusion architecture for RGB-D object recognition that outperforms existing methods and sets new benchmarks.

Findings

01. Outperforms existing approaches on RGB-D datasets

02. Achieves state-of-the-art results on RGB-D Object Dataset and JHUIT-50

03. Demonstrates the effectiveness of multi-scale feature fusion

Abstract

Technological development aims to produce successive generations of increasingly efficient robots able to perform complex tasks. This requires considerable effort from the scientific community to find new algorithms that solve computer vision problems such as object recognition. The spread of RGB-D cameras has steered research towards new architectures able to exploit both RGB and depth information. The project developed in this thesis is a new end-to-end architecture for RGB-D object recognition called RCFusion. Our method generates compact and highly discriminative multi-modal features by combining complementary RGB and depth information representing different levels of abstraction. We evaluate our method on two standard object recognition datasets, RGB-D Object Dataset and JHUIT-50. The experiments performed show that our method…
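
The idea of fusing RGB and depth features from several levels of abstraction with a recurrent step can be illustrated with a small sketch. The abstract does not specify the layers, feature sizes, or recurrent unit, so everything below is an assumption made for illustration: the function names (`project`, `random_matrix`, `recurrent_fusion`), the plain tanh recurrent cell, the random weights standing in for learned parameters, and the toy feature vectors. It is not the RCFusion implementation.

```python
import math
import random


def project(features, weights):
    """Linear projection: multiply a weight matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def recurrent_fusion(rgb_levels, depth_levels, hidden_size=8, proj_size=4, seed=0):
    """Fuse per-level RGB and depth features with a simple tanh recurrent cell.

    Each level is treated as one time step; the final hidden state is the
    fused multi-modal feature. This is a hypothetical sketch only.
    """
    rng = random.Random(seed)
    hidden = [0.0] * hidden_size
    w_hidden = random_matrix(hidden_size, hidden_size, rng)
    w_input = random_matrix(hidden_size, 2 * proj_size, rng)
    for rgb, depth in zip(rgb_levels, depth_levels):
        # Levels differ in width, so project each modality to a common size.
        rgb_p = project(rgb, random_matrix(proj_size, len(rgb), rng))
        depth_p = project(depth, random_matrix(proj_size, len(depth), rng))
        step_input = rgb_p + depth_p  # list concatenation of the two modalities
        pre = [
            h + x
            for h, x in zip(project(hidden, w_hidden), project(step_input, w_input))
        ]
        hidden = [math.tanh(v) for v in pre]
    return hidden


# Toy features for three levels of abstraction (made-up values and sizes).
rgb_levels = [[0.5] * 16, [0.2] * 32, [0.1] * 64]
depth_levels = [[0.3] * 16, [0.4] * 32, [0.6] * 64]
fused = recurrent_fusion(rgb_levels, depth_levels)
print(len(fused))  # prints 8, the hidden size
```

In this sketch the fused vector has a fixed size regardless of how many levels are fed in, which is one way a recurrent step can yield a compact representation from multi-scale inputs.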

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging