ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

Danila Rukhovich; Anna Vorontsova; Anton Konushin

arXiv:2106.01178 · cs.CV · October 18, 2021 · 1 citation

TL;DR

ImVoxelNet is a fully convolutional 3D object detection method that effectively utilizes monocular and multi-view RGB images, achieving state-of-the-art results across various indoor and outdoor datasets.

Contribution

It formulates multi-view RGB-based 3D object detection as an end-to-end optimization problem and proposes a single method that accepts a variable number of input views and works on both indoor and outdoor scenes.

Findings

1. Achieves state-of-the-art car detection among RGB-based methods on KITTI (monocular) and nuScenes (multi-view).
2. Surpasses existing RGB-based 3D object detection methods on SUN RGB-D.
3. Sets a new benchmark for multi-view 3D object detection on ScanNet.

Abstract

In this paper, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, it may differ for every multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection. The source code and the…
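
The idea in the title, lifting 2D image features into a 3D voxel volume so that any number of views can be fused, can be illustrated with a short sketch. This is a simplified illustration, not the authors' implementation: the function name `project_to_voxels`, the pinhole camera model (intrinsics `K`, world-to-camera extrinsics `T`), nearest-pixel sampling, and plain averaging over the views in which a voxel is visible are all assumptions made here. Because the per-view loop only accumulates feature sums and visibility counts, the same code accepts one image (monocular) or any number of images (multi-view).

```python
import numpy as np


def project_to_voxels(features, intrinsics, extrinsics, origin, voxel_size, grid_shape):
    """Hypothetical sketch: fill a voxel grid with 2D features from N views.

    features:   list of (C, H, W) feature maps, one per view (any N >= 1)
    intrinsics: list of (3, 3) pinhole camera matrices K (assumed model)
    extrinsics: list of (4, 4) world-to-camera transforms T
    origin:     world coordinates of the center of voxel (0, 0, 0)
    Returns a (C, X, Y, Z) volume averaged over the views that see each
    voxel, plus an (X, Y, Z) array of per-voxel visibility counts.
    """
    C = features[0].shape[0]
    X, Y, Z = grid_shape
    # Integer voxel indices in row-major order, shape (X*Y*Z, 3).
    idx = np.stack(
        np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"), axis=-1
    ).reshape(-1, 3)
    centers = np.asarray(origin, dtype=float) + idx * voxel_size
    homog = np.concatenate([centers, np.ones((centers.shape[0], 1))], axis=1)

    acc = np.zeros((C, centers.shape[0]))  # summed features per voxel
    count = np.zeros(centers.shape[0])     # number of views seeing each voxel
    for feat, K, T in zip(features, intrinsics, extrinsics):
        _, H, W = feat.shape
        cam = (T @ homog.T)[:3]            # voxel centers in camera frame, (3, N)
        in_front = cam[2] > 1e-6           # ignore voxels behind the camera
        uvw = K @ cam
        depth = np.where(in_front, uvw[2], 1.0)  # avoid division by zero
        u = np.round(uvw[0] / depth).astype(int)  # nearest pixel column
        v = np.round(uvw[1] / depth).astype(int)  # nearest pixel row
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        acc[:, valid] += feat[:, v[valid], u[valid]]
        count[valid] += 1
    # Average over visible views; voxels seen by no view stay zero.
    volume = acc / np.maximum(count, 1)
    return volume.reshape(C, X, Y, Z), count.reshape(X, Y, Z)
```

Passing a list with one feature map corresponds to the monocular case; longer lists handle multi-view input with no change to the code, which mirrors the variable view count described in the abstract. In a fully convolutional detector the resulting volume would then be processed by 3D convolutions; that stage is omitted here.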

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection