Multi-Modality Task Cascade for 3D Object Detection

Jinhyung Park; Xinshuo Weng; Yunze Man; Kris Kitani

arXiv:2107.04013 · cs.CV · July 9, 2021 · 5 cites

TL;DR

This paper introduces a multi-modality cascade network that integrates 2D and 3D data for improved 3D object detection, significantly outperforming previous methods on the SUN RGB-D dataset.

Contribution

The paper proposes a novel multi-modality task cascade network (MTC-RCNN) that effectively fuses 2D and 3D information for enhanced object detection performance.

Findings

1. Achieved a +3.8 mAP@0.5 improvement over the state of the art on SUN RGB-D.

2. Demonstrated that integrated 2D-3D training improves both the 2D and the 3D task.

3. Introduced a dual-head 2D segmentation scheme for robustness.

Abstract

Point clouds and RGB images are naturally complementary modalities for 3D visual understanding - the former provides sparse but accurate locations of points on objects, while the latter contains dense color and texture information. Despite this potential for close sensor fusion, many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data. This separated training scheme results in potentially sub-optimal performance and prevents 3D tasks from being used to benefit 2D tasks that are often useful on their own. To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes. We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D…
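
The cascade order described in the abstract (3D box proposals, then 2D segmentation informed by those proposals, then 3D refinement using the 2D output) can be sketched as a plain data-flow function. This is only an illustrative sketch, not the MTC-RCNN implementation: `Box3D`, `run_cascade`, and the three `toy_*` stages are hypothetical names with trivial stand-in logic, and the dual-head segmentation scheme is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float, float]
Image = List[List[float]]  # toy grayscale image with values in [0, 1]
Mask = List[List[int]]     # toy per-pixel foreground mask (0 or 1)


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box record used only for this sketch."""
    center: Point
    size: Point
    score: float


def run_cascade(points, image, propose_3d, segment_2d, refine_3d):
    """Chain the three stages in the order the abstract describes."""
    proposals = propose_3d(points)              # stage 1: 3D box proposals
    mask = segment_2d(image, proposals)         # stage 2: 2D segmentation that sees the proposals
    boxes = refine_3d(points, mask, proposals)  # stage 3: 3D refinement that sees the 2D output
    return mask, boxes


# Toy stand-ins (assumptions, not real networks) so the data flow can be run.
def toy_propose_3d(points: List[Point]) -> List[Box3D]:
    # One unit-sized box centered on the centroid of the point cloud.
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [Box3D(center=center, size=(1.0, 1.0, 1.0), score=0.5)]


def toy_segment_2d(image: Image, proposals: List[Box3D]) -> Mask:
    # With no proposals, predict an empty mask; otherwise threshold brightness.
    if not proposals:
        return [[0 for _ in row] for row in image]
    return [[1 if value > 0.5 else 0 for value in row] for row in image]


def toy_refine_3d(points: List[Point], mask: Mask, proposals: List[Box3D]) -> List[Box3D]:
    # Raise each box score in proportion to the foreground fraction of the mask.
    total = sum(len(row) for row in mask)
    foreground = sum(sum(row) for row in mask) / total if total else 0.0
    return [replace(box, score=min(1.0, box.score + 0.5 * foreground))
            for box in proposals]


points = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
image = [[0.9, 0.1], [0.8, 0.2]]
mask, boxes = run_cascade(points, image, toy_propose_3d, toy_segment_2d, toy_refine_3d)
```

Passing the stages in as callables keeps the ordering explicit: the 2D stage receives the stage-one proposals, and the second 3D stage receives the 2D prediction, which is the dependency structure the abstract describes.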

Code & Models

Repositories

Divadi/MTC_RCNN (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Robotics and Sensor-Based Localization