Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision

Congliang Li, Shijie Sun, Xiangyu Song, Huansheng Song, Naveed Akhtar, and Ajmal Saeed Mian

arXiv:2211.11188 · cs.CV · December 24, 2024 · 1 citation

TL;DR

This paper introduces SMOPE-Net, an end-to-end neural network that simultaneously detects multiple objects and estimates their 3D poses from monocular images, improving efficiency and accuracy in robotics and autonomous driving applications.

Contribution

The paper presents a novel multitasking network for joint object detection and pose estimation using monocular vision and 3D model infusion, along with a new labeling method and dataset.

Findings

01 SMOPE-Net outperforms existing methods on KITTI-6DoF and LineMod datasets.

02 The Twin-Space labeling method enables effective training data annotation.

03 End-to-end training improves detection and pose estimation accuracy.

Abstract

Multiple object detection and pose estimation are vital computer vision tasks. The latter relates to the former as a downstream problem in applications such as robotics and autonomous driving. However, due to the high complexity of both tasks, existing methods generally treat them independently, which is sub-optimal. We propose simultaneous neural modeling of both using monocular vision and 3D model infusion. Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network with a composite loss that also provides the advantages of anchor-free detections for efficient downstream pose estimation. To enable the annotation of training data for our learning objective, we develop a Twin-Space object labeling method and demonstrate its correctness analytically and empirically. Using the labeling method, we provide the KITTI-6DoF…

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Image and Object Detection Techniques