MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation

Jiaqi Yang; Yucong Chen; Xiangting Meng; Chenxin Yan; Min Li; Ran Cheng; Lige Liu; Tao Sun; Laurent Kneip

arXiv:2308.08856·cs.CV·March 25, 2024

Open Access

TL;DR

This paper introduces MV-ROPE, a multi-view framework that leverages RGB video streams, SLAM, and pose graph optimization to estimate object pose and size accurately without relying on high-quality depth sensors.

Contribution

The paper presents a novel multi-view approach combining SLAM, a lightweight pose predictor, and pose graph optimization for robust category-level object pose and size estimation from RGB videos.

Findings

01. Achieves comparable accuracy to RGB-D methods with high-quality depth data.

02. Outperforms previous RGB-based methods in scenarios with limited or no depth information.

03. Demonstrates robustness and accuracy across multiple datasets with varying depth quality.

Abstract

Recently, there has been growing interest in category-level object pose and size estimation, and prevailing methods commonly rely on single-view RGB-D images. However, one disadvantage of such methods is that they require accurate depth maps that cannot be produced by consumer-grade sensors. Furthermore, many practical real-world situations involve a moving camera that continuously observes its surroundings, and the temporal information of the input video streams is simply overlooked by single-view methods. We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph optimizer. The SLAM module utilizes a video stream and additional scale-sensitive readings to estimate camera poses and metric depth. The object pose predictor then…
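The multi-view constraint behind the pipeline can be illustrated with a toy example. The following is a minimal sketch, not the paper's optimizer: it assumes planar poses `(x, y, theta)` instead of full 6-DoF poses, and it replaces pose graph optimization with a plain average. The function names `compose`, `invert`, and `fuse_object_pose` are hypothetical. The idea shown is only that a static object, observed from several known camera poses, should map to the same world-frame pose in every view.

```python
import math

def compose(a, b):
    """Compose two planar poses (x, y, theta): express pose b, given in frame a, in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def invert(a):
    """Invert a planar pose: (R, t) -> (R^T, -R^T t)."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), s * ax - c * ay, -at)

def fuse_object_pose(cam_poses, obj_in_cam):
    """Lift per-view object poses into the world frame and average them.

    cam_poses:  camera-to-world poses (assumed to come from a SLAM front end).
    obj_in_cam: object poses in each camera frame (assumed per-view predictions).
    Translation uses the arithmetic mean; the angle uses the circular mean.
    This averaging is a stand-in for a pose graph optimizer (an assumption).
    """
    world = [compose(c, o) for c, o in zip(cam_poses, obj_in_cam)]
    n = len(world)
    x = sum(p[0] for p in world) / n
    y = sum(p[1] for p in world) / n
    theta = math.atan2(sum(math.sin(p[2]) for p in world),
                       sum(math.cos(p[2]) for p in world))
    return (x, y, theta)

# Synthetic check: noise-free observations of one static object.
true_obj = (2.0, 1.0, 0.5)
cams = [(0.0, 0.0, 0.0), (1.0, -0.5, 0.3), (-0.4, 2.0, -1.1)]
obs = [compose(invert(c), true_obj) for c in cams]
fused = fuse_object_pose(cams, obs)
```

With noise-free observations, `fused` matches `true_obj` up to floating-point error. With noisy per-view predictions the average acts as a crude consensus, and a real pose graph optimizer would instead weight and jointly refine the camera and object poses.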

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization