Neural Mesh Refiner for 6-DoF Pose Estimation
Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, and Rong Xiao

TL;DR
This paper introduces a neural mesh refiner for 6-DoF pose estimation from monocular images. It uses a differentiable renderer to align the rendered object mesh with the predicted 2D mask, which improves translation accuracy and, according to the authors, achieves state-of-the-art results.
Contribution
It proposes a novel method combining mesh rendering with pose regression to improve translation estimation accuracy in monocular 6-DoF pose estimation.
Findings
Significant improvement in translation estimation accuracy.
State-of-the-art performance on benchmark datasets.
Effective integration of geometry with deep learning for pose refinement.
Abstract
How can we effectively utilise the 2D monocular image information for recovering the 6D pose (6-DoF) of visual objects? Deep learning has been shown to be effective for robust and real-time monocular pose estimation. Oftentimes, the network learns to regress the 6-DoF pose using a naive loss function. However, because the directly regressed pose lacks geometrical scene understanding, there are misalignments between the mesh rendered from the 3D object and the 2D instance segmentation result, e.g., the bounding box and mask predictions. This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh renderer. We utilise the overlay between the accurate mask prediction and the less accurate mesh prediction to iteratively optimise the directly regressed 6D pose information, with a focus on translation estimation. By leveraging…
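The refinement idea in the abstract (render a silhouette at the regressed pose, compare it with the 2D mask, and update the translation to reduce the mismatch) can be illustrated with a toy sketch. This is not the authors' renderer or loss. It assumes a sphere in place of a mesh (so rotation plays no role), a hypothetical pinhole camera, a sigmoid-softened silhouette as a stand-in for a differentiable renderer, finite-difference gradients in place of autodiff, and a target mask synthesised from a known translation instead of a segmentation network. All names and constants are illustrative.

```python
import numpy as np

# Hypothetical pinhole intrinsics and image size (assumptions, not from the paper).
F, CX, CY, SIZE = 100.0, 32.0, 32.0, 64
RADIUS, SIGMA = 1.0, 2.0  # sphere radius (scene units) and edge softness (pixels)
V, U = np.mgrid[0:SIZE, 0:SIZE].astype(float)  # pixel row (v) and column (u) grids


def soft_silhouette(t):
    """Soft mask of a sphere at translation t = (tx, ty, tz); toy renderer."""
    tx, ty, tz = t
    u0, v0 = CX + F * tx / tz, CY + F * ty / tz  # projected sphere centre
    r_px = F * RADIUS / tz                       # approximate projected radius
    dist = np.hypot(U - u0, V - v0)
    return 1.0 / (1.0 + np.exp((dist - r_px) / SIGMA))  # ~1 inside, ~0 outside


def mask_loss(t, target):
    """Mean squared difference between the rendered and target masks."""
    return float(np.mean((soft_silhouette(t) - target) ** 2))


def numeric_grad(t, target, eps=1e-4):
    """Central finite differences; a real pipeline would use autodiff."""
    grad = np.zeros(3)
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        grad[i] = (mask_loss(t + step, target) - mask_loss(t - step, target)) / (2 * eps)
    return grad


def refine_translation(t_init, target, lr=0.5, iters=300):
    """Plain gradient descent on the translation only."""
    t = np.array(t_init, dtype=float)
    for _ in range(iters):
        t -= lr * numeric_grad(t, target)
    return t


t_true = np.array([0.5, -0.3, 5.0])   # translation used to synthesise the target mask
t_init = np.array([0.2, 0.0, 6.0])    # stand-in for a directly regressed estimate
target = soft_silhouette(t_true)
refined = refine_translation(t_init, target)

print("initial error:", np.linalg.norm(t_init - t_true))
print("refined error:", np.linalg.norm(refined - t_true))
```

The sketch also hints at why depth is the hard part: in this toy setup the mask constrains `tz` only through the apparent size of the silhouette, so the loss is much flatter along `tz` than along `tx` and `ty`, and the depth component converges more slowly.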
Taxonomy
Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Image and Object Detection Techniques
