MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu, Jiawei Shi, Zheng Dang, Yuchao Dai

TL;DR
MixRI is a lightweight, fast, and memory-efficient neural network that accurately estimates the pose of novel objects in RGB images by effectively utilizing fewer reference images without the need for finetuning.
Contribution
The paper introduces MixRI, a novel lightweight network that reduces reference image requirements and inference time for object pose estimation without sacrificing accuracy.
Findings
Achieves accuracy comparable to existing methods while using fewer reference images.
Significantly reduces memory use and inference time.
Performs well across multiple datasets in the BOP challenge.
Abstract
We present MixRI, a lightweight network that solves the CAD-based novel object pose estimation problem in RGB images. It can be instantly applied to a novel object at test time without finetuning. We design our network to meet the demands of real-world applications, emphasizing reduced memory requirements and fast inference time. Unlike existing works that utilize many reference images and have large network parameters, we directly match points based on the multi-view information between the query and reference images with a lightweight network. Thanks to our reference image fusion strategy, we significantly decrease the number of reference images, thus decreasing the time needed to process these images and the memory required to store them. Furthermore, with our lightweight network, our method requires less inference time. Though with fewer reference images, experiments on seven core…
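The abstract does not say how reference features are mixed or how points are matched. As a rough illustration of the general idea only, the sketch below assumes per-point descriptors are already available, fuses them across reference views by simple averaging, and matches query descriptors by cosine nearest neighbour. The function names, the array shapes, and both the averaging and the nearest-neighbour step are hypothetical stand-ins, not the MixRI network.

```python
import numpy as np


def fuse_reference_features(ref_feats: np.ndarray) -> np.ndarray:
    """Fuse descriptors over reference views (assumed here: plain averaging).

    ref_feats has shape (V, N, D): V reference views, N object points,
    D feature channels. Returns unit-normalised descriptors of shape (N, D).
    """
    fused = ref_feats.mean(axis=0)
    return fused / np.linalg.norm(fused, axis=1, keepdims=True)


def match_points(query_feats: np.ndarray, fused_feats: np.ndarray):
    """Match each query descriptor to its most similar fused reference point.

    query_feats has shape (M, D). Returns (indices, scores), each of shape (M,),
    where scores are cosine similarities.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sim = q @ fused_feats.T  # (M, N) cosine similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.normal(size=(4, 100, 32))   # 4 views, 100 points, 32-dim features
    query = rng.normal(size=(10, 32))      # 10 query descriptors
    idx, score = match_points(query, fuse_reference_features(refs))
    print(idx.shape, score.shape)
```

In a CAD-based setting, each matched reference point would typically carry a known 3D location on the model, so the matches could feed a standard 2D-3D pose solver; whether MixRI does this is not stated in the text above.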
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
