TL;DR
PyraPose introduces a feature pyramid network for fast, accurate object pose estimation that generalizes well across domain shifts from synthetic to real data, outperforming existing methods.
Contribution
The paper proposes a novel feature pyramid network architecture for robust, single-shot object pose estimation under domain shift, with an emphasis on representing object information from local to global scale.
Findings
Outperforms the state of the art by up to 35% on standard datasets
Effective in real-world grasping experiments with synthetic training data
Demonstrates strong generalization across different environments
Abstract
Object pose estimation enables robots to understand and interact with their environments. Training with synthetic data is necessary in order to adapt to novel situations. Unfortunately, pose estimation under domain shift, i.e., training on synthetic data and testing in the real world, is challenging. Deep learning-based approaches currently perform best when using encoder-decoder networks but typically do not generalize to new scenarios with different scene characteristics. We argue that patch-based approaches, instead of encoder-decoder networks, are more suited for synthetic-to-real transfer because local to global object information is better represented. To that end, we present a novel approach based on a specialized feature pyramid network to compute multi-scale features for creating pose hypotheses on different feature map resolutions in parallel. Our single-shot pose estimation…
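To make the multi-scale idea in the abstract concrete, here is a minimal sketch of a generic feature pyramid with a top-down merge, written in plain Python. It is not the paper's specialized architecture: the average-pooling "backbone", the unweighted addition of upsampled maps, and all function names (`avg_pool2x2`, `upsample2x`, `build_pyramid`, `hypothesis_locations`) are assumptions made for illustration. A real network would use learned convolutions at each step. The sketch only shows how one input yields feature maps at several resolutions, each cell of which could emit a pose hypothesis in parallel.

```python
def avg_pool2x2(fmap):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample2x(fmap):
    """Double the resolution by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(image, levels=3):
    """Bottom-up pass (a stand-in for a learned backbone), then top-down merge.

    Assumes both image dimensions are divisible by 2 ** (levels - 1).
    """
    bottom_up = [image]
    for _ in range(levels - 1):
        bottom_up.append(avg_pool2x2(bottom_up[-1]))

    # Top-down: start at the coarsest map and add upsampled global
    # context to each finer (more local) map in turn.
    pyramid = [bottom_up[-1]]
    for lateral in reversed(bottom_up[:-1]):
        top = upsample2x(pyramid[0])
        merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, top)]
        pyramid.insert(0, merged)
    return pyramid


def hypothesis_locations(pyramid):
    """Treat every cell on every level as one candidate hypothesis location."""
    return [
        (lvl, r, c)
        for lvl, fmap in enumerate(pyramid)
        for r in range(len(fmap))
        for c in range(len(fmap[0]))
    ]


# Toy 8x8 single-channel "image" (hypothetical input).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, levels=3)
shapes = [(len(f), len(f[0])) for f in pyramid]
locations = hypothesis_locations(pyramid)
```

With the toy 8x8 input, the pyramid holds 8x8, 4x4, and 2x2 maps. Fine levels keep local detail while coarse levels summarize larger regions, which is the local-to-global property the abstract argues helps synthetic-to-real transfer.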
