COPE: End-to-end trainable Constant Runtime Object Pose Estimation
Stefan Thalhammer, Timothy Patten, Markus Vincze

TL;DR
This paper introduces COPE, an end-to-end trainable method for real-time multi-object 6D pose estimation. It is faster and scales better than multi-stage approaches while reporting superior accuracy.
Contribution
COPE is the first end-to-end trainable framework that directly regresses multiple object poses simultaneously, eliminating the need for separate detection and correspondence stages.
Findings
Achieves >24 fps on images with over 90 objects.
Outperforms state-of-the-art methods in accuracy.
Runs approximately 35 times faster than traditional approaches.
Abstract
State-of-the-art object pose estimation handles multiple instances in a test image by using multi-model formulations: detection as a first stage and then separately trained networks per object for 2D-3D geometric correspondence prediction as a second stage. Poses are subsequently estimated using the Perspective-n-Points algorithm at runtime. Unfortunately, multi-model formulations are slow and do not scale well with the number of object instances involved. Recent approaches show that direct 6D object pose estimation is feasible when derived from the aforementioned geometric correspondences. We present an approach that learns an intermediate geometric representation of multiple objects to directly regress 6D poses of all instances in a test image. The inherent end-to-end trainability overcomes the requirement of separately processing individual object instances. By calculating the mutual…
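The 2D-3D geometric correspondences mentioned in the abstract can be illustrated with a minimal sketch: given a 6D pose (a rotation R and a translation t) and pinhole camera intrinsics, each 3D model point maps to one 2D pixel. Perspective-n-Points solves the inverse problem, recovering R and t from such pairs. This is not COPE's implementation; the function names `rotation_z` and `project_points`, the intrinsics, and the model points are all hypothetical values chosen for illustration.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the camera z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project_points(model_points, rotation, translation, fx, fy, cx, cy):
    """Project 3D model points into the image under a 6D pose (R, t).

    Hypothetical helper: returns one (u, v) pixel per 3D point, i.e. the
    2D side of the 2D-3D correspondences that a PnP solver would consume.
    """
    pixels = []
    for p in model_points:
        # Object frame -> camera frame: X_cam = R * p + t
        x, y, z = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if z <= 0:
            raise ValueError("point is behind the camera")
        # Pinhole projection with focal lengths (fx, fy) and principal point (cx, cy)
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Assumed example values: an object 1 m in front of the camera,
# rotated 90 degrees about the optical axis.
model_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
pose_rotation = rotation_z(math.pi / 2)
pose_translation = (0.0, 0.0, 1.0)
correspondences = list(
    zip(
        model_points,
        project_points(
            model_points, pose_rotation, pose_translation,
            fx=500.0, fy=500.0, cx=320.0, cy=240.0,
        ),
    )
)
```

In a multi-model pipeline as described above, a projection-consistent set of such pairs is predicted per detected object and passed to a PnP solver, so the cost grows with the number of instances.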
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Test
