Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices
Xingjian Yang, Zhitao Yu, and Ashis G. Banerjee

TL;DR
Sparse Color-Code Net (SCCN) enables real-time 6D object pose estimation from RGB images on edge devices by exploiting sparse geometry features and a novel symmetry representation, reaching 19 FPS on LINEMOD and 6 FPS on Occlusion LINEMOD.
Contribution
The paper introduces SCCN, a novel RGB-based pipeline that improves real-time 6D pose estimation efficiency and handles symmetric objects effectively on edge hardware.
Findings
Achieves 19 FPS on the LINEMOD dataset with high accuracy.
Runs at 6 FPS on the Occlusion LINEMOD dataset.
Resolves pose ambiguities for symmetric objects through a pixel-level, geometry-based symmetry representation.
Abstract
As robotics and augmented reality applications increasingly rely on precise and efficient 6D object pose estimation, real-time performance on edge devices is required for more interactive and responsive systems. Our proposed Sparse Color-Code Net (SCCN) embodies a clear and concise pipeline design to effectively address this requirement. SCCN performs pixel-level predictions on the target object in the RGB image, utilizing the sparsity of essential object geometry features to speed up the Perspective-n-Point (PnP) computation process. Additionally, it introduces a novel pixel-level geometry-based object symmetry representation that seamlessly integrates with the initial pose predictions, effectively addressing symmetric object ambiguities. SCCN notably achieves an estimation rate of 19 frames per second (FPS) and 6 FPS on the benchmark LINEMOD dataset and the Occlusion LINEMOD dataset,…
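The abstract's idea of feeding only a sparse set of pixels to PnP can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sparse_correspondences`, the inputs `coord_map` and `weight_map`, and the top-k selection rule are all assumptions made for the example. It only shows how a small set of 2D-3D pairs might be gathered from pixel-level predictions before being handed to a PnP solver such as OpenCV's `cv2.solvePnP`.

```python
import numpy as np


def sparse_correspondences(coord_map, weight_map, max_points=64):
    """Gather a sparse set of 2D-3D pairs from pixel-level predictions.

    coord_map:  (H, W, 3) array, assumed per-pixel 3D point in the object frame.
    weight_map: (H, W) array, assumed per-pixel score; 0 marks pixels to skip.
    Both inputs and the top-k rule are hypothetical, chosen for illustration.
    """
    h, w = weight_map.shape
    flat = weight_map.ravel()
    candidates = np.flatnonzero(flat > 0)
    # Keep at most max_points candidates, highest score first.
    keep = candidates[np.argsort(flat[candidates])[::-1][:max_points]]
    rows, cols = np.unravel_index(keep, (h, w))
    image_points = np.stack([cols, rows], axis=1).astype(np.float64)  # (x, y)
    object_points = coord_map[rows, cols].astype(np.float64)
    return image_points, object_points


# Toy input: random 3D predictions and five scored pixels.
rng = np.random.default_rng(0)
coord_map = rng.uniform(-0.1, 0.1, size=(48, 64, 3))
weight_map = np.zeros((48, 64))
for (r, c), score in {(10, 20): 0.9, (30, 5): 0.5, (40, 60): 0.7,
                      (5, 5): 0.6, (20, 33): 0.8}.items():
    weight_map[r, c] = score

image_points, object_points = sparse_correspondences(
    coord_map, weight_map, max_points=4)
```

Fewer correspondences mean less work for the PnP step, which is the speed-up the abstract alludes to. Most PnP solvers need at least four pairs, so `max_points` should not be set lower than that in practice.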
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Visual Attention and Saliency Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
