RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
Yifan Liu, Fangneng Zhan, Wanhua Li, Haowen Sun, Katerina Fragkiadaki, Hanspeter Pfister

TL;DR
RoboTAG introduces a topological graph-based approach combining 2D and 3D information to estimate robot pose from monocular images, reducing reliance on labeled data and leveraging 3D priors.
Contribution
The paper presents RoboTAG, a novel end-to-end framework that integrates 3D priors with 2D visual data through a topological graph, enabling more accurate robot pose estimation with less labeled data.
Findings
Effective across different robot types.
Reduces dependence on labeled training data.
Utilizes topological consistency for improved accuracy.
Abstract
Estimating robot pose from a monocular RGB image is a challenge in robotics and computer vision. Existing methods typically build networks on top of 2D visual backbones and depend heavily on labeled data for training, which is often scarce in real-world scenarios, causing a sim-to-real gap. Moreover, these approaches reduce the 3D-based problem to the 2D domain, neglecting the 3D priors. To address these issues, we propose Robot Topological Alignment Graph (RoboTAG), which incorporates a 3D branch to inject 3D priors while enabling co-evolution of the 2D and 3D representations, alleviating the reliance on labels. Specifically, RoboTAG consists of a 3D branch and a 2D branch, where nodes represent the states of the camera and robot system, and edges capture the dependencies between these variables or denote alignments between them. Closed loops are then defined in the graph, on which a…
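
One way to picture a closed loop between a 3D path and a 2D path is the toy sketch below. It is not the authors' implementation, and the abstract does not specify what is computed on the loops. Everything here is an assumption made for illustration: the planar arm, the camera whose axes are aligned with the robot base (translation only), and the hypothetical names forward_kinematics, project, and loop_residual. The 3D path goes from joint angles to 3D keypoints to projected pixels; the 2D path stands in for keypoints predicted directly from the image. If both paths describe the same camera and robot state, the two sets of pixels should agree.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Planar serial arm in the robot base frame (assumed toy robot).

    Returns 3D keypoints with z = 0, starting with the base.
    """
    x, y, theta = 0.0, 0.0, 0.0
    points = [(x, y, 0.0)]
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle  # joint angles accumulate along the chain
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y, 0.0))
    return points


def project(points_robot, cam_translation, focal, center):
    """Pinhole projection of robot-frame points into pixel coordinates.

    Assumption: the camera rotation is the identity, so the robot-to-camera
    transform is a pure translation.
    """
    pixels = []
    for px, py, pz in points_robot:
        xc = px + cam_translation[0]
        yc = py + cam_translation[1]
        zc = pz + cam_translation[2]
        pixels.append((focal * xc / zc + center[0],
                       focal * yc / zc + center[1]))
    return pixels


def loop_residual(pixels_3d_path, pixels_2d_path):
    """Mean Euclidean pixel distance between the two paths around the loop."""
    dists = [math.dist(a, b) for a, b in zip(pixels_3d_path, pixels_2d_path)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    keypoints_3d = forward_kinematics([math.pi / 2, -math.pi / 2], [1.0, 1.0])
    from_3d_path = project(keypoints_3d, (0.0, 0.0, 2.0), 100.0, (50.0, 50.0))
    # Stand-in for 2D keypoints predicted from the image (here: a shifted copy).
    from_2d_path = [(u + 3.0, v + 4.0) for u, v in from_3d_path]
    print(loop_residual(from_3d_path, from_2d_path))
```

In a label-free setting, a residual of this kind could serve as a training signal because it needs no ground-truth pose, only agreement between the two paths. Whether RoboTAG uses this particular form is not stated in the text above.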
