RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph

Yifan Liu; Fangneng Zhan; Wanhua Li; Haowen Sun; Katerina Fragkiadaki; Hanspeter Pfister

arXiv:2511.07717 · cs.RO · April 16, 2026

TL;DR

RoboTAG introduces a topological graph-based approach combining 2D and 3D information to estimate robot pose from monocular images, reducing reliance on labeled data and leveraging 3D priors.

Contribution

The paper presents RoboTAG, a novel end-to-end framework that integrates 3D priors with 2D visual data through a topological graph, enabling more accurate robot pose estimation with less labeled data.

Findings

01 Effective across different robot types.

02 Reduces dependence on labeled training data.

03 Utilizes topological consistency for improved accuracy.

Abstract

Estimating robot pose from a monocular RGB image is a challenge in robotics and computer vision. Existing methods typically build networks on top of 2D visual backbones and depend heavily on labeled data for training. Such data is often scarce in real-world scenarios, which leads to a sim-to-real gap. Moreover, these approaches reduce an inherently 3D problem to the 2D domain, neglecting 3D priors. To address these issues, we propose the Robot Topological Alignment Graph (RoboTAG), which incorporates a 3D branch to inject 3D priors while enabling co-evolution of the 2D and 3D representations, alleviating the reliance on labels. Specifically, RoboTAG consists of a 3D branch and a 2D branch, where nodes represent the states of the camera and robot system, and edges capture the dependencies between these variables or denote alignments between them. Closed loops are then defined in the graph, on which a…
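The abstract is cut off before it says what is defined on the closed loops, so the following is only an illustrative sketch of one kind of 2D-3D alignment a graph like this could express, not the authors' formulation. It assumes a pinhole camera and checks whether 3D robot keypoints, moved into the camera frame and projected to pixels, agree with a set of 2D keypoints. Every name here (`transform`, `project`, `loop_residual`) and every numeric value is hypothetical.

```python
import math

def transform(point, rotation, translation):
    """Map a 3D point from the robot frame into the camera frame."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

def loop_residual(points_3d, keypoints_2d, rotation, translation, fx, fy, cx, cy):
    """Mean pixel distance between projected 3D keypoints and 2D keypoints.

    A value near zero means the 3D path (keypoints + camera pose) and the
    2D path (image keypoints) agree, i.e. the loop closes.
    """
    total = 0.0
    for p, (u, v) in zip(points_3d, keypoints_2d):
        pu, pv = project(transform(p, rotation, translation), fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(points_3d)

# Hypothetical camera and robot keypoints (assumed values, for illustration only).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
T = [0.0, 0.0, 2.0]                                       # robot 2 m in front of the camera
POINTS_3D = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.2, 0.1, 0.5]]

# 2D keypoints that agree with the 3D side, and a copy shifted by 3 px in u.
consistent_2d = [project(transform(p, R, T), FX, FY, CX, CY) for p in POINTS_3D]
shifted_2d = [(u + 3.0, v) for (u, v) in consistent_2d]

residual_consistent = loop_residual(POINTS_3D, consistent_2d, R, T, FX, FY, CX, CY)
residual_shifted = loop_residual(POINTS_3D, shifted_2d, R, T, FX, FY, CX, CY)
```

In this sketch the residual needs no ground-truth labels: it only compares two estimates of the same quantity. That is the general appeal of a consistency check when labeled data is scarce, though whether and how RoboTAG uses such a term is not stated in the text available here.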
