TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang

TL;DR
TrianguLang is a fast, geometry-aware 3D localization framework that achieves state-of-the-art results without camera calibration, enabling efficient text-guided object localization for robotics and AR.
Contribution
It introduces Geometry-Aware Semantic Attention (GASA) for cross-view feature matching without ground-truth poses, improving accuracy and efficiency over prior methods.
Findings
Achieves state-of-the-art feed-forward segmentation and localization.
Reduces user effort from multiple clicks to a single text query.
Processes frames at 18 FPS without per-scene optimization.
Abstract
Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which uses predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from clicks to a single text…
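The abstract describes GASA only at a high level. To illustrate the general idea of gating cross-view correspondence with predicted geometry, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation. It assumes that every token in two views has a semantic feature vector and a predicted 3D point in a shared frame, such as a pose-free geometry predictor might output. It also assumes a Gaussian distance gate with a made-up bandwidth `sigma`. The names `geometry_gated_attention` and `sigma` are illustrative only.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def geometry_gated_attention(
    feats_a: List[Vec], pts_a: List[Vec],
    feats_b: List[Vec], pts_b: List[Vec],
    sigma: float = 0.1,
) -> List[List[float]]:
    """Attention weights from each token in view A to every token in view B.

    Hypothetical sketch: the semantic logit is a scaled dot product, and the
    geometric gate g = exp(-||p_a - p_b||^2 / (2 sigma^2)) is added in log
    space, so softmax(s + log g) is proportional to exp(s) * g.
    """
    d = len(feats_a[0])
    weights = []
    for fa, pa in zip(feats_a, pts_a):
        logits = [
            _dot(fa, fb) / math.sqrt(d) - _sq_dist(pa, pb) / (2 * sigma ** 2)
            for fb, pb in zip(feats_b, pts_b)
        ]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(s - m) for s in logits]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


if __name__ == "__main__":
    # Two candidates in view B with identical features (equally plausible
    # semantically); only the first lies near the query's predicted 3D point.
    w = geometry_gated_attention(
        feats_a=[[1.0, 0.0]], pts_a=[[0.0, 0.0, 1.0]],
        feats_b=[[1.0, 0.0], [1.0, 0.0]],
        pts_b=[[0.0, 0.0, 1.02], [0.5, 0.0, 1.0]],
    )
    print(w)
```

In the usage example, the geometrically distant match gets almost no weight even though its features are identical. This mirrors the abstract's claim that semantically plausible but geometrically inconsistent matches are suppressed. The sketch adds the gate as a log-space bias rather than multiplying it into the normalized weights. That choice keeps each row a valid softmax distribution.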
