Learning to Localize Reference Trajectories in Image-Space for Visual Navigation

Finn Lukas Busch; Matti Vahs; Quantao Yang; Jesús Gerardo Ortega Peimbert; Yixi Cai; Jana Tumova; Olov Andersson

arXiv:2602.18803 · cs.RO · February 24, 2026

TL;DR

LoTIS is a robot-agnostic visual navigation model that localizes a reference RGB trajectory in image space, enabling zero-shot guidance across diverse robots and environments without requiring camera calibration, poses, or robot-specific training.

Contribution

We introduce LoTIS, a novel approach that localizes reference trajectories in image space for robot-agnostic visual navigation, decoupling perception from action and enabling cross-trajectory training.

Findings

01

Outperforms state-of-the-art methods by 20-50 percentage points in success rate.

02

Achieves 94-98% success in diverse environments.

03

Over 5x improvement on challenging backward traversal tasks.

Abstract

We present LoTIS, a model for visual navigation that provides robot-agnostic image-space guidance by localizing a reference RGB trajectory in the robot's current view, without requiring camera calibration, poses, or robot-specific training. Instead of predicting actions tied to specific robots, we predict the image-space coordinates of the reference trajectory as they would appear in the robot's current view. This creates robot-agnostic visual guidance that easily integrates with local planning. Consequently, our model's predictions provide guidance zero-shot across diverse embodiments. By decoupling perception from action and learning to localize trajectory points rather than imitate behavioral priors, we enable a cross-trajectory training strategy for robustness to viewpoint and camera changes. We outperform state-of-the-art methods by 20-50 percentage points in success rate on…
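To illustrate how image-space guidance could feed a local planner, the following is a minimal sketch, not the authors' implementation. It assumes the model's output is already available as a list of pixel coordinates `(u, v)` for the reference trajectory in the current view. The function name `steering_from_image_points`, the `gain` parameter, and the sign convention are all hypothetical, and a simple proportional rule stands in for whatever local planner a given robot actually uses.

```python
from typing import List, Tuple


def steering_from_image_points(
    points: List[Tuple[float, float]],
    image_width: int,
    gain: float = 1.0,
) -> float:
    """Map predicted image-space trajectory points to a steering value in [-1, 1].

    Hypothetical helper: negative means "turn left", positive means "turn right".
    Only pixel coordinates are used, so no camera intrinsics or poses are needed.
    """
    if image_width <= 0:
        raise ValueError("image_width must be positive")

    # Keep only points that fall inside the image horizontally.
    visible = [(u, v) for u, v in points if 0 <= u < image_width]
    if not visible:
        # No trajectory point in view: return a neutral command (assumed fallback).
        return 0.0

    # Average horizontal position of the visible trajectory points.
    mean_u = sum(u for u, _ in visible) / len(visible)

    # Normalized offset from the image center, in [-1, 1).
    half_width = image_width / 2.0
    offset = (mean_u - half_width) / half_width

    # Proportional steering, clamped to the valid range.
    return max(-1.0, min(1.0, gain * offset))
```

Because the command depends only on where the trajectory appears in the image, the same predictions could in principle drive different robots, each with its own mapping from this normalized value to actuator commands.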

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging