LCLA: Language-Conditioned Latent Alignment for Vision-Language Navigation

Nitesh Subedi; Adam Haroon; Samuel Tetteh; Prajwal Koirala; Cody Fleming; Soumik Sarkar

arXiv:2602.07629·cs.RO·February 11, 2026

Open Access

TL;DR

LCLA is a modular framework for vision-language navigation that aligns sensory observations to the latent space of a frozen expert policy. Decoupling perception from control in this way supports robust zero-shot generalization and lightweight inference.

Contribution

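The paper replaces end-to-end policy optimization with supervised latent alignment: a lightweight adapter maps features from a frozen vision-language model into the latent space of an expert trained with privileged state information. The stated aim is better generalization and modularity than end-to-end methods.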

Findings

01. Strong in-distribution navigation performance
02. Robust zero-shot generalization to unseen environments
03. Lightweight inference with modular perception-action interface

Abstract

We propose LCLA (Language-Conditioned Latent Alignment), a framework for vision-language navigation that learns modular perception-action interfaces by aligning sensory observations to a latent representation of an expert policy. The expert is first trained with privileged state information, inducing a latent space sufficient for control, after which its latent interface and action head are frozen. A lightweight adapter is then trained to map raw visual-language observations, via a frozen vision-language model, into the expert's latent space, reducing the problem of visuomotor learning to supervised latent alignment rather than end-to-end policy optimization. This decoupling enforces a stable contract between perception and control, enabling expert behavior to be reused across sensing modalities and environmental variations. We instantiate LCLA and evaluate it on a vision-language…
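The training recipe described in the abstract (freeze the expert's latent interface and action head, then fit an adapter by supervised regression onto the expert's latents) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in rather than the authors' implementation: the "expert" is a fixed linear map, the "frozen vision-language model" is another fixed linear map (`W_VLM`), the adapter is a single linear layer, and the loss is assumed to be mean squared error in latent space. The matrices and function names are invented for illustration only.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical frozen expert, assumed already trained on privileged state.
W_ENC = [[0.8, -0.3], [0.2, 0.9]]    # privileged state -> latent (frozen)
W_ACT = [[1.0, 0.5], [-0.5, 1.0]]    # latent -> action (frozen action head)
# Hypothetical stand-in for frozen vision-language model features.
W_VLM = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

def expert_latent(state):
    return matvec(W_ENC, state)

def action_head(latent):
    return matvec(W_ACT, latent)

def vlm_features(state):
    return matvec(W_VLM, state)

def train_adapter(states, steps=300, lr=0.5):
    """Fit a linear adapter A so that A @ features approximates the expert latent.

    Only A is updated; the expert and the feature extractor stay frozen.
    Uses full-batch gradient descent on the mean squared latent error.
    """
    feats = [vlm_features(s) for s in states]
    targets = [expert_latent(s) for s in states]
    A = [[0.0] * 3 for _ in range(2)]
    n = len(states)
    for _ in range(steps):
        grad = [[0.0] * 3 for _ in range(2)]
        for f, z in zip(feats, targets):
            err = [p - t for p, t in zip(matvec(A, f), z)]
            for i in range(2):
                for j in range(3):
                    grad[i][j] += 2.0 * err[i] * f[j] / n
        for i in range(2):
            for j in range(3):
                A[i][j] -= lr * grad[i][j]
    return A

def alignment_mse(A, states):
    """Mean squared distance between adapter latents and expert latents."""
    total = 0.0
    for s in states:
        z_hat = matvec(A, vlm_features(s))
        z = expert_latent(s)
        total += sum((p - t) ** 2 for p, t in zip(z_hat, z))
    return total / len(states)

rng = random.Random(0)
train_states = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(64)]
test_states = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(32)]

adapter = train_adapter(train_states)
test_mse = alignment_mse(adapter, test_states)
print("held-out latent alignment MSE:", test_mse)
```

At deployment, the sketch would act through `action_head(matvec(adapter, vlm_features(state)))`, reusing the frozen action head unchanged. In this toy setting the features are an exact linear function of the state, so the alignment can be made essentially perfect; real visual-language features would not offer that guarantee.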

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning