Harnessing Input-Adaptive Inference for Efficient VLN

Dongwoo Kang; Akhil Perincherry; Zachary Coalson; Aiden Gabriel; Stefan Lee; Sanghyun Hong

arXiv:2508.09262·cs.CV·August 14, 2025

TL;DR

This paper introduces input-adaptive algorithms for vision-and-language navigation (VLN) that cut computation by more than 2× while keeping navigation performance comparable, making VLN models more practical in resource-limited settings.

Contribution

It proposes three adaptive algorithms, each operating at a different level (spatial, intra-model, and temporal), that improve VLN efficiency without substantial performance degradation.

Findings

01 Over 2× reduction in computation across multiple benchmarks

02 Effective in both standard and continuous environments

03 Maintains comparable navigation performance

Abstract

An emerging paradigm in vision-and-language navigation (VLN) is the use of history-aware multi-modal transformer models. Given a language instruction, these models process observation and navigation history to predict the most appropriate action for an agent. While they have significantly improved performance, the scale of these models can be a bottleneck in practical settings with limited computational resources. In this work, we propose a novel input-adaptive navigation method to enhance VLN model efficiency. We first show that existing input-adaptive mechanisms fail to reduce computations without substantial performance degradation. To address this, we introduce three adaptive algorithms, each deployed at a different level: (1) To improve spatial efficiency, we selectively process panoramic views at each observation of an agent. (2) To improve intra-model efficiency, we propose…
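To make the spatial idea in (1) concrete, here is a minimal sketch of selecting a subset of panoramic views to process at one observation. The selection rule is an assumption made for illustration and is not taken from the paper: it keeps the view nearest to each navigable direction plus `k` neighbours on either side. The names `select_views`, `angular_distance`, `navigable_headings`, and `k` are hypothetical, and the views are assumed to be ordered by heading around the panorama.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_views(view_headings, navigable_headings, k=1):
    """Return sorted indices of the panoramic views to process.

    Hypothetical rule (an assumption, not the paper's algorithm): keep the
    view closest to each navigable heading, plus k neighbours on each side,
    wrapping around the panorama. Assumes view_headings is ordered by angle.
    """
    n = len(view_headings)
    keep = set()
    for target in navigable_headings:
        # Index of the view whose heading is closest to this direction.
        nearest = min(
            range(n),
            key=lambda i: angular_distance(view_headings[i], target),
        )
        for offset in range(-k, k + 1):
            keep.add((nearest + offset) % n)
    return sorted(keep)


# Illustrative panorama: 12 views spaced 30 degrees apart.
headings = [i * 30.0 for i in range(12)]
selected = select_views(headings, navigable_headings=[10.0, 185.0], k=1)
print(selected)  # [0, 1, 5, 6, 7, 11]
```

In this toy case only 6 of the 12 views would be passed to the visual encoder. The actual savings depend on how many navigable directions an observation has and on the choice of `k`.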
