FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

Zihao Zheng; Xingyue Zhou; Zhihao Mao; Songyu Sun; Lingyue Zhang; Yulong Ao; Yupu Feng; Qiongqiong Zhang; Yonghua Lin; Xiang Chen

arXiv:2604.24391 · cs.RO · April 28, 2026

TL;DR

FreqCache introduces a training-free, frequency-domain token caching method that accelerates VLN models (a 1.59x speedup), overcoming the limitations of visual-domain approaches and enabling adaptive, efficient caching.

Contribution

The paper proposes FreqCache, a frequency-guided token caching framework that selects cacheable tokens through frequency-domain analysis, improving both the speed and the adaptability of VLN models.

Findings

01. Achieves a 1.59x speedup on VLN models

02. Overcomes the challenges that viewpoint migration poses for token caching

03. Incurs negligible overhead while accelerating inference

Abstract

Vision-Language Navigation (VLN) models exhibit excellent navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy for reducing this cost by reusing token computation results; however, existing token caching approaches rely on visual-domain methods to select cacheable tokens, which leads to challenges when they are adapted to VLN models. 1) Visual-domain methods become invalid under viewpoint migration. 2) Visual-domain methods neglect critical edge information unless aided by additional algorithms. 3) Visual-domain methods overlook the temporal variation of scenarios and lack adjustable cache budgets. In this paper, we conduct detailed analyses and find that the impacts of these challenges are invariant and analyzable in the frequency domain. Based on these findings, we propose a frequency-guided token caching…
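
The abstract is cut off before the method itself, so the following is only a hedged sketch of the general idea, not FreqCache's actual algorithm. It assumes (1) viewpoint migration can be idealized as a circular image translation, under which the magnitude of the 2D DFT is unchanged (the Fourier shift theorem), whereas a pixel-by-pixel comparison sees a large change; (2) edge detail lives in high spatial frequencies, so a hypothetical `hf_weight` up-weights those bins; and (3) a cache budget is a fraction of tokens that may be reused per step. All function names (`pixel_change`, `spectral_change`, `to_tokens`, `select_cacheable`) and parameters (`patch`, `budget`, `hf_weight`) are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero on all-black frames/tokens


def pixel_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Relative L1 change measured directly on pixels (a visual-domain score)."""
    return float(np.abs(curr - prev).sum() / (np.abs(prev).sum() + EPS))


def spectral_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Relative L1 change between the 2D DFT magnitudes of two frames."""
    mp = np.abs(np.fft.fft2(prev))
    mc = np.abs(np.fft.fft2(curr))
    return float(np.abs(mc - mp).sum() / (mp.sum() + EPS))


def to_tokens(frame: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) frame into non-overlapping (patch, patch) tokens, row-major."""
    nh, nw = frame.shape[0] // patch, frame.shape[1] // patch
    return (frame[: nh * patch, : nw * patch]
            .reshape(nh, patch, nw, patch)
            .swapaxes(1, 2)
            .reshape(nh * nw, patch, patch))


def select_cacheable(prev: np.ndarray, curr: np.ndarray, patch: int = 8,
                     budget: float = 0.25, hf_weight: float = 1.0) -> np.ndarray:
    """Hypothetical rule: return indices of the tokens whose spectra changed
    least, keeping at most `budget` (a fraction) of all tokens for reuse.
    Requires patch >= 2 so the radial-frequency normalization is defined."""
    mp = np.abs(np.fft.fft2(to_tokens(prev, patch)))  # fft2 uses the last two axes
    mc = np.abs(np.fft.fft2(to_tokens(curr, patch)))
    # Weight each frequency bin by its radial frequency so that changes in
    # high-frequency (edge-like) content count more toward the score.
    f = np.fft.fftfreq(patch)
    radius = np.hypot(f[:, None], f[None, :])
    weight = 1.0 + hf_weight * radius / radius.max()
    scores = ((weight * np.abs(mc - mp)).sum(axis=(1, 2))
              / ((weight * mp).sum(axis=(1, 2)) + EPS))
    k = int(budget * len(scores))  # the budget can be changed from step to step
    return np.sort(np.argsort(scores, kind="stable")[:k])


rng = np.random.default_rng(0)
frame = rng.random((64, 64))
shifted = np.roll(frame, shift=(3, 5), axis=(0, 1))  # idealized viewpoint migration
print(pixel_change(frame, shifted))     # large: the pixels no longer line up
print(spectral_change(frame, shifted))  # numerically ~0: |DFT| ignores circular shifts

curr = frame.copy()
curr[:8, :8] += rng.random((8, 8))      # only token 0 changes between steps
cached = select_cacheable(frame, curr, patch=8, budget=0.5)
print(0 in cached)  # False: the changed token is recomputed, not reused
```

Real camera motion is not a pure circular shift (new content enters the view and perspective changes), so the shift invariance shown here is an idealization; the sketch only illustrates why frequency-domain scores can be more robust to viewpoint changes than pixel-domain ones, and how an explicit `budget` makes the cache size tunable.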
