FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching
Zihao Zheng, Xingyue Zhou, Zhihao Mao, Songyu Sun, Lingyue Zhang, Yulong Ao, Yupu Feng, Qiongqiong Zhang, Yonghua Lin, Xiang Chen

TL;DR
FreqCache introduces a frequency-domain token caching method that accelerates Vision-Language-Navigation (VLN) models (a reported 1.59x speedup). It overcomes the limitations of visual-domain approaches and makes caching adaptive and efficient.
Contribution
The paper proposes FreqCache, a training-free, frequency-guided token caching framework that uses frequency-domain analysis to improve both the speed of VLN models and the adaptability of their cache.
Findings
Achieves 1.59x speedup in VLN models
Overcomes viewpoint migration challenges in token caching
Maintains negligible overhead during acceleration
Abstract
Vision-Language-Navigation (VLN) models exhibit excellent navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy to reduce this cost by reusing token computation results; however, existing token caching approaches rely on visual-domain methods for cacheable token selection, which leads to challenges when they are adapted to VLN models. 1) Visual-domain methods become invalid when there is viewpoint migration. 2) Visual-domain methods neglect critical edge information without the aid of additional algorithms. 3) Visual-domain methods overlook the temporal variation of scenarios and lack adjustability in cache budgets. In this paper, we develop detailed analyses and find that the impacts of these challenges exhibit invariance and analyzability in the frequency domain. Based on these, we propose a frequency-guided token caching…
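The abstract is truncated before it describes FreqCache's actual algorithm, so the following is only a hypothetical NumPy sketch of the three frequency-domain ideas it names, not the authors' method. It relies on one standard fact: the magnitude of a 2D DFT is unchanged by a circular shift of the input, which is used here as a crude stand-in for viewpoint migration. High-frequency spectral energy is used as an edge proxy, and the cache budget shrinks as the frame-level spectral change grows. The function names (`magnitude_spectrum`, `high_freq_energy`, `select_cacheable`), the spectral-distance metric, the `cutoff`, `base_ratio`, and `edge_thresh` values, and the budget rule are all illustrative assumptions.

```python
import numpy as np


def magnitude_spectrum(patch: np.ndarray) -> np.ndarray:
    """2D DFT magnitude of a token's patch; unchanged by circular shifts."""
    return np.abs(np.fft.fft2(patch))


def high_freq_energy(patch: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy outside a central low-frequency box.

    Used here as a hypothetical edge proxy: edge-heavy patches put more
    energy into high frequencies.
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2  # DC moved to center
    h, w = spec.shape
    ch, cw = h // 2, w // 2
    rh, rw = max(1, int(h * cutoff)), max(1, int(w * cutoff))
    low = spec[ch - rh: ch + rh + 1, cw - rw: cw + rw + 1].sum()
    total = spec.sum()
    return 0.0 if total == 0 else float((total - low) / total)


def select_cacheable(prev_patches, curr_patches, base_ratio=0.5, edge_thresh=0.5):
    """Return sorted indices of tokens whose previous computation could be reused.

    Hypothetical rule: rank tokens by relative magnitude-spectrum distance
    to the previous frame, skip edge-heavy tokens, and cap the count with a
    budget that shrinks as the average spectral change grows.
    """
    dists = np.array([
        np.linalg.norm(magnitude_spectrum(c) - magnitude_spectrum(p))
        / (np.linalg.norm(magnitude_spectrum(p)) + 1e-8)
        for p, c in zip(prev_patches, curr_patches)
    ])
    # Adaptive budget: cache fewer tokens when the scene changes more.
    scene_change = float(np.clip(dists.mean(), 0.0, 1.0))
    budget = int(round(base_ratio * (1.0 - scene_change) * len(curr_patches)))
    # Keep edge information fresh by never caching high-frequency-heavy tokens.
    edge = np.array([high_freq_energy(c) for c in curr_patches])
    candidates = [i for i in np.argsort(dists) if edge[i] < edge_thresh]
    return sorted(int(i) for i in candidates[:budget])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((8, 8))
    # Circular shift as a crude stand-in for viewpoint migration.
    shifted = np.roll(patch, (2, 3), axis=(0, 1))
    print(np.allclose(magnitude_spectrum(patch), magnitude_spectrum(shifted)))  # True
```

Comparing magnitudes discards phase, which is what makes the comparison shift-robust. The same property also makes it blind to some content changes, and real camera motion adds parallax and new content at the borders rather than a pure circular shift, so a practical system would need more than this sketch.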
