The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation

Zhen Liu; Yuhan Liu; Jinjun Wang; Jianyi Liu; Wei Song; Jingwen Fu

arXiv:2604.19064 · cs.CV · April 22, 2026

TL;DR

The paper introduces Stability-Diversity Balance (SDB), a mechanism that helps vision-and-language navigation (VLN) agents self-improve by balancing behavioral diversity against learning stability.

Contribution

The paper proposes SDB, a plug-and-play method that expands each decision step into multiple latent behavioral hypotheses while keeping learning stable, which improves navigation performance on R2R, SOON, and REVERIE.

Findings

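01
SDB improves SPL (Success weighted by Path Length) from 33.73 to 35.93 on REVERIE val-unseen, a gain of 2.20 points.

02
SDB improves OSR (Oracle Success Rate) from 51.07 to 54.25 on REVERIE val-unseen, a gain of 3.18 points.

03
Experiments on R2R, SOON, and REVERIE validate the effectiveness of SDB.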
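
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how SPL and OSR are commonly computed over a set of navigation episodes. It follows the standard definitions (SPL averages success weighted by the ratio of shortest-path length to the longer of the shortest and actual path lengths; OSR is the fraction of episodes in which any point on the agent's path met the success criterion). The `Episode` fields and function names are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    # Hypothetical record of one evaluation episode (field names are assumptions).
    success: bool          # agent stopped where the success criterion holds
    oracle_success: bool   # some point on the agent's path met the criterion
    shortest_len: float    # shortest-path length from start to goal
    path_len: float        # length of the path the agent actually took


def spl(episodes: List[Episode]) -> float:
    """Success weighted by Path Length, as a percentage."""
    total = 0.0
    for ep in episodes:
        denom = max(ep.path_len, ep.shortest_len)
        # Failed episodes contribute 0; guard against a zero-length denominator.
        if ep.success and denom > 0:
            total += ep.shortest_len / denom
    return 100.0 * total / len(episodes)


def osr(episodes: List[Episode]) -> float:
    """Oracle Success Rate, as a percentage."""
    return 100.0 * sum(ep.oracle_success for ep in episodes) / len(episodes)


# Toy data for illustration only; these are not results from the paper.
toy = [
    Episode(True, True, 10.0, 10.0),    # success along the shortest path
    Episode(True, True, 10.0, 20.0),    # success, but path twice as long
    Episode(False, True, 10.0, 15.0),   # passed the goal but did not stop there
    Episode(False, False, 10.0, 5.0),   # never reached the goal
]
print(spl(toy), osr(toy))  # prints 37.5 75.0
```

Because SPL penalizes detours while OSR ignores where the agent stops, a method can raise one without the other; the findings above report gains on both.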

Abstract

In vision-and-language navigation (VLN), self-improvement from policy-induced experience, using only standard VLN action supervision, critically depends on balancing behavioral diversity and learning stability, which governs whether the agent can extract a reliable learning signal for improvement. Increasing behavioral diversity is necessary to expose alternative action hypotheses but can destabilize policy-induced learning signals, whereas overly conservative stability constraints suppress exploration and induce early commitment, making reliable self-improvement difficult. To address this challenge, we propose Stability-Diversity Balance (SDB), a plug-and-play mechanism for balanced self-improvement in VLN. SDB expands each decision step into multiple latent behavioral hypotheses by applying controlled shifts in the instruction-conditioned hidden states, and then performs…
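
The hypothesis-expansion step described in the abstract can be illustrated with a rough sketch. It covers only the expansion of one hidden state into several shifted copies; what SDB does with those hypotheses afterwards is truncated in the text above and is not reproduced here. Everything specific in the code is an assumption made for illustration: the function name `expand_hypotheses`, the use of random directions, and the reading of "controlled" as a fixed L2 norm on each shift. The paper's actual shift construction may differ.

```python
import math
import random
from typing import List


def expand_hypotheses(
    hidden: List[float],
    num_hypotheses: int = 4,
    max_shift: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Expand one hidden state into several shifted copies.

    Assumption: each shift is a random direction rescaled to L2 norm
    `max_shift`. This is an illustrative choice, not the paper's method.
    """
    rng = random.Random(seed)
    hypotheses = [list(hidden)]  # keep the unshifted state as the first hypothesis
    for _ in range(num_hypotheses - 1):
        direction = [rng.gauss(0.0, 1.0) for _ in hidden]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        scale = max_shift / norm  # rescale so the shift has norm max_shift
        hypotheses.append([h + scale * d for h, d in zip(hidden, direction)])
    return hypotheses


# Toy usage: a 3-dimensional stand-in for an instruction-conditioned hidden state.
for hyp in expand_hypotheses([1.0, 2.0, 3.0]):
    print([round(v, 3) for v in hyp])
```

In this sketch `max_shift` is the knob that trades diversity against stability: larger values spread the hypotheses further from the original state, while a value of zero collapses them all onto it.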
