NERVE: Neighbourhood & Entropy-guided Random-walk for training free open-Vocabulary sEgmentation
Kunal Mahatha, Jose Dolz, Christian Desrosiers

TL;DR
NERVE is a training-free open-vocabulary segmentation method that leverages self-attention structure and a stochastic random walk to improve object delineation without post-processing, achieving state-of-the-art zero-shot results on 7 benchmarks.
Contribution
The paper presents a novel training-free approach that integrates global and local information using entropy-guided self-attention and stochastic affinity refinement for open-vocabulary segmentation.
Findings
Achieves state-of-the-art zero-shot segmentation performance on 7 benchmarks.
Effectively delineates objects with arbitrary shapes without post-processing.
Utilizes entropy-based selection of relevant attention maps for improved accuracy.
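To make the entropy-based finding concrete, here is a minimal sketch of one way attention maps could be ranked by entropy and fused. It is not the authors' implementation: the function name `entropy_weighted_fusion`, the `temperature` parameter, the assumption that low-entropy (more focused) maps are the relevant ones, and the softmax-over-negative-entropy weighting are all illustrative choices not taken from the paper.

```python
import numpy as np

def entropy_weighted_fusion(attn_maps, temperature=1.0, eps=1e-12):
    """Fuse H attention maps of shape (N, N), favouring low-entropy maps.

    Hypothetical helper: the weighting rule is an assumption, not the
    paper's formula.
    """
    attn = np.asarray(attn_maps, dtype=np.float64)
    # Normalise every row so it is a probability distribution over tokens.
    attn = attn / (attn.sum(axis=-1, keepdims=True) + eps)
    # Shannon entropy of each row, averaged over rows: one value per map.
    entropy = -(attn * np.log(attn + eps)).sum(axis=-1).mean(axis=-1)
    # Softmax over negative entropy: focused maps receive larger weights.
    logits = -entropy / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Weighted sum over the map axis gives a single (N, N) fused map.
    fused = np.tensordot(weights, attn, axes=1)
    return fused, weights

# Usage: one sharply focused map and one uniform (uninformative) map.
sharp = np.eye(4)
uniform = np.full((4, 4), 0.25)
fused, weights = entropy_weighted_fusion(np.stack([sharp, uniform]))
```

A hard selection rule (keeping only the k lowest-entropy maps) would be an equally plausible reading of "selection"; the soft weighting above is used only because it contrasts directly with equal weighting.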
Abstract
Despite recent advances in Open-Vocabulary Semantic Segmentation (OVSS), existing training-free methods face several limitations: computationally expensive affinity refinement strategies, ineffective fusion of transformer attention maps due to equal weighting, and reliance on fixed-size Gaussian kernels to reinforce local spatial smoothness, which enforces isotropic neighborhoods. We propose a strong baseline for training-free OVSS, termed NERVE (Neighbourhood & Entropy-guided Random-walk for open-Vocabulary sEgmentation), which integrates global and fine-grained local information by exploiting the neighbourhood structure from the self-attention layer of a Stable Diffusion model. We also introduce a stochastic random walk for refining the affinity rather than relying on fixed-size Gaussian kernels for local context. This spatial diffusion process encourages propagation across…
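The affinity refinement mentioned in the abstract can be illustrated with a generic random walk over patches. The sketch below is not the paper's algorithm: it uses a deterministic random walk with restart (the expected outcome of the walk rather than sampled trajectories), and the function name `random_walk_refine` and the parameters `steps` and `alpha` are hypothetical. It shows only the general idea that propagating scores along a row-normalised affinity matrix lets information follow the affinity structure instead of a fixed isotropic kernel.

```python
import numpy as np

def random_walk_refine(affinity, scores, steps=3, alpha=0.5, eps=1e-12):
    """Propagate per-patch class scores along an affinity graph.

    affinity: (N, N) non-negative patch-to-patch affinities.
    scores:   (N, C) initial per-patch class scores.
    Hypothetical helper; random walk with restart is an assumed stand-in
    for the stochastic walk described in the abstract.
    """
    A = np.asarray(affinity, dtype=np.float64)
    S0 = np.asarray(scores, dtype=np.float64)
    # Row-normalise affinities into a transition matrix (rows sum to ~1).
    P = A / (A.sum(axis=1, keepdims=True) + eps)
    S = S0.copy()
    for _ in range(steps):
        # One walk step, mixed with the initial scores (restart term).
        S = alpha * (P @ S) + (1.0 - alpha) * S0
    return S

# Usage: two groups of two patches; patch 1 starts with a noisy score.
affinity = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)
scores = np.array([[1.0, 0.0],
                   [0.4, 0.6],
                   [0.0, 1.0],
                   [0.0, 1.0]])
refined = random_walk_refine(affinity, scores)
labels = refined.argmax(axis=1)
```

In this toy case the noisy patch is pulled toward the label of the patch it has high affinity with, while the two groups stay separate because no affinity connects them.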
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
