LLMind: Bio-inspired Training-free Adaptive Visual Representations for Vision-Language Models

Soumyaratna Debnath; Bui Duc Manh; Zinan Liu; Lin Wang

arXiv:2603.14882 · cs.CV · April 28, 2026

TL;DR

LLMind introduces a bio-inspired, training-free adaptive sampling framework for vision-language models (VLMs). By mimicking how human vision allocates resolution (foveated encoding and cortical magnification), it substantially improves efficiency and performance under limited pixel budgets.

Contribution

LLMind proposes a bio-inspired adaptive sampling strategy (BASS) together with test-time semantic feedback, producing efficient, non-uniform visual representations without retraining the underlying VLM.

Findings

1. Achieves +20% on VQAv2 under limited pixel budgets
2. Retains up to 97% of full-resolution performance using only 5% of the pixels
3. Outperforms uniform-sampling baselines across multiple benchmarks

Abstract

Vision-Language Models (VLMs) typically assume uniform spatial fidelity across the entire field of view of visual inputs, dedicating equal precision even to uninformative regions. By contrast, human vision is neither uniform nor static; it is adaptive, selective, and resource-efficient. In light of this, we present the first systematic analysis of bio-inspired visual representation methods, providing insights for more efficient and adaptive VLMs. We propose LLMind (Looking Like the Mind), a novel training-free framework that mimics foveated encoding and cortical magnification in human vision to achieve adaptive, efficient representations for VLMs under tight pixel budgets. Our key idea is to explore a Bio-inspired Adaptive Sampling Strategy (BASS), enabling a Möbius-parameterized module that performs non-uniform sampling while preserving global scene structure. On top of BASS, we…
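The abstract says only that BASS is Möbius-parameterized, so what follows is a minimal sketch of the general idea under stated assumptions, not LLMind's implementation. It treats normalized image coordinates as the unit disk and uses the standard disk automorphism z = (w + a) / (1 + conj(a)·w) to pull a uniform sampling grid toward a fixation point a. Pixels are then read by nearest-neighbour lookup. The helper names (mobius_inverse, foveated_sample, near_fraction), the fixation point, and the rule mapping a 5% pixel budget to a grid size are all hypothetical.

```python
import numpy as np


def mobius_inverse(w: np.ndarray, a: complex) -> np.ndarray:
    """Map sampling-space points w (unit disk) to image-space points z (unit disk).

    Disk automorphism z = (w + a) / (1 + conj(a) * w): sends w = 0 to the
    fixation point z = a and maps the unit disk onto itself (requires |a| < 1).
    """
    return (w + a) / (1 + np.conj(a) * w)


def foveated_sample(image: np.ndarray, a: complex, n: int):
    """Hypothetical foveated sampler: warp an n x n grid toward fixation a.

    image: H x W or H x W x C array; a: fixation in normalized [-1, 1] coords.
    Returns (samples, z): pixel values at the warped points and the warped
    complex coordinates. Grid points outside the unit disk are discarded.
    """
    if abs(a) >= 1:
        raise ValueError("fixation point must lie strictly inside the unit disk")
    h, w_px = image.shape[:2]
    u = np.linspace(-1.0, 1.0, n)
    uu, vv = np.meshgrid(u, u)              # uniform grid in sampling space
    w = (uu + 1j * vv).ravel()
    w = w[np.abs(w) <= 1.0]                 # keep points inside the unit disk
    z = mobius_inverse(w, a)                # pull samples toward the fixation
    # Normalized [-1, 1] coordinates -> nearest pixel indices.
    cols = np.clip(np.rint((z.real + 1) / 2 * (w_px - 1)).astype(int), 0, w_px - 1)
    rows = np.clip(np.rint((z.imag + 1) / 2 * (h - 1)).astype(int), 0, h - 1)
    return image[rows, cols], z


def near_fraction(z: np.ndarray, center: complex, radius: float) -> float:
    """Fraction of sample points lying within `radius` of `center`."""
    return float(np.mean(np.abs(z - center) < radius))


if __name__ == "__main__":
    img = np.random.default_rng(0).random((224, 224))
    n = int(np.sqrt(0.05 * img.size))       # assumed 5% budget -> 50 x 50 grid
    a = 0.4 + 0.2j                          # hypothetical fixation point
    _, z_fov = foveated_sample(img, a, n)
    _, z_uni = foveated_sample(img, 0j, n)  # a = 0 is the identity (uniform)
    print(f"samples: {z_fov.size}, near fixation: "
          f"foveated {near_fraction(z_fov, a, 0.25):.3f} "
          f"vs uniform {near_fraction(z_uni, a, 0.25):.3f}")
```

The map is a bijection of the disk onto itself, so every part of the scene still receives some samples and global layout is kept. Near the fixation, sample density rises by roughly 1/(1 − |a|²)² relative to the identity map (a = 0), and it thins out on the far side. The disk crop discards the grid's corners, about 1 − π/4 ≈ 21% of the points.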
