VLA-InfoEntropy: A Training-Free Vision-Attention Information Entropy Approach for Vision-Language-Action Models Inference Acceleration and Success

Chuhang Liu; Yayun He; Zuheng Kang; Xiaoyang Qu; Jianzong Wang

arXiv:2604.05323 · cs.CV · April 8, 2026

TL;DR

VLA-InfoEntropy is a training-free, entropy-based approach that dynamically steers VLA model inference toward informative regions, speeding up inference without sacrificing accuracy.

Contribution

It proposes a novel entropy-based method for VLA inference acceleration that combines visual, attention, and temporal cues without additional training.

Findings

01 Reduces inference parameters and accelerates inference

02 Outperforms existing approaches in efficiency and accuracy

03 Effectively identifies informative visual and semantic regions

Abstract

Vision-Language-Action (VLA) models integrate visual perception, language understanding, and action decision-making for cross-modal semantic alignment, exhibiting broad application potential. However, the joint processing of high-dimensional visual features, complex linguistic inputs, and continuous action sequences incurs significant computational overhead and low inference efficiency, thereby hindering real-time deployment and reliability. To address this issue, we use image entropy to quantify the grayscale distribution characteristics of each visual token and introduce attention entropy to capture the distribution of attention scores over task-related text. Visual entropy identifies texture-rich or structurally informative regions, while attention entropy pinpoints semantically relevant tokens. Combined with timestep information, these metrics enable a dynamic transition strategy…
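
To make the two measures named in the abstract concrete, here is a minimal sketch of how they might be computed. It is not the authors' implementation. It assumes that each visual token corresponds to an image patch given as a flat list of integer grayscale values, and that attention scores over the text tokens are non-negative with a positive sum. The function names `shannon_entropy`, `patch_image_entropy`, and `attention_entropy` are hypothetical. How the paper combines these quantities with timestep information is not shown here.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing, so they are skipped.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def patch_image_entropy(patch):
    """Entropy of the grayscale histogram of one patch.

    `patch` is a flat list of integer gray levels (an assumption made
    for this sketch, not a format taken from the paper).
    """
    counts = Counter(patch)
    n = len(patch)
    return shannon_entropy(c / n for c in counts.values())


def attention_entropy(scores):
    """Entropy of one token's attention scores over the text tokens.

    Scores are renormalized to sum to 1 before the entropy is taken.
    """
    total = sum(scores)
    return shannon_entropy(s / total for s in scores)


# A flat patch has a single gray level, so its histogram entropy is zero;
# a patch with 16 distinct gray levels reaches log2(16) = 4 bits.
flat_patch = [128] * 16
textured_patch = list(range(0, 256, 16))

# Attention spread evenly over 4 text tokens gives log2(4) = 2 bits;
# attention concentrated on one token gives 0 bits.
uniform_attention = [1.0, 1.0, 1.0, 1.0]
peaked_attention = [1.0, 0.0, 0.0, 0.0]
```

In this sketch a textured patch scores higher than a flat one, which matches the abstract's description of visual entropy identifying texture-rich regions. Whether low or high attention entropy marks a token as semantically relevant is a design decision the truncated abstract does not settle, so the sketch only computes the value.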
