See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model

Yixu Feng; Zinan Zhao; Yanxiang Ma; Chenghao Xia; Chengbin Du; Yunke Wang; Chang Xu

arXiv:2605.11817·cs.RO·May 19, 2026

TL;DR

This paper introduces GridS, a differentiable, geometry-aware token resampling method that drastically reduces computational costs in vision-language-action models without sacrificing performance.

Contribution

It proposes a novel, continuous token resampling module, GridS, that preserves critical spatial information while enabling significant compression in VLA models.

Findings

01 Achieves over 76% reduction in FLOPs with no loss in success rate.

02 Preserves essential geometric details with fewer than 10% of the original tokens.

03 Demonstrates effectiveness on the LIBERO benchmark and a real robotic platform.

Abstract

Vision-Language-Action (VLA) models have shown remarkable promise in robotic manipulation, yet their high computational cost hinders real-time deployment. Existing token pruning methods suffer from a fundamental trade-off: aggressive pruning inevitably discards critical geometric details such as contact points, leading to severe performance degradation. This forces a compromise that limits the achievable compression rate and thus the potential speedup. We argue that breaking this trade-off requires rethinking compression as geometry-aware, continuous token resampling in the vision encoder. To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLA models. By adaptively predicting a minimal set of salient coordinates and extracting features via differentiable interpolation, GridS…
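To make "extracting features via differentiable interpolation" concrete, the sketch below resamples a small token grid at a few continuous coordinates using bilinear interpolation. It is an illustration only, not the authors' implementation: the function names `bilinear_sample` and `resample_tokens`, the 4x4 grid, and the hard-coded coordinates are all assumptions, and the learned module that would predict the salient coordinates is not modeled here.

```python
import math

def bilinear_sample(feature_map, x, y):
    """Sample an H x W x C nested-list feature map at a continuous (x, y).

    Coordinates are in grid units (column, row) and are clamped to the grid.
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    height, width = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0  # fractional offsets act as blending weights
    sampled = []
    for c in range(len(feature_map[0][0])):
        top = (1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c]
        bottom = (1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c]
        sampled.append((1 - wy) * top + wy * bottom)
    return sampled

def resample_tokens(feature_map, coords):
    """Return one interpolated token per continuous (x, y) coordinate."""
    return [bilinear_sample(feature_map, x, y) for x, y in coords]

# Toy 4x4 grid of 2-channel tokens; each token stores its own (column, row),
# so a correct bilinear sample simply returns the query coordinate.
grid = [[[float(col), float(row)] for col in range(4)] for row in range(4)]

# Assumed "salient" coordinates; in a real system a learned predictor
# would produce these, which this sketch does not attempt.
coords = [(1.5, 2.25), (0.0, 0.0), (3.0, 1.5)]
tokens = resample_tokens(grid, coords)  # 16 grid tokens reduced to 3
```

Because the blending weights `wx` and `wy` depend linearly on the query coordinate within each grid cell, the sampled token varies continuously with `(x, y)`. That property is what allows gradients to reach a coordinate predictor in an autodiff framework, in contrast to a hard keep-or-drop pruning decision.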

Code & Models

Repositories

Fediory/Grid-Sampler (GitHub)
