GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs

Mohamed Eltahir; Lama Ayash; Ali Habibullah; Tanveer Hussain; Naeemullah Khan

arXiv:2605.10762 · cs.CV · May 12, 2026

TL;DR

GridProbe is a test-time adaptive frame-selection method for long-video vision-language models (VLMs). It reduces computational cost while maintaining accuracy by probing the model's own answer posteriors, and it also offers interpretability.

Contribution

The paper proposes a training-free, posterior-probing inference paradigm that adaptively selects relevant frames according to question difficulty, improving efficiency without any retraining.
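To make "adapting to question difficulty" concrete, here is a minimal sketch of one hypothetical frame-budget policy. It assumes that the peak posterior (the probability the model assigns to its top answer) serves as a confidence proxy, and that low confidence signals a harder question that deserves more frames. The function names, the thresholds, and the linear ramp are illustrative assumptions, not the paper's actual rule.

```python
from typing import Sequence


def peak_posterior(probs: Sequence[float]) -> float:
    """Probability of the top answer after renormalising the posterior."""
    total = sum(probs)
    if not probs or total <= 0:
        raise ValueError("posterior must be non-empty with positive mass")
    return max(probs) / total


def adaptive_frame_budget(
    probs: Sequence[float],
    min_frames: int = 8,
    max_frames: int = 64,
    confident: float = 0.9,
    uncertain: float = 0.5,
) -> int:
    """Hypothetical policy: few frames for confident (easy) questions, more for uncertain (hard) ones."""
    if not uncertain < confident:
        raise ValueError("need uncertain < confident")
    p = peak_posterior(probs)
    if p >= confident:
        return min_frames
    if p <= uncertain:
        return max_frames
    # Linear ramp between the thresholds: 0 at `confident`, 1 at `uncertain`.
    frac = (confident - p) / (confident - uncertain)
    return round(min_frames + frac * (max_frames - min_frames))


for posterior in ([0.95, 0.03, 0.01, 0.01], [0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]):
    print(posterior, "->", adaptive_frame_budget(posterior))
```

With the default thresholds, the three example posteriors map to budgets of 8, 36, and 64 frames. Compute is therefore spent only where the model is unsure, which is the intuition behind difficulty-adaptive selection.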

Findings

1. Matches baseline accuracy on Video-MME-v2 with 3.36x less compute.
2. Pareto-dominates the baseline on LongVideoBench at 0.35x of its compute.
3. Decoupling the frame-selector model from the QA model improves both efficiency and accuracy.

Abstract

Long-video understanding in VLMs is bottlenecked by a single monolithic forward pass over thousands of frames, whose attention cost is quadratic in sequence length. A common mitigation is to select a small subset of informative frames before the forward pass. Training-free selectors typically do this using similarity scores in the embedding space of an auxiliary encoder. Such signals are capped by the encoder's contrastive pretraining and usually fail on reasoning-heavy queries (negation, cross-frame counting, holistic summarization). We propose GridProbe, an efficient, training-free posterior-probing inference paradigm that scores evidence in answer space using a frozen VLM's own reasoning and then adaptively selects question-relevant frames. This yields sub-quadratic attention cost with little to no accuracy loss. We arrange frames on a $K \times K$ grid and run lightweight row ($R$) and column ($C$) probes, where each probe reads its peak posterior as a…
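The grid layout can be illustrated with a small sketch under stated assumptions. Frames are sampled uniformly into $K^2$ slots. A hypothetical `probe` callable stands in for the frozen VLM: frame indices go in, and a posterior over answer options comes out. A product rule fuses row and column peak posteriors into per-cell scores. The abstract is cut off before it says how scores are combined, so that fusion rule is our assumption. The hypothetical `attention_cost` helper counts only squared frame-token lengths and ignores text tokens and the final QA pass.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical stand-in for the frozen VLM: frame indices in, posterior over answer options out.
ProbeFn = Callable[[Sequence[int]], np.ndarray]


def grid_probe_select(num_frames: int, K: int, probe: ProbeFn, top_k: int) -> list:
    """Sketch of K x K grid probing: 2K probes, each over K frames, then top-k cell selection."""
    # Uniformly sample K*K frame indices and lay them out row-major on a K x K grid.
    grid = np.linspace(0, num_frames - 1, K * K).round().astype(int).reshape(K, K)
    # Each row/column probe reads the peak (max) of its answer posterior as a score.
    row_scores = np.array([probe(grid[i, :].tolist()).max() for i in range(K)])
    col_scores = np.array([probe(grid[:, j].tolist()).max() for j in range(K)])
    # Assumed fusion rule: cell (i, j) scores row_scores[i] * col_scores[j].
    cell_scores = np.outer(row_scores, col_scores)
    # Flattened indices of the top_k highest-scoring cells, highest first.
    best = np.argsort(cell_scores, axis=None)[::-1][:top_k]
    return sorted(int(grid.flat[b]) for b in best)


def attention_cost(K: int, tokens_per_frame: int = 1) -> tuple:
    """Squared-sequence-length cost: one pass over K^2 frames vs. 2K probes over K frames each."""
    full = (K * K * tokens_per_frame) ** 2
    probes = 2 * K * (K * tokens_per_frame) ** 2
    return full, probes


# Usage: a toy probe that is confident only when "evidence" frame 37 is visible.
def toy_probe(frames: Sequence[int]) -> np.ndarray:
    return np.array([0.9, 0.1]) if 37 in frames else np.array([0.5, 0.5])


print(grid_probe_select(64, 8, toy_probe, top_k=1))  # [37]
print(attention_cost(8))  # (4096, 1024)
```

With $N = K^2$ frames, the single full pass costs on the order of $N^2$. The $2K$ probes cost $2K \cdot K^2 = 2N^{3/2}$. This is one way to see why probing can be sub-quadratic, under the simplifying assumptions above.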

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

mohammad2012191/GridProbe (GitHub)
