PushupBench: Your VLM is not good at counting pushups

Shengzhi Li; Jiarun Chen; Karun Sharma; Jiaqi Su; Shichao Pei

arXiv:2604.23407·cs.CV·April 28, 2026

TL;DR

PushupBench introduces a new dataset to evaluate vision-language models' ability to count repetitions in videos, revealing current models' limitations and the potential for counting to improve broader temporal reasoning.

Contribution

The paper presents PushupBench, a dataset for counting in videos, and demonstrates how fine-tuning on counting tasks enhances general video understanding and temporal reasoning.

Findings

01

The best frontier model achieves 42.1% exact accuracy; open-source 4B models score around 6%, matching supervised baselines.

02

Accuracy alone can mislead: weaker models exploit the modal count rather than reason temporally.

03

Fine-tuning on counting with 1k samples transfers to general video understanding benchmarks: MVBench (+2.15), PerceptionTest (+1.88), TVBench (+4.54).

Abstract

Large vision-language models (VLMs) can recognize what happens in video but fail to count how many times. We introduce PushupBench, 446 long-form clips (avg. 36.7s) for evaluating repetition counting. The best frontier model achieves 42.1% exact accuracy; open-source 4B models score ~6%, matching supervised baselines. We show that accuracy alone misleads: weaker models exploit the modal count rather than reason temporally. Fine-tuning on counting with 1k samples transfers to general video understanding: MVBench (+2.15), PerceptionTest (+1.88), TVBench (+4.54), suggesting counting is a proxy for broader temporal reasoning. PushupBench is incorporated in lmms-eval (https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/1262) and hosted at pushupbench.com.
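The claim that exact accuracy alone can mislead can be illustrated with a small sketch. It compares a constant predictor that always outputs the modal count against a predictor that tracks the true count but is often off by one. Everything here is an assumption for illustration: the toy counts, the function names, and the use of mean absolute error as a second metric are hypothetical and are not taken from the paper or from lmms-eval.

```python
from collections import Counter


def exact_accuracy(preds, golds):
    """Fraction of clips where the predicted count equals the true count."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def mean_absolute_error(preds, golds):
    """Average absolute gap between predicted and true counts."""
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(golds)


def modal_count(golds):
    """Most frequent ground-truth count in the label set."""
    return Counter(golds).most_common(1)[0][0]


# Hypothetical ground-truth repetition counts (not real PushupBench labels).
golds = [10, 10, 10, 12, 15, 20, 8, 10, 25, 30]

# A "model" that ignores the video and always answers the modal count.
mode = modal_count(golds)
constant_preds = [mode] * len(golds)

# A "model" that follows the true count but is usually off by one.
reasoning_preds = [11, 9, 11, 12, 14, 21, 8, 9, 24, 29]

for name, preds in [("modal guess", constant_preds), ("near-miss", reasoning_preds)]:
    acc = exact_accuracy(preds, golds)
    mae = mean_absolute_error(preds, golds)
    print(f"{name}: exact accuracy = {acc:.2f}, MAE = {mae:.2f}")
```

On this toy data the modal guesser scores higher on exact accuracy (0.40 versus 0.20) despite a far larger average error (5.40 versus 0.80), which is why reporting an error-based metric or a modal-count baseline alongside accuracy can be informative.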

Code & Models

Repositories

EvolvingLMMs-Lab/lmms-eval/pull/1262
github

Datasets

anonymousatom/pushupbench
dataset · 366 dl
