Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models

Liwei Che; Zhiyu Xue; Yihao Quan; Benlin Liu; Zeru Shi; Michelle Hurst; Jacob Feldman; Ruixiang Tang; Ranjay Krishna; Vladimir Pavlovic

arXiv:2603.18523·cs.CV·March 20, 2026

Open Access

TL;DR

This paper investigates how large vision-language models perform counting, revealing a shared counting circuit, introducing interpretability methods, and demonstrating that targeted fine-tuning improves counting accuracy and general visual reasoning.

Contribution

The study uncovers a structured counting circuit in LVLMs, introduces novel interpretability techniques, and shows that synthetic image-based fine-tuning enhances counting and reasoning performance.

Findings

01. LVLMs exhibit human-like counting behavior: precise on small numerosities, noisy estimation on larger quantities.

02. Two new interpretability methods are introduced: Visual Activation Patching and HeadLens.

03. Fine-tuning with synthetic images improves counting and reasoning accuracy.

Abstract

Counting serves as a simple but powerful test of a Large Vision-Language Model's (LVLM's) reasoning; it forces the model to identify each individual object and then add them all up. In this study, we investigate how LVLMs implement counting using controlled synthetic and real-world benchmarks, combined with mechanistic analyses. Our results show that LVLMs display human-like counting behavior, with precise performance on small numerosities and noisy estimation for larger quantities. We introduce two novel interpretability methods, Visual Activation Patching and HeadLens, and use them to uncover a structured "counting circuit" that is largely shared across a variety of visual reasoning tasks. Building on these insights, we propose a lightweight intervention strategy that exploits simple and abundantly available synthetic images to fine-tune arbitrary pretrained LVLMs exclusively on…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications