A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime

Shuning Jiang; Wei-Lun Chao; Daniel Haehn; Hanspeter Pfister; Jian Chen

arXiv:2507.03866 · cs.LG · September 24, 2025

Open Access

TL;DR

This paper introduces a data-domain sampling regime for evaluating CNNs' graphic perception. It finds that CNNs can outperform humans at ratio estimation in bar charts and that their biases depend on the distance between the training and test distributions.

Contribution

The authors develop a sampling regime for assessing CNNs' perception behaviors and compare CNNs with humans using 16 million trials from 800 CNN models and 6,825 trials from 113 human participants.

Findings

01

CNNs can outperform humans in ratio estimation tasks.

02

CNN biases are influenced by training-test distribution discrepancies.

03

The sampling regime enables detailed analysis of CNN perception behaviors.

Abstract

We present a data-domain sampling regime for quantifying CNNs' graphic perception behaviors. This regime lets us evaluate CNNs' ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability under limited samples, and expertise relative to human observers. After analyzing 16 million trials from 800 CNN models and 6,825 trials from 113 human participants, we arrived at a simple and actionable conclusion: CNNs can outperform humans, and their biases simply depend on the training-test distance. We show evidence of this simple, elegant behavior of the machines when they interpret visualization images. The registration, the code for our sampling regime, and the experimental results are available at osf.io/gfqc3.
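To make the idea of a training-test distance concrete, here is a minimal Python sketch. It is not the authors' sampling regime or their distance measure, neither of which is specified on this page. It assumes, purely for illustration, that each trial is described by a single bar-height ratio, that ratios are drawn uniformly from an interval, and that the distance is the mean gap between each test ratio and its nearest training ratio. The names `sample_ratios` and `train_test_distance` are hypothetical.

```python
import random


def sample_ratios(low, high, n, seed):
    """Draw n ratios uniformly from [low, high] (hypothetical sampler)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def train_test_distance(train, test):
    """Mean distance from each test ratio to its nearest training ratio.

    This is an assumed stand-in for a training-test distance, not the
    measure used in the paper.
    """
    return sum(min(abs(t - r) for r in train) for t in test) / len(test)


# Training ratios cover one region of the data domain.
train = sample_ratios(0.1, 0.5, 200, seed=0)

# One test set overlaps the training region; the other lies outside it.
test_near = sample_ratios(0.1, 0.5, 50, seed=1)
test_far = sample_ratios(0.6, 0.9, 50, seed=2)

d_near = train_test_distance(train, test_near)
d_far = train_test_distance(train, test_far)

print(f"in-domain distance:     {d_near:.4f}")
print(f"out-of-domain distance: {d_far:.4f}")
```

Under these assumptions, the out-of-domain test set sits farther from the training samples than the in-domain one. The abstract's claim is that CNN bias tracks this kind of gap.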

Taxonomy

Topics: Forensic Anthropology and Bioarchaeology Studies · Face Recognition and Perception · Image Processing and 3D Reconstruction