Spot The Ball: A Benchmark for Visual Social Inference
Neha Balamurugan, Sarah Wu, Adam Chun, Gabe Gaw, Cristobal Eyzaguirre, Tobias Gerstenberg

TL;DR
This paper introduces 'Spot The Ball', a new benchmark for evaluating visual social inference in AI models using sports images to assess their ability to locate hidden objects based on social cues.
Contribution
The paper presents a novel benchmark dataset and evaluation framework for visual social inference, highlighting the gap between human and AI performance in understanding social cues.
Findings
Humans are two to three times more accurate than the evaluated models across all three sports.
Models rely on superficial heuristics like image center and proximity.
Humans utilize social cues such as gaze and pose for inference.
Abstract
Humans excel at visual social inference, the ability to infer hidden elements of a scene from subtle behavioral cues such as other people's gaze, pose, and orientation. This ability drives everyday social reasoning in humans and is critical for developing more human-like AI agents. We introduce Spot The Ball, a challenging benchmark for evaluating visual social inference in vision-language models (VLMs) using sports as a test domain. The task is to localize a removed sports ball from soccer, basketball, and volleyball images. We present a curated evaluation set with human baselines and a scalable pipeline for generating additional test items. We evaluate four state-of-the-art VLMs (Gemini, GPT, LLaMA, Qwen) using three prompting strategies, finding that humans are consistently two to three times more accurate (20-34%) than models (≤17%) across all sports. Our analyses show that…
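To make the localization task concrete, the following is a minimal sketch of how a guess could be scored, not the authors' evaluation code. It assumes that each image is divided into a grid, that a response is counted as correct when it names the cell containing the removed ball, and that a fixed "always guess the center" baseline stands in for the image-center heuristic noted in the findings. The 6 x 10 grid used in the usage note, the exact-match scoring rule, and all function names are illustrative assumptions; the text above does not specify them.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell; hypothetical response format


def point_to_cell(x: float, y: float, width: int, height: int,
                  rows: int, cols: int) -> Cell:
    """Map a pixel coordinate to the grid cell that contains it.

    Points on the right or bottom edge are clamped into the last cell.
    """
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row, col


def center_cell(rows: int, cols: int) -> Cell:
    """Baseline guess: a cell at the middle of the grid (assumed heuristic)."""
    return rows // 2, cols // 2


def accuracy(predictions: Sequence[Cell], targets: Sequence[Cell]) -> float:
    """Fraction of items whose predicted cell exactly matches the target cell."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)
```

Under these assumptions, a ball at pixel (55, 25) in a 100 x 60 image falls in `point_to_cell(55, 25, 100, 60, 6, 10)`, and comparing a list of such target cells against `[center_cell(6, 10)] * n` gives the accuracy of the center-only baseline for the same items.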
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Social Robot Interaction and HRI
