Ambiguous Images With Human Judgments for Robust Visual Event Classification
Kate Sanders, Reno Kriz, Anqi Liu, Benjamin Van Durme

TL;DR
This paper introduces SQUID-E, a dataset of ambiguous images with human uncertainty annotations, to evaluate and improve the robustness of visual event classification models on noisy, uncertain data.
Contribution
It presents a novel dataset of ambiguous images with human uncertainty labels and demonstrates its use in assessing and enhancing model performance on uncertain visual data.
Findings
Existing models struggle with ambiguous images.
The dataset reveals gaps in model calibration.
Ambiguous data can improve model robustness.
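Expected calibration error (ECE) is a standard binned metric that turns the finding about gaps in model calibration into a number. This page does not say which calibration measure the authors used, so the sketch below is an illustrative assumption and not the paper's method. The function name `expected_calibration_error` and the 10-bin default are hypothetical choices. The sketch takes each prediction's top-class confidence and whether that prediction was correct. It then reports the sample-weighted average gap between accuracy and mean confidence across equal-width confidence bins.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|.

    Hypothetical helper for illustration; not taken from the SQUID-E paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin, not out of range.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Two overconfident predictions at 0.9 (one wrong) and two
    # underconfident ones at 0.6 (both right): each bin has a 0.4 gap.
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6],
                                     [True, False, True, True]))
```

A perfectly calibrated model scores 0. On ambiguous images, an overconfident model shows up as bins whose mean confidence sits well above their accuracy.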
Abstract
Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that…
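The abstract says that a test set is annotated with human uncertainty judgments and used to evaluate models. One way to compare the two is to line up, per image, the model's probability for the ground-truth event against the human judgment. The sketch below assumes each human judgment can be expressed as a probability in [0, 1] that the labeled event is depicted. That is an assumption: this page does not describe the annotation format or the metrics the authors report. The helpers `mean_absolute_gap` and `pearson` are hypothetical names. They report how far the model's probabilities sit from human judgments on average, and whether the two rise and fall together.

```python
import math
from typing import Sequence


def mean_absolute_gap(model_probs: Sequence[float],
                      human_probs: Sequence[float]) -> float:
    """Average |model probability - human probability| per image (hypothetical)."""
    if len(model_probs) != len(human_probs) or not model_probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - h) for m, h in zip(model_probs, human_probs)) / len(model_probs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between model and human probabilities (hypothetical)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    model = [0.9, 0.5, 0.1]  # model probability of the ground-truth event
    human = [0.8, 0.6, 0.2]  # assumed human judgment, same images
    print(mean_absolute_gap(model, human), pearson(model, human))
```

A small gap together with a high correlation would suggest that the model's uncertainty tracks human uncertainty. A high-confidence model on images that humans find ambiguous would show a large gap.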
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques
Methods: Test
