SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks
Iuliia Kotseruba, John K. Tsotsos

TL;DR
This paper introduces SNAP, a new benchmark dataset for evaluating how capture conditions, such as lighting and camera settings, affect deep learning models on vision tasks including image classification, object detection, and VQA. The evaluation reveals significant biases and sensitivities.
Contribution
The paper presents SNAP, a comprehensive benchmark dataset for analyzing capture condition effects on vision models, and provides insights into dataset biases and model vulnerabilities.
Findings
Models are biased by capture conditions.
Models do not reach human accuracy on well-exposed images.
Capture variations significantly affect model performance.
Abstract
Generalization of deep-learning-based (DL) computer vision algorithms to various image perturbations is hard to establish and remains an active area of research. The majority of past analyses focused on the images already captured, whereas effects of the image formation pipeline and environment are less studied. In this paper, we address this issue by analyzing the impact of capture conditions, such as camera parameters and lighting, on DL model performance on 3 vision tasks -- image classification, object detection, and visual question answering (VQA). To this end, we assess capture bias in common vision datasets and create a new benchmark, SNAP (for Shutter speed, ISO seNsitivity, and APerture), consisting of images of objects taken under controlled lighting conditions and with densely sampled camera settings. We then evaluate a large number of DL…
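The three camera settings named in the SNAP acronym jointly determine exposure, so one way to compare densely sampled settings is to reduce each combination to a single exposure value (EV). The sketch below uses the standard photographic relation EV = log2(N^2 / t) - log2(ISO / 100). It is an illustration only, not the authors' code: the function names and the choice of reference setting are assumptions.

```python
import math


def exposure_value(aperture_f: float, shutter_s: float, iso: float) -> float:
    """ISO-adjusted exposure value of a camera setting.

    aperture_f: f-number N (e.g. 4.0 for f/4)
    shutter_s:  shutter speed t in seconds (e.g. 1/60)
    iso:        ISO sensitivity (100 is the conventional baseline)
    """
    return math.log2(aperture_f ** 2 / shutter_s) - math.log2(iso / 100.0)


def exposure_offset(setting, reference) -> float:
    """Stops by which `setting` is brighter (+) or darker (-) than `reference`.

    Both arguments are (aperture_f, shutter_s, iso) tuples. The reference
    setting is a hypothetical "well-exposed" baseline chosen by the user.
    """
    return exposure_value(*reference) - exposure_value(*setting)


# Hypothetical settings: same aperture and ISO, shutter held open twice as long.
reference = (4.0, 1 / 60, 100)
brighter = (4.0, 1 / 30, 100)
offset = exposure_offset(brighter, reference)
```

Settings with the same offset admit the same amount of light, so grouping model accuracy by offset separates the effect of overall exposure from the effect of the individual parameters.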
Taxonomy
Topics: Infrared Target Detection Methodologies
