Humans can decipher adversarial images
Zhenglong Zhou, Chaz Firestone

TL;DR
Humans can reliably identify the labels assigned by machine-learning models to adversarial images, indicating a closer relationship between human and machine classification than previously thought.
Contribution
This study empirically demonstrates that humans can decipher and predict how machines classify adversarial images across multiple datasets, challenging the assumption that human and machine vision diverge sharply on such images.
Findings
Humans reliably identified machine labels on adversarial images.
Human classification patterns closely match machine predictions.
Results held across diverse image sets and recognition challenges.
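One way to make "reliably identified" concrete is to compare how often humans pick the machine's label against chance. The sketch below assumes a two-alternative forced-choice design, in which each trial asks a participant to choose between the machine's label and one alternative, so chance is 50%. That design, the helper names `agreement` and `p_at_least`, and the toy trials are illustrative assumptions, not the paper's procedure or data. The code counts matches and computes a one-sided exact binomial p-value.

```python
from math import comb


def agreement(human_choices: list[str], machine_labels: list[str]) -> tuple[int, int]:
    """Count trials where the human picked the machine's label (hypothetical helper)."""
    if len(human_choices) != len(machine_labels):
        raise ValueError("each trial needs one human choice and one machine label")
    matches = sum(h == m for h, m in zip(human_choices, machine_labels))
    return matches, len(machine_labels)


def p_at_least(k: int, n: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, chance).

    chance = 0.5 assumes a two-alternative forced choice.
    """
    return sum(comb(n, i) * chance**i * (1 - chance) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Toy, made-up trials for illustration only (NOT data from the paper).
    machine = ["cat", "bus", "cup", "dog", "car", "hat", "bee", "key"]
    human = ["cat", "bus", "cup", "dog", "car", "hat", "ant", "key"]
    k, n = agreement(human, machine)
    print(f"agreement {k}/{n} = {k / n:.2f}, one-sided p = {p_at_least(k, n):.4f}")
```

A small p-value only says agreement exceeds chance; it does not by itself show how closely human choices track the machine's full pattern of labels.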
Abstract
How similar is the human mind to the sophisticated machine-learning systems that mirror its performance? Models of object categorization based on convolutional neural networks (CNNs) have achieved human-level benchmarks in assigning known labels to novel images. These advances promise to support transformative technologies such as autonomous vehicles and machine diagnosis; beyond this, they also serve as candidate models for the visual system itself -- not only in their output but perhaps even in their underlying mechanisms and principles. However, unlike human vision, CNNs can be "fooled" by adversarial examples -- carefully crafted images that appear as nonsense patterns to humans but are recognized as familiar objects by machines, or that appear as one object to humans and a different object to machines. This seemingly extreme divergence between human and machine classification…
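The abstract's "carefully crafted images" can be illustrated with a toy example in the spirit of the fast gradient sign method (FGSM), a standard attack that this summary does not itself describe. As an assumption for illustration, the "machine" here is a random linear scorer over 1,000 pixel values rather than a CNN, and `score`, `sign_step`, and all parameter values are hypothetical. Each pixel moves by at most `eps`, but the score shifts by roughly `eps` times the sum of the absolute weights, so tiny per-pixel edits can add up to a flipped decision.

```python
import random


def score(w: list[float], b: float, x: list[float]) -> float:
    """Linear class score; the toy 'machine' says class A when this is > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_step(w: list[float], x: list[float], eps: float, toward_positive: bool) -> list[float]:
    """One FGSM-style step: move every pixel by eps along the sign of the gradient.

    For a linear score the gradient with respect to x is just w. Pixels are
    clipped back into the valid [0, 1] range afterwards.
    """
    direction = 1 if toward_positive else -1
    return [min(1.0, max(0.0, xi + direction * eps * sign(wi))) for xi, wi in zip(x, w)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 1000                                       # number of "pixels" (hypothetical)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]    # random toy weights, not a trained CNN
    x = [rng.uniform(0.0, 1.0) for _ in range(d)]  # toy "image" with values in [0, 1]
    b = 1.0 - score(w, 0.0, x)                     # bias chosen so the clean image scores +1
    eps = 0.01                                     # at most a 0.01 change per pixel
    x_adv = sign_step(w, x, eps, toward_positive=False)
    print(f"clean score       {score(w, b, x):+.3f}")
    print(f"adversarial score {score(w, b, x_adv):+.3f}")
    print(f"max pixel change  {max(abs(a - c) for a, c in zip(x_adv, x)):.3f}")
```

With these toy numbers the adversarial score should land well below zero even though no pixel changes by more than 0.01; real attacks on CNNs follow the same gradient-sign idea but compute the gradient through the network.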
