Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
Ethan Knights

TL;DR
This paper shows that fine-tuning Vision Transformers on human saliency maps aligns their attention with human biases without sacrificing classification accuracy, unlike CNNs.
Contribution
It demonstrates that human-like attention biases can be induced in Vision Transformers at no cost to their performance, enhancing interpretability.
Findings
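Fine-tuning improves alignment with human fixation maps across five saliency metrics.
It induces three human-like biases: a small-object preference (reversing the baseline's large-object bias), a stronger animacy preference, and lower attention entropy.
Classification performance shows no loss on multiple benchmarks.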
Abstract
Vision Transformers (ViTs) have become the standard architecture for state-of-the-art image understanding, but their processing diverges substantially from human attentional characteristics. We investigate whether this cognitive gap can be shrunk by fine-tuning the self-attention weights of Google's ViT-B/16 on human saliency fixation maps. To isolate the effects of semantically relevant signals from generic human supervision, the tuned model is compared against a shuffled control. Fine-tuning significantly improved alignment across five saliency metrics and induced three hallmark human-like biases: tuning reversed the baseline's anti-human large-object bias toward small objects, amplified the animacy preference, and diminished extreme attention entropy. Bayesian parity analysis provides decisive to very strong evidence that this cognitive alignment comes at no cost to the model's…
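To make the quantities in the abstract concrete, here is a minimal NumPy sketch of how an attention map might be scored against a human fixation map. It is an illustration under stated assumptions, not the paper's evaluation code: the abstract does not name its five saliency metrics, so Pearson correlation (CC) and KL divergence are used here only because they are common choices in saliency benchmarking. The function names and the 14×14 grid (the patch grid of ViT-B/16 at an assumed 224×224 input) are likewise hypothetical.

```python
import numpy as np


def to_distribution(saliency_map):
    """Clip negatives and normalize a 2-D map so it sums to 1."""
    m = np.clip(np.asarray(saliency_map, dtype=np.float64), 0.0, None)
    total = m.sum()
    if total == 0.0:
        # Degenerate all-zero map: fall back to a uniform distribution.
        return np.full(m.shape, 1.0 / m.size)
    return m / total


def attention_entropy(attn_map):
    """Shannon entropy (in nats) of an attention map treated as a distribution.

    Lower values mean attention is concentrated on fewer patches.
    """
    p = to_distribution(attn_map).ravel()
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def pearson_cc(pred, target):
    """Pearson correlation between a model map and a human fixation map."""
    a = np.asarray(pred, dtype=np.float64).ravel()
    b = np.asarray(target, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def kl_divergence(pred, target, eps=1e-12):
    """KL(target || pred) between the two maps as distributions (lower is better)."""
    p = to_distribution(target)
    q = to_distribution(pred)
    return float((p * np.log((p + eps) / (q + eps))).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 14x14 maps: one value per image patch (assumed grid size).
    model_attention = rng.random((14, 14))
    human_fixations = rng.random((14, 14))
    print("entropy:", attention_entropy(model_attention))
    print("CC:", pearson_cc(model_attention, human_fixations))
    print("KL:", kl_divergence(model_attention, human_fixations))
```

Under this sketch, a fine-tuned model that is better aligned with human fixations would show higher CC and lower KL against the fixation maps than the baseline or the shuffled control. A drop in attention entropy would correspond to the "diminished extreme attention entropy" finding.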
