Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers

Ethan Knights

arXiv:2604.20027·cs.CV·April 23, 2026

TL;DR

This paper shows that fine-tuning the self-attention weights of a Vision Transformer on human saliency maps brings its attention closer to human attention and induces human-like biases without sacrificing classification accuracy, unlike CNNs.

Contribution

It demonstrates that human-like attention biases can be induced in Vision Transformers at no cost to their performance, enhancing interpretability.

Findings

01 Fine-tuning improves alignment with human fixation maps across five saliency metrics.

02 Fine-tuning induces human-like biases: a preference for small objects, a stronger animacy preference, and lower attention entropy.

03 Performance is preserved on multiple benchmarks.

Abstract

Vision Transformers (ViTs) have become the standard architecture for state-of-the-art image understanding, but their processing diverges substantially from human attentional characteristics. We investigate whether this cognitive gap can be narrowed by fine-tuning the self-attention weights of Google's ViT-B/16 on human saliency fixation maps. To isolate the effect of semantically relevant signals from that of generic human supervision, the tuned model is compared against a shuffled control. Fine-tuning significantly improved alignment across five saliency metrics and induced three hallmark human-like biases: it reversed the baseline's anti-human large-object bias in favor of small objects, amplified the animacy preference, and diminished extreme attention entropy. Bayesian parity analysis provides decisive to very strong evidence that this cognitive alignment comes at no cost to the model's…
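To make two of the quantities in the abstract concrete, here is a minimal sketch of how attention entropy and one saliency-alignment score could be computed for a single image. It is an illustration under stated assumptions, not the paper's implementation: the listing does not say which five saliency metrics were used or how entropy was defined, so Pearson correlation (a common saliency metric) and Shannon entropy over a normalized patch-attention map are assumptions here. The function names are hypothetical, and the 14 x 14 grid assumes ViT-B/16 applied to a 224 x 224 input.

```python
import numpy as np


def normalize_map(m, eps=1e-12):
    """Clip negatives and rescale a 2D map so its entries sum to (about) 1."""
    m = np.clip(np.asarray(m, dtype=np.float64), 0.0, None)
    return m / (m.sum() + eps)


def attention_entropy(attn):
    """Shannon entropy in nats of an attention map treated as a distribution.

    Assumption: entropy is taken over patch positions; the paper's exact
    definition is not given in this listing.
    """
    p = normalize_map(attn).ravel()
    nz = p[p > 0]  # skip zeros so log() stays finite
    return float(-(nz * np.log(nz)).sum())


def pearson_cc(attn, saliency):
    """Pearson correlation between an attention map and a human saliency map.

    Assumption: both maps have already been resampled to the same grid.
    """
    a = np.asarray(attn, dtype=np.float64).ravel()
    s = np.asarray(saliency, dtype=np.float64).ravel()
    return float(np.corrcoef(a, s)[0, 1])


# Toy example on a 14 x 14 patch grid (ViT-B/16 at 224 x 224, an assumption).
rng = np.random.default_rng(0)
human_map = rng.random((14, 14))      # stand-in for a human fixation map
uniform_attn = np.ones((14, 14))      # maximally diffuse attention
peaked_attn = np.zeros((14, 14))
peaked_attn[7, 7] = 1.0               # all attention on one patch

print(attention_entropy(uniform_attn))  # upper bound: log(196)
print(attention_entropy(peaked_attn))   # close to 0
print(pearson_cc(human_map, human_map)) # a map correlates perfectly with itself
```

Under this definition, diffuse attention sits near the upper bound of log(196) and sharply focused attention sits near zero, which gives one way to read "extreme attention entropy" as a measurable quantity.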
