Regularizing Attention Scores with Bootstrapping

Neo Christopher Chung; Maxim Laletin

arXiv:2604.01339·cs.CV·April 3, 2026

TL;DR

This paper introduces a bootstrap-based regularization method for attention scores in Vision Transformers, improving interpretability by reducing noise and spurious attention in image analysis.

Contribution

The paper proposes a statistical framework that uses bootstrapping to quantify the uncertainty of attention scores in ViT and to regularize them, making attention-based explanations clearer.

Findings

01. Significant reduction of noisy and spurious attention in natural and medical images.

02. Improved sparsity and interpretability of attention maps.

03. Quantitative validation on simulated and real datasets confirms effectiveness.

Abstract

Vision transformers (ViT) rely on an attention mechanism to weigh input features, and attention scores have therefore naturally been considered as explanations for their decision-making process. However, attention scores are almost always non-zero, resulting in noisy and diffuse attention maps and limiting interpretability. Can we quantify the uncertainty of attention scores and obtain regularized attention scores? To this end, we consider the attention scores of ViT in a statistical framework where independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce bootstrapping for attention scores, which generates a baseline distribution of attention scores by resampling input features. This bootstrap distribution is then used to estimate the significance and posterior probabilities of attention scores. In natural and medical…
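
As a rough illustration of the idea described in the abstract, the Python sketch below builds a baseline distribution of attention scores by resampling input features and then keeps only the scores that look significant against it. It is not the authors' implementation. The single-head, single-query attention, the pooled resampling of feature values, the empirical p-value formula, the hard threshold `alpha`, and all function and variable names are assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_scores(query, keys):
    # Scaled dot-product attention of one query over all keys.
    return softmax(keys @ query / np.sqrt(keys.shape[1]))

def bootstrap_regularize(query, keys, n_boot=200, alpha=0.05, seed=0):
    """Hypothetical helper: zero out attention scores that are not
    significant against a bootstrap baseline (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, d = keys.shape
    observed = attention_scores(query, keys)

    # Assumption: the baseline is built by resampling individual feature
    # values with replacement, which breaks patch-specific structure.
    flat = keys.ravel()
    null = np.empty((n_boot, n))
    for b in range(n_boot):
        resampled = rng.choice(flat, size=(n, d), replace=True)
        null[b] = attention_scores(query, resampled)

    # Empirical p-value: share of pooled baseline scores >= observed score.
    pooled = np.sort(null.ravel())
    n_geq = pooled.size - np.searchsorted(pooled, observed, side="left")
    pvals = (1 + n_geq) / (1 + pooled.size)

    # Assumption: a hard threshold stands in for the regularization step.
    regularized = np.where(pvals <= alpha, observed, 0.0)
    return observed, pvals, regularized

# Toy input: 50 noise patches, three of which are aligned with the query.
rng = np.random.default_rng(1)
query = np.ones(16)
keys = rng.normal(size=(50, 16))
keys[:3] = 3.0 * query
observed, pvals, regularized = bootstrap_regularize(query, keys)
print("non-zero scores before:", int((observed > 0).sum()))
print("non-zero scores after: ", int((regularized > 0).sum()))
```

The raw softmax scores are all non-zero by construction, which is the diffuseness the abstract points to. The thresholded scores are sparse because only patches that stand out from the resampled baseline survive. A posterior-probability estimate, which the abstract also mentions, is left out of this sketch.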

Code & Models

Repositories

ncchung/AttentionRegularization (GitHub)