Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning

Chong Ma; Lin Zhao; Yuzhong Chen; Lu Zhang; Zhenxiang Xiao; Haixing Dai; David Liu; Zihao Wu; Zhengliang Liu; Sheng Wang; Jiaxing Gao; Changhe Li; Xi Jiang; Tuo Zhang; Qian Wang; Dinggang Shen; Dajiang Zhu; Tianming Liu

arXiv:2205.12466·cs.CV·May 26, 2022


TL;DR

This paper introduces EG-ViT, a vision transformer guided by radiologists' eye-gaze data to improve medical image diagnosis, reduce shortcut learning, and enhance interpretability with limited data.

Contribution

The paper proposes a novel eye-gaze-guided vision transformer that incorporates expert visual attention to improve medical diagnosis and model interpretability.

Findings

01. EG-ViT outperforms baseline models on the INbreast and SIIM-ACR datasets.

02. The model effectively rectifies shortcut learning and biases.

03. EG-ViT demonstrates improved interpretability and generalization in medical imaging.

Abstract

Learning harmful shortcuts, such as spurious correlations and biases, prevents deep neural networks from learning meaningful and useful representations, which jeopardizes the generalizability and interpretability of the learned representation. The situation is even more serious in medical imaging, where clinical data (e.g., MR images with pathology) are scarce, while the reliability, generalizability, and transparency of the learned model are critical. To address this problem, we propose to infuse human experts' intelligence and domain knowledge into the training of deep neural networks. The core idea is to use the visual attention information of expert radiologists to proactively guide the deep model to focus on regions with potential pathology and avoid being trapped in learning harmful shortcuts. To do so, we propose a novel eye-gaze-guided…
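The abstract is cut off before it describes the mechanism, so the following is only a rough illustration of its core idea: turning radiologists' gaze into a signal that steers a vision transformer toward regions with potential pathology. It is a minimal sketch under stated assumptions, not the EG-ViT implementation. The function names (`patch_gaze_scores`, `gaze_patch_mask`, `apply_gaze_mask`), the mean pooling of gaze intensity per patch, the fixed threshold, and zeroing out unattended patch embeddings are all hypothetical choices made for this example.

```python
from typing import List, Sequence

# Hypothetical type aliases for this sketch (not from the paper): a 2-D gaze
# heatmap of fixation intensities, and one embedding vector per patch token.
Heatmap = Sequence[Sequence[float]]
Embedding = List[float]


def patch_gaze_scores(heatmap: Heatmap, patch_size: int) -> List[float]:
    """Mean gaze intensity in each non-overlapping patch, in row-major order
    (the order in which a standard ViT flattens image patches into tokens)."""
    height, width = len(heatmap), len(heatmap[0])
    if height % patch_size or width % patch_size:
        raise ValueError("heatmap dimensions must be divisible by patch_size")
    area = patch_size * patch_size
    scores: List[float] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            total = sum(
                heatmap[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            )
            scores.append(total / area)
    return scores


def gaze_patch_mask(heatmap: Heatmap, patch_size: int, threshold: float) -> List[bool]:
    """True keeps a patch token; False marks it as outside the expert's gaze.
    Assumption: a hard threshold on the mean score decides attendance."""
    return [score >= threshold for score in patch_gaze_scores(heatmap, patch_size)]


def apply_gaze_mask(tokens: List[Embedding], mask: List[bool]) -> List[Embedding]:
    """Zero out embeddings of patches the expert did not attend to.
    Assumption: zeroing is one simple masking choice; a class token, if the
    model uses one, is not part of `tokens` here and would be kept as-is."""
    if len(tokens) != len(mask):
        raise ValueError("need exactly one mask entry per patch token")
    return [list(tok) if keep else [0.0] * len(tok) for tok, keep in zip(tokens, mask)]
```

For a 4×4 heatmap whose gaze mass sits in the top-left and bottom-right quadrants, `gaze_patch_mask(heatmap, 2, 0.25)` keeps patches 0 and 3 and masks patches 1 and 2. A hard mask keeps the sketch simple; weighting tokens softly by their gaze score would be an equally plausible variant.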

Taxonomy

Topics: AI in cancer detection · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Layer Normalization · Dense Connections · Vision Transformer