Deep is a Luxury We Don't Have
Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas Paul Matthews, Jason Su, Sadanand Singh

TL;DR
This paper introduces HCT, a high-resolution vision transformer that efficiently models long-range dependencies in medical images using linear self-attention, outperforming CNNs on mammography data.
Contribution
We propose HCT, a novel high-resolution transformer model utilizing linear self-attention to reduce complexity and improve performance on medical imaging tasks.
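The following rough operation counts show why quadratic complexity matters at mammography resolutions. They compare softmax attention with a generic kernelized linear attention. The image size, downsampling stride, head dimension, and feature count are hypothetical values chosen for illustration, not HCT's actual configuration. The counts also ignore projections, normalization, and other constant-factor work.

```python
"""Back-of-the-envelope attention cost at high resolution (illustrative only)."""


def token_count(height: int, width: int, stride: int) -> int:
    """Spatial tokens left after downsampling an image by `stride` in each axis."""
    return (height // stride) * (width // stride)


def softmax_attention_macs(n_tokens: int, dim: int) -> int:
    """Multiply-adds for Q K^T plus (scores) V: both scale with n_tokens**2."""
    return 2 * n_tokens * n_tokens * dim


def linear_attention_macs(n_tokens: int, dim: int, features: int) -> int:
    """Multiply-adds for phi(K)^T V plus phi(Q) (phi(K)^T V): linear in n_tokens."""
    return 2 * n_tokens * features * dim


def attention_matrix_bytes(n_tokens: int, bytes_per_value: int = 4) -> int:
    """Memory for one explicit N x N attention matrix (float32 by default)."""
    return n_tokens * n_tokens * bytes_per_value


if __name__ == "__main__":
    # Hypothetical setting: a 3328 x 2560 image seen at stride 16, one head of size 64.
    n = token_count(3328, 2560, stride=16)
    quad = softmax_attention_macs(n, dim=64)
    lin = linear_attention_macs(n, dim=64, features=64)
    print(n)                                   # 33280 tokens
    print(quad // lin)                         # 520x fewer multiply-adds
    print(attention_matrix_bytes(n) / 1e9)     # ~4.43 GB per head, per image
```

The ratio between the two counts reduces to N / m (tokens over feature count). The gap therefore widens as input resolution grows.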
Findings
HCT outperforms its CNN counterpart on a high-resolution mammography dataset.
HCT effectively models long-range dependencies in high-resolution images.
HCT's effective receptive field is well suited to medical image analysis.
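For context on receptive fields, the sketch below computes the theoretical receptive field of a stack of convolution or pooling layers using the standard recurrence. This value is only an upper bound. The effective receptive field, which the paper evaluates, is typically much smaller. The layer stacks in the example are hypothetical. A single global self-attention layer, by contrast, can relate any two tokens regardless of depth.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Theoretical receptive field (input pixels, per axis) of stacked layers.

    Each layer is given as (kernel_size, stride). Standard recurrence:
        r_l = r_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l,
    starting from r_0 = 1 and j_0 = 1 (j is the cumulative stride).
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical stacks: stride-1 3x3 convs grow the field by only 2 px per layer.
    print(receptive_field([(3, 1)] * 10))   # 21
    # Downsampling by 2 at every layer grows it geometrically.
    print(receptive_field([(3, 2)] * 5))    # 63
```

With stride-1 layers, covering a region hundreds of pixels wide would take hundreds of layers.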
Abstract
Medical images come in high resolutions. A high resolution is vital for finding malignant tissues at an early stage. Yet, this resolution presents a challenge in terms of modeling long-range dependencies. Shallow transformers eliminate this problem, but they suffer from quadratic complexity. In this paper, we tackle this complexity by leveraging a linear self-attention approximation. Through this approximation, we propose an efficient vision model called HCT, which stands for High resolution Convolutional Transformer. HCT brings transformers' merits to high-resolution images at a significantly lower cost. We evaluate HCT using a high-resolution mammography dataset. HCT is significantly superior to its CNN counterpart. Furthermore, we demonstrate HCT's fitness for medical images by evaluating its effective receptive field.
Code available at https://bit.ly/3ykBhhf
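The abstract does not say which linear self-attention approximation HCT uses. The sketch below illustrates the general idea with one well-known choice: the positive feature map phi(x) = elu(x) + 1 from Katharopoulos et al. (2020), "Transformers are RNNs". It is not presented as HCT's method. Replacing the softmax with a kernel phi(q) · phi(k) allows the product to be reassociated as phi(Q) (phi(K)^T V). The N x N attention matrix is then never formed, and cost becomes linear in the number of tokens N. The function names are hypothetical, and the code uses plain Python lists to stay dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def elu_plus_one(x: float) -> float:
    """Positive feature map phi(x) = elu(x) + 1."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m: Matrix) -> Matrix:
    return [[elu_plus_one(v) for v in row] for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def quadratic_kernel_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """O(N^2): build the full N x N kernel matrix phi(Q) phi(K)^T, then row-normalize."""
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))          # N x N
    out = matmul(scores, v)                     # N x d_v
    return [[x / sum(s_row) for x in o_row] for s_row, o_row in zip(scores, out)]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """O(N): compute phi(Q) (phi(K)^T V); no N x N matrix is ever built."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)               # m x d_v, size independent of N
    k_sum = [sum(col) for col in zip(*fk)]      # sum of phi(k_j) over all tokens
    out = matmul(fq, kv)                        # N x d_v
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o_row] for o_row, z in zip(out, norms)]


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
    k = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(quadratic_kernel_attention(q, k, v))
    print(linear_attention(q, k, v))            # same values, up to rounding
```

Both functions compute the same output. The linear version only changes the order of the matrix products, so the gain comes from associativity rather than from any further approximation of the kernel.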
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Cell Image Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Dropout
