# Diagnosis of colorectal cancer using residual transformer with mixed attention and explainable AI

**Authors:** Poonam Sharma, Bhisham Sharma, Ajit Noonia, Dhirendra Prasad Yadav, Panos Liatsis, Siamak Pedrammehr

PMC · DOI: 10.1371/journal.pone.0335418 · 2025-11-03

## TL;DR

This paper introduces RNTNet, a model for diagnosing colorectal cancer that pairs a ResNeXt feature extractor with a transformer encoder using mixed attention, and uses Grad-CAM to make its predictions explainable.

## Contribution

The novel RNTNet model integrates ResNeXt and a vision transformer with mixed attention and Grad-CAM for enhanced CRC diagnosis and interpretability.
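
To make the interpretability component concrete, the following is a minimal sketch of the standard Grad-CAM combination step: each feature map is weighted by the spatial average of its gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. It assumes the activations and gradients of a convolutional layer are already available as nested lists; the function name `grad_cam`, the toy inputs, and the final normalization to [0, 1] are illustrative choices, not details taken from the paper.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps (each H x W) into one Grad-CAM heatmap.

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]:   gradient of the class score w.r.t. that value.
    Both are assumed to be precomputed (hypothetical inputs for illustration).
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]

    # ReLU: keep only regions that push the class score up.
    cam = [[max(0.0, value) for value in row] for row in cam]

    # Optional normalization to [0, 1] for display (an illustrative choice).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy example: channel 0 supports the class, channel 1 opposes it.
toy_activations = [[[1, 0], [0, 0]],
                   [[0, 0], [0, 1]]]
toy_gradients = [[[1, 1], [1, 1]],
                 [[-1, -1], [-1, -1]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy example only the top-left position survives the ReLU, because the second channel carries a negative weight. In practice the heatmap would be upsampled and overlaid on the input image.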

## Key findings

- RNTNet achieved 97.96% accuracy on the KvasirV1 dataset and 98.20% on the Kather dataset.
- The model's AUC values of 0.9895 and 0.9937 on KvasirV1 and Kather datasets confirm its high diagnostic performance.
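
For readers less familiar with these two metrics, here is a small sketch of how accuracy and a binary AUC can be computed from labels and scores. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The toy labels and scores are made up for illustration. Both datasets are multi-class, so the reported AUC presumably aggregates over classes in some way; this summary does not say how, and the sketch covers only the binary building block.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Binary AUC: probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (illustrative only, not from the paper).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Unlike accuracy, the AUC does not depend on a decision threshold, which is why the two numbers are usually reported together.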

## Abstract

Colorectal cancer (CRC) is a leading cause of cancer-related death and poses a significant threat to global health. Although deep learning models have been used to diagnose CRC accurately, they still struggle to capture global correlations among spatial features, especially in images with complex textures and morphologically similar structures. To overcome these challenges, we propose a hybrid model using a residual network and transformer encoder with mixed attention. The Residual Next Transformer Network (RNTNet) extracts spatial features from CRC images using ResNeXt, which uses group convolution and skip connections to capture fine-grained features. Furthermore, a vision transformer (ViT) encoder containing a mixed attention block is designed using multiscale feature aggregation to provide global attention to the spatial features. In addition, a Grad-CAM module is added to visualize the model’s decision process and give oncologists a second opinion. Two publicly available datasets, Kather and KvasirV1, were used for model training and testing. The model achieved classification accuracies of 97.96% and 98.20% on the KvasirV1 and Kather datasets, respectively. ROC curve analysis further confirms model efficacy, with AUC values of 0.9895 and 0.9937 on the KvasirV1 and Kather datasets, respectively. A comparative study indicates that RNTNet improves on state-of-the-art methods in both accuracy and efficiency.
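
As a rough illustration of why group convolution is attractive, the sketch below counts the weights of a convolution layer with and without groups. In a grouped convolution each output channel sees only the input channels of its own group, so the weight count shrinks by the number of groups. The helper name `conv2d_weight_count` and the sizes used (128 channels, 3×3 kernel, 32 groups) are illustrative assumptions; the abstract does not report RNTNet's actual layer widths.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (bias excluded) in a 2D convolution layer.

    With groups > 1, each output channel is connected only to the
    in_channels // groups input channels of its own group.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


# Illustrative sizes, not taken from the paper.
dense_weights = conv2d_weight_count(128, 128, 3)
grouped_weights = conv2d_weight_count(128, 128, 3, groups=32)
reduction = dense_weights // grouped_weights
```

With these assumed sizes the grouped layer needs 4,608 weights instead of 147,456, a 32-fold reduction. The skip connection mentioned in the abstract then adds the block's input back to its output, which is what makes the block residual.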

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** CRC (MESH:D015179), cancer disease (MESH:D009369)

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12582454/full.md

---
Source: https://tomesphere.com/paper/PMC12582454