# An explainable vision transformer model with transfer learning for accurate bean leaf disease classification

**Authors:** Saiprasad Potharaju, Arun Singh, Dalwinder Singh, Swapnali N. Tambe, Prasad MVV Kantipudi, B. Kiranmai

PMC · DOI: 10.1038/s41598-026-41723-9 · 2026-02-24

## TL;DR

This paper introduces an explainable AI model for identifying bean leaf diseases, combining Vision Transformers and transfer learning to improve accuracy and trustworthiness in agriculture.

## Contribution

The framework pairs a Vision Transformer (ViT-B/16), fine-tuned via transfer learning from ImageNet, with GradCAM++ heatmaps, so that each bean leaf disease prediction comes with a visual explanation of the regions that drove it.

## Key findings

- The model achieved 97.52% validation accuracy on the I-Bean dataset.
- GradCAM++ visualizations effectively highlight disease regions, improving model trustworthiness.
- Self-attention over image patches lets the model capture global, spatially distributed leaf patterns that CNN-based approaches often overlook.

## Abstract

Early identification of bean leaf diseases, particularly Angular Leaf Spot and Bean Rust, is vital for ensuring crop productivity and global food security, especially within smallholder farming systems where disease outbreaks can rapidly escalate and cause severe yield losses. Conventional disease identification through visual inspection is labor-intensive, subjective, and highly dependent on expert knowledge, making it impractical for large-scale agricultural monitoring. Although recent deep learning-based approaches have demonstrated impressive accuracy in plant disease classification, their inherent “black-box” nature significantly limits real-world adoption, as farmers and agronomists often lack the ability to understand, trust, or act upon unexplained predictions. To address these challenges, this study proposes an automated and explainable disease diagnostic framework based on a Vision Transformer (ViT-B/16) architecture optimized through transfer learning from ImageNet. Unlike traditional convolutional neural networks that primarily focus on localized features, the Vision Transformer processes images as a sequence of flattened patches and leverages self-attention mechanisms to capture long-range dependencies and global contextual patterns across the entire leaf surface. This global representation enables the model to detect subtle and spatially distributed disease symptoms that are often overlooked by CNN-based approaches. To further enhance transparency and interpretability, GradCAM++ is integrated into the framework as an explainable artificial intelligence (XAI) mechanism. This method generates class-specific heatmaps that visually highlight the exact pathological regions influencing the model’s predictions, thereby establishing a human-interpretable validation loop for farmers, agronomists, and domain experts.
The proposed framework was evaluated on the publicly available I-Bean dataset, achieving a validation accuracy of 97.52% along with strong precision, recall, and F1-score performance. The generated GradCAM++ visualizations consistently demonstrate the model’s sensitivity to true diseased regions, reinforcing both the reliability and trustworthiness of its predictions. By combining high-capacity global feature learning with visual explainability, the proposed approach offers a scalable, transparent, and practical solution for real-world precision agriculture. This framework not only enhances diagnostic accuracy but also bridges the critical gap between model performance and user trust, enabling informed decision-making and timely disease management in modern farming environments.
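
The abstract's description of the Vision Transformer processing an image "as a sequence of flattened patches" can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the `patchify` helper is hypothetical, and the 224 × 224 RGB input size is an assumption (the usual resolution for ImageNet-pretrained ViT-B/16, not stated in this summary). Only the 16-pixel patch size comes from the "/16" in the model name.

```python
def patchify(image, patch=16):
    """Split an H x W x C image (nested lists) into flattened patch vectors.

    Patches are emitted in row-major order, left to right and top to bottom.
    Hypothetical helper for illustration; not taken from the paper.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    flat.extend(image[row][col])  # append the pixel's channels
            tokens.append(flat)
    return tokens


# Assumed input: a blank 224 x 224 RGB image standing in for a leaf photo.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = patchify(image, patch=16)
print(len(tokens), len(tokens[0]))  # 196 patches, each of length 768
```

Under these assumptions the leaf image becomes 14 × 14 = 196 patch vectors of length 16 × 16 × 3 = 768. In a ViT, each vector is then linearly projected and self-attention relates every patch to every other one, which is how lesions in distant parts of the leaf can inform a single prediction.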

## Full-text entities

- **Diseases:** bean leaf disease (MESH:C536240)

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13031953/full.md

---
Source: https://tomesphere.com/paper/PMC13031953