DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification

Shah Saood (1), Saddam Hussain Khan (2) ((1) Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Sciences (UEAS), Swat 19060, Pakistan; (2) Interdisciplinary Research Center for Smart Mobility and Logistics (IRC-SML), King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia)

arXiv:2603.25935·cs.CV·March 30, 2026

TL;DR

This paper introduces DenseSwinV2, a hybrid CNN-Transformer model that classifies cassava leaf diseases by combining local feature extraction (DenseNet) with global feature extraction (SwinV2), reaching 98.02% accuracy on a large cassava disease dataset.

Contribution

DenseSwinV2 is a dual-branch framework that pairs a DenseNet branch with a customized SwinV2 branch and applies a channel attention module to each, improving disease classification accuracy over existing CNN and transformer models.

Findings

01. Achieved 98.02% classification accuracy on the cassava disease dataset.

02. Outperformed established CNN and transformer models in accuracy and robustness.

03. Remained effective under real-world conditions, including occlusion and noise.

Abstract

This work presents Hybrid Dense SwinV2, a two-branch framework that jointly leverages densely connected convolutional features and hierarchical, customized Swin Transformer V2 (SwinV2) representations for cassava disease classification. The DenseNet branch captures high-resolution local features, preserving fine structural cues while allowing effective gradient flow. Concurrently, the customized SwinV2 branch models global contextual dependencies through shifted-window self-attention, which captures the long-range interactions needed to distinguish visually similar lesions. In addition, a channel-squeeze attention module is applied to the CNN stream and the Transformer stream independently to emphasize discriminative disease-related responses and suppress redundant or background-driven activations. Finally, these…
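The per-stream channel attention described above can be illustrated with a small, dependency-free sketch. It assumes a squeeze-and-excitation style gate (global average pooling, a bottleneck with ReLU, then a sigmoid); the abstract does not specify the exact design of the paper's module, so the function name `channel_attention`, the weight shapes, and the random toy inputs are all hypothetical. The fusion step is not shown because the abstract is truncated before describing it.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, w_reduce, w_expand):
    """Assumed squeeze-and-excitation style gate (not the paper's exact module).

    feature_map: list of C channels, each an H x W nested list of floats.
    w_reduce:    R x C weights for the bottleneck (hypothetical).
    w_expand:    C x R weights restoring the channel count (hypothetical).
    Returns the reweighted feature map and the per-channel gates.
    """
    # Squeeze: one scalar per channel via global average pooling.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excite: bottleneck projection with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Rescale: multiply every activation in a channel by that channel's gate.
    reweighted = [[[g * v for v in row] for row in ch]
                  for g, ch in zip(gates, feature_map)]
    return reweighted, gates


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def random_features(channels, height, width, rng):
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]


rng = random.Random(0)
C, H, W, R = 4, 3, 3, 2  # toy sizes, chosen only for illustration

# Stand-ins for the outputs of the two branches (random toy data).
cnn_features = random_features(C, H, W, rng)
swin_features = random_features(C, H, W, rng)

# Each stream gets its own independent attention weights.
cnn_out, cnn_gates = channel_attention(
    cnn_features, random_matrix(R, C, rng), random_matrix(C, R, rng))
swin_out, swin_gates = channel_attention(
    swin_features, random_matrix(R, C, rng), random_matrix(C, R, rng))
```

Because the gates lie strictly between 0 and 1, channels with low gates are damped rather than removed. That matches the abstract's wording of emphasizing discriminative responses while suppressing background-driven ones.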
