A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI
Md Assaduzzaman, Nushrat Jahan Oyshi, Eram Mahamud

TL;DR
This paper introduces a dual-stream vision transformer framework with knowledge distillation and region-aware attention for highly accurate, interpretable gastrointestinal disease classification from endoscopic images, suitable for clinical use.
Contribution
It proposes a hybrid teacher-student transformer model with built-in explainability that achieves near-perfect accuracy for GI disease diagnosis from medical images while remaining computationally efficient.
Findings
Achieved 0.9978 and 0.9928 accuracy on two datasets.
Model predictions are grounded in clinically relevant tissue regions.
Reduced computational complexity with faster inference.
Abstract
Accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly because of the large data volume and the subtle visual differences between classes. This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation, in which a high-capacity teacher model integrates the global contextual reasoning of a Swin Transformer with the local fine-grained feature extraction of a Vision Transformer. The student network is a compact Tiny-ViT that inherits the teacher's semantic and morphological knowledge via soft-label distillation, balancing efficiency against diagnostic accuracy. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation…
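The soft-label distillation mentioned in the abstract can be illustrated with a minimal sketch of the standard temperature-scaled distillation objective. This is not the authors' implementation: the function names, the temperature of 4.0, and the weighting factor `alpha` are assumptions chosen for illustration, and the paper's exact loss formulation is not given in the text above.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    # Soft targets: both distributions are softened with the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL(teacher || student), scaled by T^2 so the gradient magnitude stays
    # comparable across temperatures (the usual convention).
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    soft_term = (temperature ** 2) * kl

    # Hard-label cross-entropy on the student's unsoftened prediction.
    hard_term = -math.log(softmax(student_logits)[label])

    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; class 0 is the true label.
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(distillation_loss(student, teacher, label=0))
```

With `alpha=1.0` the loss reduces to the soft-label term alone, which is zero when the student reproduces the teacher's logits exactly; lowering `alpha` shifts weight toward the ground-truth labels.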
