A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification

Samiul Based Shuvo; Taufiq Hasan

arXiv:2507.20408 · eess.SP · October 21, 2025

TL;DR

This paper introduces a novel multi-stage hybrid CNN-Transformer model for pediatric lung sound classification, addressing the unique acoustic features of children under 6 years old and outperforming previous models.

Contribution

It presents a specialized hybrid CNN-Transformer framework tailored for pediatric lung sounds, incorporating class-wise focal loss to handle data imbalance.
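
The summary does not spell out how the class-wise focal loss is defined. As a rough sketch, assuming it means the standard focal loss with a separate weight per class, a single-sample version could look like the following. The function names, the `alpha` values, and `gamma=2.0` are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_wise_focal_loss(logits, target, alpha, gamma=2.0):
    """Focal loss for one sample, scaled by a per-class weight alpha[target].

    Assumed form: -alpha_c * (1 - p_t)^gamma * log(p_t), where p_t is the
    predicted probability of the true class c. With gamma = 0 and all
    alpha = 1 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical weights: the rarer class (index 1) is up-weighted.
alpha = [0.25, 0.75]

# Same logits, two different true labels.
easy = class_wise_focal_loss([4.0, 0.0], target=0, alpha=alpha)  # confident and correct
hard = class_wise_focal_loss([4.0, 0.0], target=1, alpha=alpha)  # confident but wrong
```

With these toy values, the confidently misclassified minority-class sample (`hard`) contributes far more loss than the correctly classified one (`easy`), which is the property that makes this kind of loss useful under class imbalance.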

Findings

01

Achieved an overall score of 0.9039 in binary event classification

02

Outperformed previous models by up to 5.94%

03

Effective in resource-limited settings

Abstract

Automated analysis of lung sound auscultation is essential for monitoring respiratory health, especially in regions facing a shortage of skilled healthcare workers. While respiratory sound classification has been widely studied in adults, its application in pediatric populations, particularly in children aged <6 years, remains an underexplored area. The developmental changes in pediatric lungs considerably alter the acoustic properties of respiratory sounds, necessitating specialized classification approaches tailored to this age group. To address this, we propose a multistage hybrid CNN-Transformer framework that combines CNN-extracted features with an attention-based architecture to classify pediatric respiratory diseases using scalogram images from both full recordings and individual breath events. Our model achieved an overall score of 0.9039 in binary event classification and…
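
The scalogram inputs mentioned in the abstract are magnitudes of a continuous wavelet transform (CWT) laid out over time and scale. The abstract does not name the wavelet or any preprocessing, so the sketch below assumes a complex Morlet wavelet, a toy 100 Hz sample rate, and a synthetic 5 Hz tone in place of a real recording. All names and parameters are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, dt, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale (scales in seconds)."""
    n = len(signal)
    rows = []
    for s in scales:
        # Truncate the wavelet support at +/- 4 scale units.
        half = int(math.ceil(4 * s / dt))
        kernel = [
            morlet(k * dt / s, w0).conjugate() / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * kv
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


# Toy input: 2 s of a 5 Hz sine sampled at 100 Hz (not real lung sound data).
fs = 100.0
f0 = 5.0
signal = [math.sin(2 * math.pi * f0 * i / fs) for i in range(200)]

# For a Morlet wavelet, scale s responds most strongly near f = w0 / (2*pi*s).
matched = 6.0 / (2 * math.pi * f0)
scales = [0.05, matched, 0.8]
rows = scalogram(signal, scales, dt=1.0 / fs)
```

Each row of `rows` is one horizontal line of the scalogram image; the row for the scale matched to 5 Hz carries most of the energy. In practice a library routine such as a CWT from a wavelet package would replace this direct convolution, which is written out only to show the computation.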
