Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
Atakan Işık, Selin Vulga Işık, Ahmet Feridun Işık, Mahşuk Taylan

TL;DR
This paper introduces a geometry-aware optimization framework using SAM for Transformer-based respiratory sound classification, improving generalization, sensitivity, and robustness on noisy, imbalanced datasets.
Contribution
It proposes enhancing Audio Spectrogram Transformers with Sharpness-Aware Minimization and weighted sampling, achieving state-of-the-art results in respiratory sound classification.
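As a rough illustration of weighted sampling for class imbalance, the sketch below draws training indices with probability inversely proportional to class frequency. The summary does not say how the authors compute their weights, so the inverse-frequency rule, the function names, and the label counts are all assumptions made for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight = 1 / (number of samples sharing that label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement; each class is equally likely per draw."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Hypothetical imbalanced label list (NOT the real ICBHI 2017 class counts).
labels = ["normal"] * 80 + ["crackle"] * 15 + ["wheeze"] * 5
idx = weighted_sample_indices(labels, num_samples=3000)
drawn = Counter(labels[i] for i in idx)
print(drawn)  # roughly balanced across the three classes
```

In a PyTorch pipeline, the same per-sample weights could presumably be handed to `torch.utils.data.WeightedRandomSampler`; whether the authors do so is not stated here.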
Findings
Achieved a state-of-the-art score of 68.10% on the ICBHI 2017 dataset.
Reached 68.31% sensitivity, improving clinical screening reliability.
t-SNE and attention-map analyses indicate that the model learns robust features rather than noise.
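For context on how sensitivity relates to an overall ICBHI-style score, here is a minimal sketch. It assumes the commonly used challenge convention: sensitivity is the fraction of abnormal cycles assigned their exact class, specificity is the fraction of normal cycles predicted normal, and the score is their mean. The function name and the labels are hypothetical and are not data from the paper.

```python
def icbhi_score(true_labels, pred_labels, normal="normal"):
    """Return (sensitivity, specificity, score) under the assumed convention."""
    pairs = list(zip(true_labels, pred_labels))
    n_normal = sum(1 for t, _ in pairs if t == normal)
    n_abnormal = len(pairs) - n_normal
    # Normal cycles correctly predicted as normal.
    spec_hits = sum(1 for t, p in pairs if t == normal and p == normal)
    # Abnormal cycles predicted as their exact abnormal class.
    sens_hits = sum(1 for t, p in pairs if t != normal and p == t)
    sensitivity = sens_hits / n_abnormal
    specificity = spec_hits / n_normal
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical labels for illustration only.
true = ["normal"] * 4 + ["crackle"] * 2 + ["wheeze"] * 2
pred = ["normal", "normal", "normal", "crackle",
        "crackle", "normal", "wheeze", "crackle"]
se, sp, score = icbhi_score(true, pred)
print(se, sp, score)
```

Under this convention a model can trade specificity for sensitivity without changing the score much, which is why the two figures are worth reading together.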
Abstract
Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid…
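The two-step update behind Sharpness-Aware Minimization can be sketched as follows: first perturb the weights toward higher loss within a small L2 ball, then apply the gradient measured at that perturbed point to the original weights. This is a minimal pure-Python sketch, not the authors' implementation: the toy quadratic loss stands in for the AST training loss, and `sam_step`, `toy_loss`, `toy_grad`, `lr`, and `rho` are illustrative names and values.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on a list of floats `w`."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Ascent step: first-order worst-case perturbation of L2 radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Gradient evaluated at the perturbed weights.
    g_adv = grad_fn(w_adv)
    # Descent step: apply that gradient to the ORIGINAL weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


# Toy loss 0.5 * (4*w0^2 + 1*w1^2), an assumed stand-in for a real model loss.
def toy_loss(w):
    return 0.5 * (4.0 * w[0] ** 2 + 1.0 * w[1] ** 2)


def toy_grad(w):
    return [4.0 * w[0], 1.0 * w[1]]


w = [1.0, -2.0]
for _ in range(100):
    w = sam_step(w, toy_grad)
final_loss = toy_loss(w)
print(final_loss)  # small, but not exactly zero: SAM hovers near the minimum
```

Each SAM step costs two gradient evaluations instead of one, which is the usual price paid for the flatter minima the abstract describes.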
Taxonomy
Topics: Phonocardiography and Auscultation Techniques · Voice and Speech Disorders · COVID-19 diagnosis using AI
