Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers

Atakan Işık; Selin Vulga Işık; Ahmet Feridun Işık; Mahşuk Taylan

arXiv:2512.22564 · eess.AS · December 30, 2025



TL;DR

This paper introduces a geometry-aware optimization framework that applies Sharpness-Aware Minimization (SAM) to Transformer-based respiratory sound classification, improving generalization, sensitivity, and robustness on noisy, imbalanced datasets.

Contribution

The paper enhances the Audio Spectrogram Transformer (AST) with Sharpness-Aware Minimization (SAM) and a weighted sampling strategy for class imbalance, reporting state-of-the-art results in respiratory sound classification on ICBHI 2017.
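
The weighted sampling idea can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes inverse-class-frequency weights (the summary does not state the exact weighting scheme), and the class names and counts are hypothetical. It uses only the Python standard library; in PyTorch, `torch.utils.data.WeightedRandomSampler` plays a similar role.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement so each class is equally likely per draw."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Hypothetical, heavily imbalanced label list (assumed class names and counts).
labels = ["normal"] * 80 + ["crackle"] * 15 + ["wheeze"] * 4 + ["both"] * 1

indices = weighted_sample_indices(labels, num_samples=4000)
sampled = Counter(labels[i] for i in indices)
print(sampled)  # class counts come out roughly balanced
```

Because sampling is with replacement, rare classes are repeated many times per epoch, which is why this kind of rebalancing is usually paired with augmentation or regularization.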

Findings

01. Achieved a score of 68.10% on the ICBHI 2017 dataset.

02. Reached 68.31% sensitivity, improving clinical screening reliability.

03. t-SNE and attention maps indicate that the model learns robust features rather than noise.
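
For context on how such figures are usually computed, here is a minimal sketch assuming the conventional ICBHI challenge definitions: sensitivity is the fraction of abnormal cycles assigned their exact class, specificity is the fraction of normal cycles predicted as normal, and the overall score is their average. The page does not spell out these formulas, so treat them as an assumption; the labels and predictions below are made up for illustration.

```python
def icbhi_metrics(y_true, y_pred, normal_label="normal"):
    """Return (sensitivity, specificity, score) under the assumed definitions."""
    pairs = list(zip(y_true, y_pred))
    normal_total = sum(1 for t, _ in pairs if t == normal_label)
    abnormal_total = len(pairs) - normal_total
    # A normal cycle counts as correct only when predicted normal.
    normal_correct = sum(1 for t, p in pairs if t == normal_label and p == t)
    # An abnormal cycle counts as correct only when its exact class is predicted.
    abnormal_correct = sum(1 for t, p in pairs if t != normal_label and p == t)
    sensitivity = abnormal_correct / abnormal_total
    specificity = normal_correct / normal_total
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Made-up labels and predictions, for illustration only.
y_true = ["normal"] * 4 + ["crackle"] * 2 + ["wheeze"] * 2
y_pred = ["normal", "normal", "normal", "crackle",
          "crackle", "wheeze", "wheeze", "normal"]

print(icbhi_metrics(y_true, y_pred))  # (0.5, 0.75, 0.625)
```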

Abstract

Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid…
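
The two-step SAM update described in the abstract can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' training code: the quadratic loss, the learning rate `lr`, and the neighborhood radius `rho` are hypothetical choices, and a real setup would apply the same update to the AST parameters through a deep learning framework.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: probe a nearby higher-loss point, then descend from w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # guard against zero gradient
    # Step 1: move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: use the gradient at the perturbed point to update the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


# Toy loss L(w) = 0.5 * (10 * w0^2 + w1^2), chosen only for illustration.
def toy_loss(w):
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def toy_grad(w):
    return [10.0 * w[0], w[1]]


w = [1.0, 1.0]
for _ in range(200):
    w = sam_step(w, toy_grad, lr=0.05, rho=0.05)

print(toy_loss(w))  # far below the starting loss of 5.5
```

Setting `rho=0` reduces the step to plain gradient descent, which makes the extra cost of SAM explicit: one additional gradient evaluation per update. A single convex bowl like this cannot show the preference for flatter minima; the sketch only illustrates the mechanics of the update.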


Code & Models

Models

Atakanisik/ICBHI-AST-SAM (Hugging Face model)


Taxonomy

Topics: Phonocardiography and Auscultation Techniques · Voice and Speech Disorders · COVID-19 diagnosis using AI