Transformer Architectures for Respiratory Sound Analysis and Multimodal Diagnosis
Theodore Aptekarev, Vladimir Sokolovsky, Gregory Furman

TL;DR
This paper adapts transformer-based models for respiratory sound analysis, achieving high accuracy in asthma detection and demonstrating the potential of multimodal approaches that incorporate clinical metadata for improved diagnosis.
Contribution
The paper adapts the Audio Spectrogram Transformer (AST) to respiratory sound analysis and evaluates a multimodal Vision-Language Model (VLM) that integrates spectrograms with structured patient data.
Findings
AST achieves ~97% accuracy and 0.98 ROC AUC in asthma detection
VLM reaches 86-87% accuracy, comparable to the CNN baseline
The self-attention-based AST outperforms the CNN baseline in acoustic screening
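The two metrics quoted in the findings, accuracy and ROC AUC, can be computed from per-recording labels and model scores. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names, the label encoding (1 = asthma, 0 = not asthma), the 0.5 decision threshold, and the toy values are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of recordings whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as half a win.
    Quadratic in the number of recordings, which is fine for small sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (hypothetical, not data from the paper).
labels = [1, 1, 0, 0]             # assumed encoding: 1 = asthma, 0 = not asthma
scores = [0.9, 0.4, 0.5, 0.1]     # model probability of the positive class
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(labels, preds))    # 0.5
print(roc_auc(labels, scores))    # 0.75
```

Accuracy depends on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported side by side.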
Abstract
Respiratory sound analysis is a crucial tool for screening asthma and other pulmonary pathologies, yet traditional auscultation remains subjective and experience-dependent. Our prior research established a CNN baseline using DenseNet201, which demonstrated high sensitivity in classifying respiratory sounds. In this work, we (i) adapt the Audio Spectrogram Transformer (AST) for respiratory sound analysis and (ii) evaluate a multimodal Vision-Language Model (VLM) that integrates spectrograms with structured patient metadata. AST is initialized from publicly available weights and fine-tuned on a medical dataset containing hundreds of recordings per diagnosis. The VLM experiment uses a compact Moondream-type model that processes spectrogram images alongside a structured text prompt (sex, age, recording site) to output a JSON-formatted diagnosis. Results indicate that AST achieves…
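The VLM setup described in the abstract pairs a spectrogram image with a structured text prompt (sex, age, recording site) and expects a JSON-formatted diagnosis back. The sketch below illustrates only the text side of that exchange. The prompt wording, the `diagnosis` key, the label set, and the helper names are hypothetical and are not taken from the paper; the call to the vision-language model itself is deliberately left out.

```python
import json

# Assumed label set for illustration; the paper's actual classes may differ.
ALLOWED_LABELS = ("asthma", "healthy")


def build_prompt(sex, age, site):
    """Compose a structured text prompt from patient metadata (hypothetical wording)."""
    labels = ", ".join(ALLOWED_LABELS)
    return (
        f"Patient sex: {sex}. Age: {age} years. Recording site: {site}. "
        f"Classify the attached respiratory sound spectrogram as one of: {labels}. "
        'Answer with JSON only, in the form {"diagnosis": "<label>"}.'
    )


def parse_diagnosis(raw):
    """Extract a validated label from the model's text output, or None on failure.

    Generative models may wrap the JSON in extra text, so the outermost
    brace-delimited span is parsed rather than the whole string.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    label = obj.get("diagnosis")
    return label if label in ALLOWED_LABELS else None


prompt = build_prompt("female", 34, "trachea")
print(prompt)
print(parse_diagnosis('Result: {"diagnosis": "asthma"}'))
```

Validating the parsed label against a fixed set matters when scoring a generative model as a classifier: malformed or off-vocabulary outputs have to be counted explicitly rather than silently dropped.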
Taxonomy
Topics: Phonocardiography and Auscultation Techniques · Respiratory and Cough-Related Research · Voice and Speech Disorders
