Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun

TL;DR
This paper introduces a novel Patch-Mix Contrastive Learning approach using Audio Spectrogram Transformer for respiratory sound classification, achieving state-of-the-art results by effectively leveraging large-scale pretraining and data augmentation.
Contribution
It proposes a new Patch-Mix augmentation and contrastive learning method that enhances respiratory sound classification performance with limited medical data.
Findings
Achieves a 4.08% improvement over the previous best score on the ICBHI dataset
Demonstrates effective transfer from large-scale visual and audio pretraining
Introduces Patch-Mix augmentation for better representation learning
Abstract
Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, it is still challenging due to the scarcity of medical data. In this study, we demonstrate that the pretrained model on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on…
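The abstract describes Patch-Mix as randomly mixing patches between different samples. The following is a minimal sketch of that idea, not the authors' implementation: the function name `patch_mix`, the `mix_ratio` and `seed` parameters, and the choice to swap a fixed fraction of patch positions are all assumptions made for illustration. Each sample is represented as a plain list of patch vectors, standing in for the patch sequence a spectrogram transformer would consume.

```python
import random


def patch_mix(patches_a, patches_b, mix_ratio, seed=None):
    """Replace a random subset of sample A's patches with sample B's patches.

    Hypothetical helper: the paper's exact mixing procedure is not given in
    this summary, so the fixed-fraction swap below is an assumption.
    Returns the mixed patch list and a boolean mask marking swapped positions.
    """
    if len(patches_a) != len(patches_b):
        raise ValueError("both samples must have the same number of patches")
    if not 0.0 <= mix_ratio <= 1.0:
        raise ValueError("mix_ratio must be in [0, 1]")

    rng = random.Random(seed)
    n = len(patches_a)
    k = round(mix_ratio * n)  # number of patch positions taken from sample B
    swapped = set(rng.sample(range(n), k))

    mixed = [patches_b[i] if i in swapped else patches_a[i] for i in range(n)]
    mask = [i in swapped for i in range(n)]
    return mixed, mask


# Usage: two toy "samples" of 8 patches, each patch a 2-dimensional vector.
a = [[0.0, 0.0] for _ in range(8)]
b = [[1.0, 1.0] for _ in range(8)]
mixed, mask = patch_mix(a, b, mix_ratio=0.25, seed=0)
print(sum(mask), "of", len(mask), "patches came from sample b")
```

The returned mask records which positions came from the second sample. A training loop could use the fraction of swapped positions to weight the two samples' labels, though how the paper combines labels or contrastive targets is not stated in this summary.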
Taxonomy
Topics: Phonocardiography and Auscultation Techniques · Music and Audio Processing · Respiratory and Cough-Related Research
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam
