RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification

June-Woo Kim; Miika Toikkanen; Sangmin Bae; Minseok Kim; Ho-Young Jung

arXiv:2405.02996·cs.SD·May 7, 2024

Open Access

TL;DR

This paper introduces RepAugment, a novel data augmentation method for respiratory sound classification that improves accuracy, especially for minority disease classes, by bridging the gap between speech and lung sound representations.

Contribution

The paper proposes RepAugment, a representation-level augmentation technique that works with models pretrained on raw waveforms and outperforms spectrogram-based methods such as SpecAugment.
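
As a rough illustration of why augmenting at the representation level is input-agnostic, the sketch below masks random dimensions of an encoder's output vector. The helper `mask_representation`, the `mask_ratio` parameter, and the example embedding are hypothetical and are not taken from the paper; this is a generic masking sketch under those assumptions, not the RepAugment procedure itself.

```python
import random


def mask_representation(rep, mask_ratio=0.2, rng=None):
    """Zero out a random subset of dimensions of a pooled embedding.

    Hypothetical helper: it illustrates representation-level masking in
    general and is not the paper's exact method.
    """
    if rng is None:
        rng = random.Random()
    # Number of dimensions to zero out (rounded down).
    n_mask = int(len(rep) * mask_ratio)
    masked_idx = set(rng.sample(range(len(rep)), n_mask))
    return [0.0 if i in masked_idx else x for i, x in enumerate(rep)]


# Made-up embedding standing in for an encoder output. The function never
# sees the encoder's input, so it applies equally to a waveform-based
# encoder and a spectrogram-based one.
embedding = [0.5, -1.2, 0.3, 0.9, -0.4, 1.1, 0.7, -0.8, 0.2, 0.6]
augmented = mask_representation(embedding, mask_ratio=0.2, rng=random.Random(0))
```

Because the mask is applied after the encoder, no 2-dimensional spectrogram is needed, which is the constraint that keeps SpecAugment from being used with waveform-pretrained models.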

Findings

1. RepAugment outperforms SpecAugment in accuracy.

2. Significant improvement in minority disease class detection, up to 7.14%.

3. Demonstrates the effectiveness of input-agnostic augmentation for respiratory sounds.

Abstract

Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that…

Taxonomy

Topics: Phonocardiography and Auscultation Techniques · Music and Audio Processing · Chronic Obstructive Pulmonary Disease (COPD) Research