Classification of Adventitious Sounds Combining Cochleogram and Vision Transformers
Loredana Daria Mang, Francisco David Gonzalez Martinez, Damian Martinez Munoz, Sebastian Garcia Galan, Raquel Cortina

TL;DR
This study explores using cochleogram inputs with Vision Transformers for classifying respiratory adventitious sounds, demonstrating superior performance over traditional CNN methods on the ICBHI dataset.
Contribution
First application, to the authors' knowledge, of cochleogram input with the Vision Transformer architecture to adventitious sound classification, showing improved accuracy over existing CNN approaches.
Findings
Cochleogram combined with ViT outperforms CNN methods.
ViT demonstrates promising results in respiratory sound classification.
The approach enhances automatic detection of respiratory irregularities.
Abstract
Early identification of respiratory irregularities is critical for improving lung health and reducing global mortality rates. The analysis of respiratory sounds plays a significant role in characterizing the respiratory system's condition and identifying abnormalities. The main contribution of this study is to investigate classification performance when cochleogram input is fed to the Vision Transformer (ViT) architecture; to our knowledge, this is the first time this input-classifier combination has been applied to adventitious sound classification. Although ViT has shown promising results in audio classification tasks by applying self-attention to spectrogram patches, we extend this approach by applying the cochleogram, which captures specific spectro-temporal features of adventitious sounds. The proposed methodology is evaluated on the ICBHI dataset. We compare the…
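The pipeline the abstract describes, a cochleogram cut into patches that a ViT can attend over, can be illustrated with a short sketch. This is a minimal NumPy example, not the authors' implementation: the fourth-order gammatone filterbank, the 64 bands between 50 and 2000 Hz, the 25 ms window with 10 ms hop, the 16x16 patch size, the synthetic 4 kHz test tone, and the function names `erb_space`, `cochleogram`, and `to_patches` are all assumptions made for illustration.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, n):
    """Return n center frequencies equally spaced on the ERB-rate scale."""
    lo = 21.4 * np.log10(4.37 * f_lo / 1000.0 + 1.0)
    hi = 21.4 * np.log10(4.37 * f_hi / 1000.0 + 1.0)
    e = np.linspace(lo, hi, n)
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def cochleogram(x, sr, n_bands=64, f_lo=50.0, f_hi=2000.0, win=0.025, hop=0.010):
    """Log frame energy per gammatone band, shape (n_bands, n_frames)."""
    t = np.arange(int(round(0.05 * sr))) / sr      # 50 ms impulse response
    w, h = int(round(win * sr)), int(round(hop * sr))
    n_frames = 1 + (len(x) - w) // h
    # Index matrix that picks out each analysis frame: (n_frames, w)
    idx = np.arange(w)[None, :] + h * np.arange(n_frames)[:, None]
    out = np.empty((n_bands, n_frames))
    for i, fc in enumerate(erb_space(f_lo, f_hi, n_bands)):
        # Fourth-order gammatone impulse response centered at fc
        g = t**3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
        g /= np.sqrt(np.sum(g**2))                 # normalize to unit energy
        y = np.convolve(x, g, mode="same")         # band-limited signal
        out[i] = np.log(np.mean(y[idx] ** 2, axis=1) + 1e-10)
    return out

def to_patches(img, p=16):
    """Split a 2-D array into non-overlapping p x p patches, one flattened patch per row."""
    H, W = (img.shape[0] // p) * p, (img.shape[1] // p) * p
    img = img[:H, :W]                              # crop to a multiple of p
    return img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

# Toy input (assumed): one second of a 400 Hz tone sampled at 4 kHz
sr = 4000
time = np.arange(sr) / sr
x = np.sin(2 * np.pi * 400.0 * time)

C = cochleogram(x, sr)
patches = to_patches(C)
print(C.shape, patches.shape)
```

Each row of `patches` corresponds to one token before the linear projection and position encoding of a ViT. A real system would replace the toy tone with recorded respiratory cycles and tune the filterbank and patch size to the data.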
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection
