Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism

Orchid Chetia Phukan; Girish; Mohd Mujtaba Akhtar; Swarup Ranjan Behera; Priyabrata Mallick; Santanu Roy; Arun Balaji Buduru; Rajesh Sharma

arXiv:2506.01148·eess.AS·June 3, 2025

TL;DR

This paper introduces BAOMI, a fusion framework that uses a bandit-based cross-attention mechanism to combine neural audio codec representations (NACRs) with spectral features (SFs) for heart murmur classification, and reports better performance than either feature type used alone.

Contribution

The paper presents a new fusion method, built on a bandit-based cross-attention mechanism, that combines neural audio codec representations (NACRs) with spectral features (SFs) for heart murmur classification.

Findings

1. Achieved state-of-the-art performance in heart murmur classification.

2. Demonstrated superior results over individual features and baseline fusion methods.

3. Validated the effectiveness of the bandit-based attention mechanism.

Abstract

In this study, we focus on heart murmur classification (HMC) and hypothesize that combining neural audio codec representations (NACRs), such as EnCodec, with spectral features (SFs), such as MFCC, will yield superior performance. We believe such fusion will bring out their complementary behavior: NACRs excel at capturing fine-grained acoustic patterns such as rhythm changes, while spectral features focus on frequency-domain properties, such as harmonic structure and spectral energy distribution, that are crucial for analyzing the complexity of heart sounds. To this end, we propose BAOMI, a novel framework built on a novel bandit-based cross-attention mechanism for effective fusion. Here, an agent gives more weight to the most important heads in the multi-head cross-attention mechanism and helps mitigate noise. With BAOMI, we report the topmost performance in comparison to individual NACRs, SFs, and…
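The head-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not say which bandit algorithm BAOMI uses, so everything here is an assumption made for illustration: the `HeadBandit` class, the softmax weighting over running reward estimates, the learning rate, and the toy rewards are all hypothetical, and the per-head attention outputs are taken as given rather than computed from NACRs and SFs.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class HeadBandit:
    """Hypothetical agent: keeps a running reward estimate per attention head.

    This is NOT the paper's algorithm, only a generic bandit-style sketch.
    """

    def __init__(self, num_heads, lr=0.1):
        self.values = [0.0] * num_heads  # estimated usefulness of each head
        self.lr = lr                     # step size for the running estimate

    def weights(self):
        # Heads with higher estimated reward receive more weight.
        return softmax(self.values)

    def update(self, head, reward):
        # Exponential moving average of the rewards observed for this head.
        self.values[head] += self.lr * (reward - self.values[head])


def fuse_heads(head_outputs, weights):
    """Weighted sum of per-head output vectors (all of the same length)."""
    dim = len(head_outputs[0])
    return [
        sum(w * h[d] for w, h in zip(weights, head_outputs))
        for d in range(dim)
    ]


# Toy run: pretend head 0 is consistently useful and the others are not.
bandit = HeadBandit(num_heads=4)
for _ in range(10):
    bandit.update(0, reward=1.0)
    for noisy_head in (1, 2, 3):
        bandit.update(noisy_head, reward=0.0)

# Assumed per-head cross-attention outputs (placeholders, not real features).
head_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
weights = bandit.weights()
fused = fuse_heads(head_outputs, weights)
```

In this toy run the head that keeps receiving reward ends up with the largest weight, so its output dominates the fused vector while the low-reward heads are down-weighted, which is one plausible reading of how such an agent could help mitigate noise.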

Taxonomy

Topics: Speech and Audio Processing · Phonocardiography and Auscultation Techniques · Music and Audio Processing