ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds

Jiho Han; Adnan Shaout

arXiv:2502.16914·cs.SD·February 25, 2025

TL;DR

This paper introduces ENACT-Heart, an ensemble model combining CNN and ViT for heart sound classification. It reaches 97.52% accuracy, showing the effectiveness of ensemble methods in cardiovascular diagnostics.

Contribution

The paper presents a novel ensemble approach that combines CNN and ViT through a Mixture of Experts (MoE) framework, raising heart sound classification accuracy to 97.52%, compared with 95.45% for CNN alone and 93.88% for ViT alone.

Findings

01. Achieved 97.52% classification accuracy.

02. Ensemble outperforms individual CNN and ViT models.

03. Demonstrates potential for improved cardiovascular diagnostics.

Abstract

This study explores the application of Vision Transformer (ViT) principles in audio analysis, specifically focusing on heart sounds. It introduces ENACT-Heart, a novel ensemble approach that leverages the complementary strengths of Convolutional Neural Networks (CNN) and ViT through a Mixture of Experts (MoE) framework, achieving a classification accuracy of 97.52%. This exceeds the accuracy of ViT alone (93.88%) and CNN alone (95.45%), demonstrating the potential of ensemble methods to enhance classification performance in cardiovascular health monitoring and diagnosis.
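To make the Mixture of Experts idea concrete, here is a minimal sketch of how two experts' outputs might be blended by a gate. The abstract does not describe the gating mechanism, so everything here is an assumption for illustration: the function names (`softmax`, `moe_combine`), the two-class setup, the softmax gate, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(cnn_logits, vit_logits, gate_logits):
    """Blend two experts' class probabilities with gate weights.

    Assumption: a softmax gate yields one weight per expert, and the
    output is the weighted sum of the experts' class probabilities.
    This is a generic MoE pattern, not the paper's confirmed design.
    """
    cnn_probs = softmax(cnn_logits)
    vit_probs = softmax(vit_logits)
    w_cnn, w_vit = softmax(gate_logits)  # the two weights sum to 1
    return [w_cnn * c + w_vit * v for c, v in zip(cnn_probs, vit_probs)]


# Hypothetical logits for one recording, classes = [normal, abnormal].
cnn_logits = [0.2, 0.1]    # CNN expert is nearly undecided
vit_logits = [-1.0, 2.0]   # ViT expert favors "abnormal"
gate_logits = [0.0, 1.5]   # gate trusts the ViT expert more here

mixed = moe_combine(cnn_logits, vit_logits, gate_logits)
predicted_class = mixed.index(max(mixed))  # index into [normal, abnormal]
```

Because the gate weights sum to 1 and each expert outputs a probability distribution, the blended output is itself a valid distribution. In a trained system the gate logits would come from a learned network conditioned on the input rather than being fixed by hand.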

Taxonomy

Topics: Phonocardiography and Auscultation Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer