Infant Cry Emotion Recognition Using Improved ECAPA-TDNN with Multiscale Feature Fusion and Attention Enhancement

Junyu Zhou; Yanxiong Li; Haolin Yu

arXiv:2506.18402 · eess.AS · June 24, 2025 · Interspeech


TL;DR

This paper presents an improved ECAPA-TDNN model with multi-scale feature fusion and attention enhancement for infant cry emotion recognition. It reaches 82.20% accuracy on a public dataset, ahead of the baseline methods, on a task made difficult by limited data and noise.

Contribution

The study introduces an ECAPA-TDNN variant that adds multi-scale feature fusion and attention enhancement, designed specifically for infant cry emotion recognition.

Findings

01

Achieves 82.20% accuracy on a public dataset.

02

The model has 1.43 MB of parameters and requires 0.32 GFLOPs.

03

Outperforms baseline methods in accuracy.

Abstract

Infant cry emotion recognition is crucial for parenting and medical applications. It faces several challenges, including subtle emotional variations, noise interference, and limited data. Existing methods cannot effectively integrate multi-scale features and time-frequency relationships. In this study, we propose a method for infant cry emotion recognition based on an improved Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) with both multi-scale feature fusion and attention enhancement. Experiments on a public dataset show that the proposed method achieves an accuracy of 82.20% with 1.43 MB of parameters and 0.32 GFLOPs. Moreover, our method outperforms the baseline methods in terms of accuracy. The code is available at https://github.com/kkpretend/IETMA.
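
The abstract names two ingredients, multi-scale feature fusion and attention, without detailing them on this page. The sketch below is only a rough, dependency-free illustration of the general idea, not the authors' architecture (that lives in the linked PyTorch repository). It filters one signal at several dilations to obtain features at different temporal scales, then applies a squeeze-and-excitation style channel gate of the kind used in ECAPA-TDNN. All function names, the smoothing kernel, the dilation values, and the random weights are assumptions made for illustration.

```python
import math
import random


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D convolution of one channel at a given dilation."""
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k * dilation - half
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multiscale_features(x, kernel, dilations=(1, 2, 4)):
    """One output channel per dilation, i.e. per temporal scale (hypothetical helper)."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style gate: rescale each channel by a weight in (0, 1)."""
    # Squeeze: average every channel over time.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excite: bottleneck layer with ReLU, then a sigmoid layer giving one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Rescale each channel by its gate.
    fused = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return fused, gates


# Toy input: a sine wave standing in for one band of a cry spectrogram.
random.seed(0)
signal = [math.sin(0.3 * t) for t in range(50)]
feats = multiscale_features(signal, kernel=[0.25, 0.5, 0.25])

# Random weights stand in for learned parameters (3 -> 2 -> 3 channels).
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
fused, gates = channel_attention(feats, w1, w2)
print(len(fused), len(fused[0]))
```

In a trained model the gate weights are learned, so the network can emphasize whichever temporal scale carries the most emotion-related information for a given cry.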

Code & Models

Repositories

kkpretend/ietma (PyTorch, official)
