InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang

TL;DR
InfantCryNet is a data-driven framework that leverages pre-trained models, advanced pooling techniques, and model compression to accurately analyze infant cries and their causes, even in noisy environments.
Contribution
The paper introduces InfantCryNet, a novel framework combining pre-trained audio models, statistical and attention pooling, and model compression for infant cry analysis.
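To make the two pooling ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the frame values, the `query` vector, and the function names are hypothetical, and the attention variant uses a single head for brevity where the paper uses multi-head attention pooling. Both functions collapse a variable-length sequence of frame-level features (as a pre-trained audio model might produce) into one fixed-size vector.

```python
import math


def statistics_pooling(frames):
    """Concatenate the per-dimension mean and standard deviation over time."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def attention_pooling(frames, query):
    """Single-head attention pooling: a softmax-weighted average of frames.

    `query` stands in for a learned vector (hypothetical here); each frame
    is scored by its scaled dot product with it.
    """
    scale = math.sqrt(len(query))
    scores = [sum(x * q for x, q in zip(f, query)) / scale for f in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dims = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dims)]
    return pooled, weights


# Three frames of 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
stats = statistics_pooling(frames)  # length 4: two means, then two stds
pooled, weights = attention_pooling(frames, query=[0.0, 0.0])
```

With an all-zero query every frame receives the same weight, so attention pooling reduces to the plain mean; a trained query would instead emphasize the frames most informative for the task.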
Findings
Outperforms state-of-the-art by 4.4% in accuracy
Reduces model size by 7% with no performance loss
Achieves up to 28% size reduction with minimal accuracy decrease
Abstract
Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of…
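The abstract does not state which distillation objective the authors use. As an illustration only, the sketch below shows the commonly used soft-target formulation, in which a small student model is trained to match a larger teacher's temperature-softened outputs as well as the true label. The `temperature` and `alpha` values, the logits, and the function names are all assumptions, not values from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


# Made-up logits for a 3-class problem; the true class is index 0.
teacher = [2.0, 1.0, 0.0]
loss_matched = distillation_loss([2.0, 1.0, 0.0], teacher, label=0)
loss_mismatched = distillation_loss([0.0, 1.0, 2.0], teacher, label=0)
```

A student that reproduces the teacher's logits incurs no KL penalty and pays only the weighted cross-entropy term, so `loss_matched` is smaller than `loss_mismatched`.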
Taxonomy
Topics: Infant Health and Development · Child Development and Digital Technology
Methods: Attention Is All You Need · Attention Pooling · Linear Layer · Softmax · Knowledge Distillation · Multi-Head Attention
