LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification
Niloofar Jazaeri, Hilmi R. Dajani, Marco Janeczek, Martin Bouchard

TL;DR
This paper introduces a compact, multi-branch CNN framework with LMU-based sequence modeling and ensemble fusion to improve cross-domain infant cry classification, emphasizing efficiency and generalization.
Contribution
It presents a novel combination of acoustic features, LMU sequence modeling, and entropy-gated ensemble fusion for robust cross-dataset infant cry classification.
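The page does not spell out how the entropy-gated fusion is computed. The sketch below shows one plausible reading, under stated assumptions: each model's posterior is calibrated by temperature scaling, and each model gets a per-sample weight that decreases with the Shannon entropy of its posterior. The function names, the per-model `temperatures`, and the gate sharpness `beta` are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (temperature scaling is one common calibration step)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of each posterior along the last axis."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def entropy_gated_fusion(logits_per_model, temperatures, beta: float = 1.0) -> np.ndarray:
    """Hypothetical fusion: weight each model's calibrated posterior by a softmax over negative entropies.

    logits_per_model: list of arrays, each of shape (n_samples, n_classes)
    temperatures: one calibration temperature per model (assumed fit on held-out data)
    beta: assumed gate sharpness; larger values trust the most confident model more
    """
    probs = np.stack([softmax(l, t) for l, t in zip(logits_per_model, temperatures)])  # (M, N, C)
    h = entropy(probs)                           # (M, N): uncertainty of each model on each sample
    gate = softmax(-beta * h.T).T                # (M, N): weights sum to 1 over models per sample
    return (gate[..., None] * probs).sum(axis=0)  # (N, C): fused posterior

# Usage: a confident model and a nearly uniform one disagree on a single clip.
model_a = np.array([[4.0, 0.0, 0.0]])
model_b = np.array([[0.1, 0.0, 0.2]])
fused = entropy_gated_fusion([model_a, model_b], temperatures=[1.0, 1.0], beta=2.0)
print(fused.argmax(axis=-1))  # predicts class 0, following the low-entropy model
```

Because the gate is computed per sample, a model trained on one dataset can dominate on clips where it is confident and defer elsewhere, which is one way such a scheme could preserve domain-specific expertise.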
Findings
Improved macro-F1 scores on the Baby2020 and Baby Crying datasets.
The LMU backbone uses substantially fewer recurrent parameters than LSTMs.
Infant cry classification feasible for real-time, on-device deployment.
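Macro-F1 averages per-class F1 scores without weighting by class frequency, so rare cry causes count as much as common ones. A minimal sketch of the standard definition follows; the paper's own evaluation code is not shown, and the convention of scoring an absent class as 0 is an assumption. When every class appears in the labels, this should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes: int) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true and no predicted samples scores 0.
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))

# Usage: the minority class 1 is never predicted.
print(macro_f1([0, 0, 0, 0, 1, 2], [0, 0, 0, 0, 0, 2], n_classes=3))
# ≈ 0.63, while plain accuracy is 5/6 ≈ 0.83
```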
Abstract
Decoding infant cry causes remains challenging for healthcare monitoring due to short nonstationary signals, limited annotations, and strong domain shifts across infants and datasets. We propose a compact acoustic framework that fuses mel-frequency cepstral coefficients (MFCCs), short-time Fourier transform (STFT) features, and fundamental-frequency (F0) contours within a multi-branch convolutional neural network (CNN) encoder, and models temporal dynamics using an enhanced Legendre Memory Unit (LMU). Compared to LSTMs, the LMU backbone provides stable sequence modeling with substantially fewer recurrent parameters, supporting efficient deployment. To improve cross-dataset generalization, we introduce calibrated posterior ensemble fusion with entropy-gated weighting to preserve domain-specific expertise while mitigating dataset bias. Experiments on Baby2020 and Baby Crying demonstrate…
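The abstract does not give the LMU equations. As background, the sketch below implements only the linear memory at the core of the original Legendre Memory Unit (Voelker et al., 2019), where theta * dm/dt = A m + B u compresses a sliding window of length theta into Legendre coefficients. In that original formulation A and B are fixed rather than learned; the paper's "enhanced LMU", its nonlinear hidden state, and its chosen `order` and `theta` are not reproduced here, and the values below are illustrative assumptions.

```python
import numpy as np

def lmu_matrices(order: int):
    """Continuous-time LMU memory matrices from Voelker et al. (2019): theta * dm/dt = A m + B u."""
    i = np.arange(order)[:, None]
    j = np.arange(order)[None, :]
    r = 2 * i + 1
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r  # A_ij = (2i+1) * (-1 or (-1)^(i-j+1))
    B = r * (-1.0) ** i                                    # B_i = (2i+1) * (-1)^i, shape (order, 1)
    return A, B

def expm(M: np.ndarray, terms: int = 24) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series (numpy only)."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0 ** s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def discretize_zoh(A, B, theta: float, dt: float = 1.0):
    """Zero-order-hold discretization via the augmented-matrix exponential trick."""
    d = A.shape[0]
    aug = np.zeros((d + 1, d + 1))
    aug[:d, :d] = A / theta
    aug[:d, d:] = B / theta
    E = expm(aug * dt)
    return E[:d, :d], E[:d, d:]  # A_bar, B_bar

def run_memory(u, order: int = 6, theta: float = 20.0) -> np.ndarray:
    """Feed a 1-D input sequence through the linear LMU memory; return every memory state."""
    A_bar, B_bar = discretize_zoh(*lmu_matrices(order), theta=theta)
    m = np.zeros((order, 1))
    states = []
    for u_t in u:
        m = A_bar @ m + B_bar * u_t
        states.append(m.ravel().copy())
    return np.array(states)

# Usage: a constant input fills the window, so only the 0th Legendre coefficient survives.
states = run_memory(np.ones(1000))
print(np.round(states[-1], 6))  # approximately [1, 0, 0, 0, 0, 0]
```

The recurrence itself has no trainable weights, which is consistent with the lower recurrent parameter count the abstract reports relative to LSTMs, whose gates learn four recurrent weight matrices.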
