Dirichlet process mixture model based on topologically augmented signal representation for clustering infant vocalizations
Guillem Bonafos, Clara Bourot, Pierre Pudlo, Jean-Marc Freyermuth, Laurence Reboul, Samuel Tronçon, Arnaud Rey

TL;DR
This paper introduces a novel clustering method for infant vocalizations using topologically augmented representations and Dirichlet process mixture models, revealing 8 distinct vocalization clusters over the first year of life.
Contribution
It combines topological data analysis with Bayesian non-parametrics to automatically identify vocalization categories without predefining the number of clusters.
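To illustrate why a Dirichlet process prior avoids fixing the number of clusters in advance, here is a minimal sketch of the truncated stick-breaking construction of mixture weights. The function name, the concentration value `alpha=1.0`, and the truncation level of 20 are assumptions chosen for illustration; they are not taken from the paper.

```python
import random

def stick_breaking_weights(alpha, n_components, seed=0):
    """Draw mixture weights from a truncated stick-breaking process.

    alpha is the concentration parameter (hypothetical value below);
    smaller alpha puts most of the mass on fewer components.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(n_components - 1):
        fraction = rng.betavariate(1.0, alpha)  # Beta(1, alpha) break point
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights

# Illustrative values only: alpha and the truncation level are assumptions.
weights = stick_breaking_weights(alpha=1.0, n_components=20)
# Components carrying more than 1% of the mass.
effective = sum(w > 0.01 for w in weights)
```

The truncation level is only an upper bound. Most of the weight falls on a few components, so the number of clusters that actually matter is inferred from the data rather than set by hand.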
Findings
Identified 8 distinct vocalization clusters.
Analyzed temporal distribution of clusters over 12 months.
Compared acoustic profiles of different vocalization groups.
Abstract
Based on audio recordings made once a month during the first 12 months of a child's life, we propose a new method for clustering this set of vocalizations. We use a topologically augmented representation of the vocalizations, employing two persistence diagrams for each vocalization: one computed on the surface of its spectrogram and one on the Takens' embeddings of the vocalization. A synthetic persistent variable is derived for each diagram and added to the MFCCs (Mel-frequency cepstral coefficients). Using this representation, we fit a non-parametric Bayesian mixture model with a Dirichlet process prior to model the number of components. This procedure leads to a novel data-driven categorization of vocal productions. Our findings reveal the presence of 8 clusters of vocalizations, allowing us to compare their temporal distribution and acoustic profiles in the first 12 months of life.
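The Takens' embedding step mentioned in the abstract can be sketched as a time-delay embedding that turns a one-dimensional signal into a point cloud, on which a persistence diagram could then be computed with a topological data analysis library (not shown here). The function name, the embedding dimension, the delay, and the synthetic sine signal are all assumptions for illustration; the abstract does not state the values used by the authors.

```python
import math

def takens_embedding(signal, dimension, delay):
    """Return the time-delay embedding of a 1-D signal as a list of points.

    Point i is (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay]).
    """
    n_points = len(signal) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this dimension and delay")
    return [
        tuple(signal[i + j * delay] for j in range(dimension))
        for i in range(n_points)
    ]

# Synthetic stand-in for a vocalization waveform (assumption, not real data).
signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]

# Hypothetical parameter choices for the embedding.
cloud = takens_embedding(signal, dimension=3, delay=5)
```

For a periodic signal such as this one, the embedded points trace a closed loop, which is the kind of structure a one-dimensional persistence diagram is designed to detect.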
