A model of infant speech perception and learning
Philip Zurbuchen

TL;DR
This paper presents a computational model of infant speech perception and learning using neural networks and reinforcement learning, incorporating speech synthesis and a novel approach to speaker normalization.
Contribution
It introduces a new model combining Echo State Networks and reinforcement learning to simulate infant speech acquisition with a focus on speaker normalization.
Findings
The model successfully recognizes vowel sounds across different speakers.
Infant imitation is improved by caregiver involvement in the learning process.
A proposed method addresses speaker normalization in infant speech learning.
Abstract
Infant speech perception and learning are modeled using Echo State Network classification and Reinforcement Learning. Ambient speech for the modeled infant learner is created using the speech synthesizer Vocaltractlab. An auditory system is trained to recognize vowel sounds from a series of speakers of different anatomies in Vocaltractlab. Having formed perceptual targets, the infant uses Reinforcement Learning to imitate its ambient speech. A possible way of addressing the problem of speaker normalisation is proposed, using direct imitation but also including a caregiver who listens to the infant's sounds and imitates those that sound vowel-like.
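To make the perception step more concrete, here is a minimal sketch of Echo State Network classification in Python with NumPy. It is not the paper's implementation: the abstract gives no architectural details, so the reservoir size, leak rate, ridge readout, and the two-dimensional toy "vowel" features (with a random scale factor as a crude stand-in for speaker anatomy) are all assumptions made for illustration, and no Vocaltractlab audio is involved.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def final_state(seq, w_in, w, leak=0.3):
    """Drive the leaky reservoir with a sequence; return last state plus a bias term."""
    x = np.zeros(w.shape[0])
    for u in seq:
        x = (1 - leak) * x + leak * np.tanh(w_in @ u + w @ x)
    return np.append(x, 1.0)


def train_readout(states, labels, n_classes, ridge=1e-2):
    """Fit the linear readout by ridge regression onto one-hot targets."""
    targets = np.eye(n_classes)[labels]
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)


def predict(states, w_out):
    """Pick the class whose readout unit responds most strongly."""
    return np.argmax(states @ w_out, axis=1)


def toy_vowels(n_per_class, rng, steps=20):
    """Hypothetical data: two 'vowels' as noisy 2-D feature tracks."""
    centers = np.array([[0.3, 0.8], [0.7, 0.2]])  # assumed feature centres
    seqs, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            scale = rng.uniform(0.8, 1.2)  # crude stand-in for speaker anatomy
            noise = rng.normal(0.0, 0.05, size=(steps, 2))
            seqs.append(scale * center + noise)
            labels.append(label)
    return seqs, np.array(labels)


rng = np.random.default_rng(1)
w_in, w = make_reservoir(n_in=2, n_res=50)
train_seqs, train_y = toy_vowels(20, rng)
test_seqs, test_y = toy_vowels(20, rng)
train_x = np.array([final_state(s, w_in, w) for s in train_seqs])
test_x = np.array([final_state(s, w_in, w) for s in test_seqs])
w_out = train_readout(train_x, train_y, n_classes=2)
accuracy = float(np.mean(predict(test_x, w_out) == test_y))
print(f"held-out accuracy: {accuracy:.2f}")
```

Only the readout weights are fitted here; the reservoir stays fixed after random initialisation, which is what makes Echo State Networks cheap to train. In the paper's setting the inputs would presumably be auditory features of synthesized vowels rather than these toy tracks.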
Taxonomy
Topics: Blind Source Separation Techniques · Neural Networks and Applications · Speech and Audio Processing
