Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
Okko Räsänen

TL;DR
This paper reviews recent computational models that learn early language from speech and audiovisual input without relying on linguistic priors, highlighting advances in self-supervised and grounded learning.
Contribution
It shows that modern models are increasingly capable of learning speech features without linguistic priors, and it links these models to theories of infant language development.
Findings
Models can learn speech features without linguistic priors
Shared learning principles explain early language development
Simulations are becoming more realistic and empirically grounded
Abstract
Learning to understand speech appears almost effortless for typically developing infants, yet from an information-processing perspective, acquiring a language from acoustic speech is an enormous challenge. This chapter reviews recent developments in using computational models to understand early language acquisition from speech and audiovisual input. The focus is on self-supervised and visually grounded models of perceptual learning. We show how these models are becoming increasingly powerful in learning various aspects of speech without strong linguistic priors, and how many features of early language development can be explained through a shared set of learning principles that are broadly compatible with multiple theories of language acquisition and human cognition. We also discuss how modern learning simulations are gradually becoming more realistic, both in terms of input data and…
Taxonomy
Topics: Language Development and Disorders · Phonetics and Phonology Research · Multisensory perception and integration
