Learning word-referent mappings and concepts from raw inputs

Wai Keen Vong; Brenden M. Lake

arXiv:2003.05573 · cs.CL · March 13, 2020 · 6 cites

TL;DR

This paper introduces a neural network model that learns word-referent mappings directly from raw images and words, demonstrating cross-situational learning, generalization to new instances of known words, and referent localization.

Contribution

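It presents a neural network model, trained from scratch via self-supervision, that learns from raw images and words. This addresses two aspects of naturalistic word learning that earlier cross-situational models side-step: learning from raw inputs, and generalizing learned mappings to novel instances of a known word.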
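The self-supervised setup can be illustrated with a generic margin-based contrastive objective: an image and the utterance it co-occurred with form a positive pair, and the other utterances in the batch act as negatives, so no word-object labels are needed. This is a sketch under that assumption, not the paper's exact architecture or loss. The embeddings are given directly as plain vectors (standing in for an image encoder and word embeddings), and `dot` and `contrastive_hinge_loss` are hypothetical names.

```python
def dot(u, v):
    """Similarity score between two embedding vectors (plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_hinge_loss(image_embs, utterance_embs, margin=1.0):
    """Average hinge loss over a batch of co-occurring (image, utterance) pairs.

    Pair i is a positive only because image i and utterance i were observed
    together; no word-object labels are used. Every utterance j != i in the
    batch is treated as a negative for image i.
    """
    n = len(image_embs)
    if n < 2 or len(utterance_embs) != n:
        raise ValueError("need at least two aligned (image, utterance) pairs")
    total = 0.0
    for i in range(n):
        pos = dot(image_embs[i], utterance_embs[i])
        for j in range(n):
            if j != i:
                neg = dot(image_embs[i], utterance_embs[j])
                # Penalize unless the positive beats the negative by `margin`.
                total += max(0.0, margin - pos + neg)
    return total / (n * (n - 1))


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_hinge_loss(images, aligned))  # 0.0
print(contrastive_hinge_loss(images, swapped))  # 2.0
```

With aligned embeddings every positive already beats its negative by the margin, so the loss is zero; swapping the utterances makes each negative outscore its positive and the loss rises. Minimizing such a loss over many scenes is one way co-occurrence alone can shape the embedding space.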

Findings

1. Learns word-referent mappings from fully ambiguous scenes and utterances.
2. Generalizes to novel instances of known words.
3. Locates the referents of words in a scene and exhibits a preference for mutual exclusivity.
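The mutual exclusivity finding refers to a standard test: when a novel word is heard in the presence of a familiar object and a novel object, a learner with this bias maps the new word to the novel object. The sketch below hand-codes the bias as an explicit rule only to make the test concrete; it does not describe how the paper's network arrives at the preference, and `pick_referent_mutual_exclusivity` and the object names are hypothetical.

```python
def pick_referent_mutual_exclusivity(known_lexicon, candidates):
    """Choose a referent for a novel word under a mutual exclusivity bias.

    known_lexicon: dict mapping each known word to the object it names.
    candidates: objects present in the scene, in order.
    """
    if not candidates:
        raise ValueError("scene must contain at least one object")
    named = set(known_lexicon.values())
    unnamed = [obj for obj in candidates if obj not in named]
    # Prefer an object that no known word already names; if every object
    # already has a name, fall back to the first candidate.
    return unnamed[0] if unnamed else candidates[0]


lexicon = {"ball": "BALL", "dog": "DOG"}
# A novel word heard in a scene with one familiar and one novel object.
print(pick_referent_mutual_exclusivity(lexicon, ["BALL", "WHISK"]))  # WHISK
```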

Abstract

How do children learn correspondences between language and the world from noisy, ambiguous, naturalistic input? One hypothesis is via cross-situational learning: tracking words and their possible referents across multiple situations allows learners to disambiguate correct word-referent mappings (Yu & Smith, 2007). However, previous models of cross-situational word learning operate on highly simplified representations, side-stepping two important aspects of the actual learning problem. First, how can word-referent mappings be learned from raw inputs such as images? Second, how can these learned mappings generalize to novel instances of a known word? In this paper, we present a neural network model trained from scratch via self-supervision that takes in raw images and words as inputs, and show that it can learn word-referent mappings from fully ambiguous scenes and utterances through…
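The cross-situational idea cited above (Yu & Smith, 2007) can be illustrated with a minimal co-occurrence counting sketch. This is a classic symbolic baseline of the "highly simplified" kind the abstract contrasts with, not the paper's neural model: words and referents are given as discrete symbols rather than raw images. The function names and toy scenes are hypothetical.

```python
from collections import defaultdict


def cross_situational_counts(situations):
    """Count how often each word co-occurs with each candidate referent.

    situations: list of (words, referents) pairs; each pair is one ambiguous
    scene in which the correct word-referent mapping is not indicated.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in situations:
        for w in words:
            for r in referents:
                counts[w][r] += 1
    return counts


def best_referents(counts):
    """Map each word to the referent it co-occurred with most often."""
    return {w: max(refs, key=refs.get) for w, refs in counts.items()}


# Each scene pairs two words with two objects, so no single scene
# disambiguates the mapping; only the statistics across scenes do.
situations = [
    (["ball", "dog"], ["BALL", "DOG"]),
    (["ball", "cup"], ["BALL", "CUP"]),
    (["dog", "cup"], ["DOG", "CUP"]),
]
print(best_referents(cross_situational_counts(situations)))
# {'ball': 'BALL', 'dog': 'DOG', 'cup': 'CUP'}
```

Within any one scene, "ball" is equally compatible with both objects; across the three scenes, the correct referent co-occurs with it twice and each distractor only once, which is the disambiguation the abstract describes.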

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques