From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

Lieke Gelderloos; Grzegorz Chrupała

arXiv:1610.03342 · cs.CL · October 12, 2016 · 23 cites

Open Access

TL;DR

This paper introduces a recurrent neural network model that learns to predict visual features from phoneme sequences, and shows that it represents linguistic information hierarchically, from form in the lower layers to meaning in the higher layers.

Contribution

It presents a novel stacked gated recurrent neural network model that learns visually-grounded language from phoneme sequences, revealing hierarchical levels of linguistic representation.

Findings

01 The model learns to predict features of the visual context from phonetically transcribed image descriptions.

02 Lower layers in the stack are comparatively more sensitive to form.

03 Higher layers are comparatively more sensitive to meaning.

Abstract

We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
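The architecture described in the abstract can be illustrated with a minimal, untrained sketch: phoneme embeddings feed a stack of gated recurrent (GRU) layers, and the final state of the top layer is projected to a visual feature vector. This is not the authors' implementation. The class names (`GRULayer`, `PhonemeToImage`), all dimensions, the number of layers, the random initialisation, and the toy phoneme inventory are assumptions made for illustration, and details such as the training loss are not covered here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRULayer:
    """One gated recurrent layer (hypothetical helper, biases omitted)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state h.
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, inputs):
        h = [0.0] * self.hid_dim
        states = []
        for x in inputs:
            h = self.step(x, h)
            states.append(h)
        return states


class PhonemeToImage:
    """Stacked GRU mapping a phoneme sequence to a visual feature vector.

    All sizes below are assumed values, not taken from the paper.
    """

    def __init__(self, phonemes, emb_dim=8, hid_dim=16, n_layers=3,
                 img_dim=10, seed=0):
        rng = random.Random(seed)
        self.index = {p: i for i, p in enumerate(phonemes)}
        self.emb = rand_matrix(len(phonemes), emb_dim, rng)
        dims = [emb_dim] + [hid_dim] * n_layers
        self.layers = [GRULayer(dims[i], dims[i + 1], rng)
                       for i in range(n_layers)]
        self.out = rand_matrix(img_dim, hid_dim, rng)

    def layer_states(self, phoneme_seq):
        # Returns the hidden states of every layer at every time step,
        # which is what a layer-by-layer analysis would inspect.
        xs = [self.emb[self.index[p]] for p in phoneme_seq]
        per_layer = []
        for layer in self.layers:
            xs = layer.run(xs)  # each layer reads the states of the one below
            per_layer.append(xs)
        return per_layer

    def predict(self, phoneme_seq):
        top_final = self.layer_states(phoneme_seq)[-1][-1]
        return matvec(self.out, top_final)


# Toy inventory: single characters stand in for phoneme symbols.
model = PhonemeToImage(phonemes=list("abdegiknost"))
states = model.layer_states(list("dog"))
prediction = model.predict(list("dog"))
print(len(states), len(states[0]), len(prediction))
```

Exposing `layer_states` separately from `predict` reflects the kind of analysis the abstract describes: the per-layer activations can be compared to see which layers track form and which track meaning, while only the top layer drives the visual prediction.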

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Language, Metaphor, and Cognition