Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech

William N. Havard; Jean-Pierre Chevrot; Laurent Besacier

arXiv:1909.08491 · cs.CL · September 19, 2019

TL;DR

This study investigates how a recurrent neural model of visually grounded speech implicitly segments input into word-like units, maps them to visual referents, and reveals insights into word activation and representation mechanisms.

Contribution

The paper introduces a linguistically inspired gating methodology to analyze neural representations, showing that word activation depends on initial phonemes and highlighting the role of specific speech frames.

Findings

01 Model implicitly segments speech into word-like units

02 Word activation depends on first phoneme access

03 Certain speech frames are crucial for word representation

Abstract

In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech. The model used in our experiments is trained to project an image and its spoken description into a common representation space. We show that a recurrent model trained on spoken sentences implicitly segments its input into word-like units and reliably maps them to their correct visual referents. We introduce a methodology originating from linguistics to analyse the representation learned by neural networks -- the gating paradigm -- and show that the correct representation of a word is only activated if the network has access to the first phoneme of the target word, suggesting that the network does not rely on a global acoustic pattern. Furthermore, we find that not all speech frames (MFCC vectors in our case) play an equal role in the final encoded…
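As a rough illustration of the gating idea mentioned in the abstract, the sketch below truncates a word's frame sequence into progressively longer fragments and scores each fragment against a reference embedding. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the cosine-similarity scoring, the fixed frame step, and the `mean_pool` stand-in encoder are all hypothetical, and a real experiment would use the trained speech encoder and phoneme-aligned gates instead.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (e.g. an MFCC frame)
Encoder = Callable[[List[Frame]], List[float]]  # hypothetical: frames -> embedding


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def forward_gates(frames: Sequence[Frame], step: int = 1) -> List[List[Frame]]:
    """Prefixes of growing length: the word onset is always present."""
    if not frames or step < 1:
        raise ValueError("need at least one frame and step >= 1")
    ends = list(range(step, len(frames), step)) + [len(frames)]
    return [list(frames[:end]) for end in ends]


def backward_gates(frames: Sequence[Frame], step: int = 1) -> List[List[Frame]]:
    """Suffixes of growing length: the word onset is only present in the last gate."""
    if not frames or step < 1:
        raise ValueError("need at least one frame and step >= 1")
    starts = list(range(len(frames) - step, 0, -step)) + [0]
    return [list(frames[start:]) for start in starts]


def gating_scores(
    frames: Sequence[Frame],
    target: Sequence[float],
    encode: Encoder,
    step: int = 1,
    from_onset: bool = True,
) -> List[float]:
    """Similarity between each gate's embedding and the target embedding."""
    gates = forward_gates(frames, step) if from_onset else backward_gates(frames, step)
    return [cosine(encode(gate), target) for gate in gates]


def mean_pool(frames: List[Frame]) -> List[float]:
    """Toy stand-in encoder (assumption): average the frames dimension-wise."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


if __name__ == "__main__":
    # Four made-up 2-dimensional frames standing in for one spoken word.
    word = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    reference = mean_pool(word)
    print(gating_scores(word, reference, mean_pool, from_onset=True))
    print(gating_scores(word, reference, mean_pool, from_onset=False))
```

Comparing the two score curves shows how quickly the full-word representation is recovered when fragments include the onset versus when the onset arrives last. With the toy encoder the curves only demonstrate the mechanics, not the paper's result.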
