An Exploration of Word Embedding Initialization in Deep-Learning Tasks

Tom Kocmi; Ondřej Bojar

arXiv:1711.09160 · cs.CL · November 28, 2017 · 21 citations

TL;DR

This paper investigates how different initialization methods for word embeddings affect deep learning NLP tasks, finding pretrained embeddings slightly outperform random ones and high-variance initializations hinder learning.

Contribution

It provides a systematic comparison of random and pretrained embedding initializations, highlighting the impact of variance and confirming the robustness of neural networks to fixed random embeddings.

Findings

01. Pretrained embeddings slightly outperform random initializations.
02. High-variance random initializations hinder learning.
03. Networks can learn effectively with fixed random embeddings.

Abstract

Word embeddings are the interface between the world of discrete units of text processing and the continuous, differentiable world of neural networks. In this work, we examine various random and pretrained initialization methods for embeddings used in deep networks and their effect on performance on four NLP tasks with both recurrent and convolutional architectures. We confirm that pretrained embeddings are a little better than random initialization, especially considering the speed of learning. On the other hand, we do not see any significant difference between various methods of random initialization, as long as the variance is kept reasonably low. High-variance initialization prevents the network from using the space of embeddings and forces it to use other free parameters to accomplish the task. We support this hypothesis by observing the performance in learning lexical relations and…
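To make the role of variance in random initialization concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' code: the function name `init_embeddings`, the two initializers shown, and the scale values 0.01 and 10.0 are assumptions chosen only for illustration, and the abstract does not state which scales the paper used.

```python
import numpy as np


def init_embeddings(vocab_size, dim, method="normal", scale=0.01, seed=0):
    """Return a (vocab_size, dim) randomly initialized embedding matrix.

    Hypothetical helper for illustration; `scale` controls the spread
    (and therefore the variance) of the initial values.
    """
    rng = np.random.default_rng(seed)
    if method == "normal":
        # Zero-mean Gaussian with standard deviation `scale`,
        # so the variance is scale**2.
        return rng.normal(0.0, scale, size=(vocab_size, dim))
    if method == "uniform":
        # Uniform on [-scale, scale), so the variance is scale**2 / 3.
        return rng.uniform(-scale, scale, size=(vocab_size, dim))
    raise ValueError(f"unknown method: {method}")


# Illustrative scales only (assumed, not taken from the paper).
low_var = init_embeddings(1000, 64, method="normal", scale=0.01)
high_var = init_embeddings(1000, 64, method="normal", scale=10.0)
uniform = init_embeddings(1000, 64, method="uniform", scale=0.05)

print("low-variance init, empirical variance: ", low_var.var())
print("high-variance init, empirical variance:", high_var.var())
```

Either matrix could serve as the initial weights of an embedding layer. The abstract's claim is that the choice among random schemes matters little as long as the variance stays reasonably low, whereas a high-variance start pushes the network to rely on its other parameters.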

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings