Low-dimensional Semantic Space: from Text to Word Embedding

Xiaolei Lu; Bin Ni

arXiv:1911.00845·cs.CL·November 5, 2019

Open Access

TL;DR

This paper explores the development of low-dimensional word embeddings in NLP, discussing linguistic theories, various representation methods, statistical and neural models, and their applications in linguistics.

Contribution

It provides a comprehensive overview of text representation techniques and word embedding models, connecting linguistic theories with statistical and neural approaches.

Findings

1. Comparison of one-hot and distributed representations
2. Analysis of statistical and neural language models
3. Applications in word-sense disambiguation and diachronic linguistics

Abstract

This article focuses on the study of Word Embedding, a feature-learning technique in Natural Language Processing that maps words or phrases to low-dimensional vectors. Beginning with two linguistic theories concerning contextual similarity, the "Distributional Hypothesis" and "Context of Situation", this article introduces two ways of representing text numerically: One-hot and Distributed Representation. In addition, this article presents statistics-based Language Models (such as Co-occurrence Matrix and Singular Value Decomposition) as well as Neural Network Language Models (NNLM, such as Continuous Bag-of-Words and Skip-Gram). This article also analyzes how Word Embedding can be applied to the study of word-sense disambiguation and diachronic linguistics.
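
To make the contrast in the abstract concrete, the following is a minimal sketch (not taken from the paper) of the count-based route it names: one-hot vectors, a word co-occurrence matrix, and a truncated Singular Value Decomposition that yields low-dimensional vectors. The toy corpus, the window size of 2, the embedding dimension of 2, and the helper names `cooccurrence`, `svd_embeddings`, and `cosine` are all illustrative assumptions.

```python
import numpy as np

# Toy corpus (an assumption for illustration; not from the paper).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# One-hot representation: one |V|-dimensional row per word, with a single 1.
one_hot = np.eye(len(vocab))


def cooccurrence(tokens, index, window=2):
    """Count how often each word pair appears within `window` positions."""
    counts = np.zeros((len(index), len(index)))
    for sentence in tokens:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[index[word], index[sentence[j]]] += 1
    return counts


def svd_embeddings(counts, k=2):
    """Keep the top-k singular directions as k-dimensional word vectors."""
    u, s, _ = np.linalg.svd(counts)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


counts = cooccurrence(tokens, index)
emb = svd_embeddings(counts, k=2)

cat, dog = index["cat"], index["dog"]
print("one-hot similarity:", cosine(one_hot[cat], one_hot[dog]))
print("SVD similarity:    ", cosine(emb[cat], emb[dog]))
```

In this toy corpus "cat" and "dog" occur in identical contexts, so their one-hot vectors are orthogonal while their SVD vectors point in essentially the same direction. That is the Distributional Hypothesis in miniature: words that share contexts receive similar low-dimensional vectors. Neural models such as Continuous Bag-of-Words and Skip-Gram learn vectors by prediction rather than by factorizing counts, and are not shown here.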

Taxonomy

Topics: Natural Language Processing Techniques