Representing Sentences as Low-Rank Subspaces

Jiaqi Mu; Suma Bhat; Pramod Viswanath

arXiv:1704.05358 · cs.CL · April 19, 2017 · 5 cites

Open Access

TL;DR

This paper introduces an unsupervised method that represents a sentence as the low-rank subspace spanned by its word vectors. On semantic textual similarity tasks, this representation outperforms neural network models, including skip-thought vectors.

Contribution

The paper observes that the word vectors of a sentence roughly lie in a low-rank subspace and proposes using that subspace as the sentence representation, which improves performance on semantic textual similarity tasks.

Findings

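01 Outperforms sophisticated neural network models, including skip-thought vectors, by 15% on average on semantic textual similarity tasks.

02 The word vectors of a sentence approximately lie in a low-rank subspace (roughly rank 4).

03 The method is validated on 19 semantic textual similarity datasets.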
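
The low-rank finding can be checked numerically. The following is a minimal sketch, not the authors' code: it measures how much of the squared singular-value mass of a sentence's word-vector matrix falls in the top k directions. The function name `rank_k_energy_fraction` is hypothetical, the matrix is not mean-centered (an assumption), and synthetic vectors stand in for real pretrained word embeddings.

```python
import numpy as np


def rank_k_energy_fraction(word_vectors: np.ndarray, k: int = 4) -> float:
    """Fraction of squared singular-value mass captured by the top-k directions.

    word_vectors has shape (n_words, dim), one row per word in the sentence.
    A value close to 1.0 means the rows lie close to a rank-k subspace.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(word_vectors, compute_uv=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 0.0
    return float(np.sum(s[:k] ** 2) / total)


# Synthetic stand-in for a 10-word sentence with 300-dimensional vectors:
# an exactly rank-4 matrix plus a small amount of noise (assumed data).
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(10, 4)) @ rng.normal(size=(4, 300))
noisy = low_rank + 0.05 * rng.normal(size=(10, 300))
print(rank_k_energy_fraction(noisy, k=4))
```

With real embeddings, the same function could be applied to each sentence and averaged over a dataset to see how well rank 4 fits.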

Abstract

Sentences are important semantic units of natural language. A generic, distributional representation of sentences that can capture the latent semantics is beneficial to multiple downstream applications. We observe a simple geometry of sentences -- the word representations of a given sentence (on average 10.23 words in all SemEval datasets with a standard deviation of 4.84) roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this observation, we represent a sentence by the low-rank subspace spanned by its word vectors. Such an unsupervised representation is empirically validated via semantic textual similarity tasks on 19 different datasets, where it outperforms the sophisticated neural network models, including skip-thought vectors, by 15% on average.
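
The idea in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the subspace basis is taken from a truncated SVD of the word-vector matrix, and two subspaces are compared through the singular values of the product of their bases (the cosines of the principal angles). The abstract does not specify the similarity measure, so that choice is an assumption, as are the names `sentence_subspace` and `subspace_similarity`.

```python
import numpy as np


def sentence_subspace(word_vectors: np.ndarray, rank: int = 4) -> np.ndarray:
    """Return an orthonormal basis, shape (dim, r), for the top-`rank` subspace.

    word_vectors has shape (n_words, dim). r is min(rank, n_words, dim).
    """
    # Rows of vt are orthonormal directions in the embedding space,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(word_vectors, full_matrices=False)
    r = min(rank, len(s))
    return vt[:r].T


def subspace_similarity(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Similarity between two subspaces given orthonormal bases.

    The singular values of basis_a.T @ basis_b are the cosines of the
    principal angles; this returns the square root of their sum of squares.
    The value ranges from 0 (orthogonal) to sqrt(r) (identical subspaces).
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.sqrt(np.sum(cosines ** 2)))


# Usage with random stand-ins for pretrained word vectors (assumed data).
rng = np.random.default_rng(0)
sentence_1 = rng.normal(size=(10, 300))  # 10 words, 300 dimensions
sentence_2 = rng.normal(size=(12, 300))  # 12 words, 300 dimensions
score = subspace_similarity(sentence_subspace(sentence_1),
                            sentence_subspace(sentence_2))
print(score)
```

Because the score depends only on the subspaces and not on the particular bases, it is unaffected by word order or by rescaling the word vectors.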

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications