High-Dimensional Vector Semantics

M. Andrecut

arXiv:1802.09914·cs.CL·February 28, 2018

TL;DR

This paper studies vector semantics through the near-orthogonality of high-dimensional random vectors. It shows that such vectors can be "memorized" by simply adding them, gives an efficient probabilistic solution to the set membership problem, and discusses applications to word context vector embeddings, document sentence similarity, and spam filtering.

Contribution

It introduces a probabilistic method for vector set membership that exploits the near-orthogonality of high-dimensional random vectors, and explores practical applications to NLP tasks.

Findings

01 High-dimensional random vectors are nearly orthogonal, which allows a set of them to be "memorized" by simply adding them.

02 A probabilistic approach gives an efficient solution to the set membership problem.

03 Applications discussed include word context vector embeddings, document sentence similarity, and spam filtering.
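
The near-orthogonality in the first finding can be checked numerically. The snippet below is a minimal sketch, not code from the paper: the Gaussian sampling, the helper name `mean_abs_cosine`, and the chosen dimensions are all assumptions made for illustration. It compares the average absolute cosine between random unit vectors in a low and a high dimension; the typical cosine magnitude shrinks on the order of 1/sqrt(d).

```python
import numpy as np

def mean_abs_cosine(dim, n_vectors=200, seed=0):
    """Average |cosine| over all distinct pairs of random Gaussian vectors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_vectors, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize each row
    cos = x @ x.T                                   # all pairwise cosines
    upper = np.triu_indices(n_vectors, k=1)         # keep distinct pairs only
    return float(np.abs(cos[upper]).mean())

low = mean_abs_cosine(10)        # low dimension: pairs are visibly correlated
high = mean_abs_cosine(10_000)   # high dimension: pairs are almost orthogonal
print(f"dim=10: {low:.4f}   dim=10000: {high:.4f}")
```

The second value should come out far smaller than the first, which is the property the remaining findings build on.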

Abstract

In this paper we explore the "vector semantics" problem from the perspective of the "almost orthogonal" property of high-dimensional random vectors. We show that this intriguing property can be used to "memorize" random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. We also discuss several applications to word context vector embeddings, document sentence similarity, and spam filtering.
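
The idea of memorizing vectors by adding them and then testing membership can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's actual construction, which this page does not reproduce: the random ±1 vectors, the normalized dot-product score, and the 0.5 threshold are all choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_items, n_stored = 10_000, 400, 50   # assumed sizes, for illustration

# Assumption: each item is represented by a random +/-1 vector.
items = rng.choice([-1.0, 1.0], size=(n_items, dim))

# "Memorize" a subset of the items by simply adding their vectors.
stored_ids = rng.choice(n_items, size=n_stored, replace=False)
memory = items[stored_ids].sum(axis=0)

# Membership score: normalized dot product with the memory vector.
# A stored item contributes about 1 (its own squared norm divided by dim);
# every other stored vector adds only small noise because the vectors are
# nearly orthogonal. A non-stored item therefore scores close to 0.
scores = items @ memory / dim
predicted = scores > 0.5                   # assumed threshold, halfway between 0 and 1

truth = np.isin(np.arange(n_items), stored_ids)
accuracy = float((predicted == truth).mean())
print(f"membership accuracy: {accuracy:.3f}")
```

The test is probabilistic: the noise in the score grows with the number of stored vectors and shrinks with the dimension, so errors become more likely as more vectors are added to a memory of fixed dimension.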
