
TL;DR
This paper investigates high-dimensional vector semantics, demonstrating how the near-orthogonality of random vectors enables efficient set membership solutions and applications in word embeddings, document similarity, and spam filtering.
Contribution
It introduces a probabilistic method leveraging high-dimensional properties for vector set membership and explores practical applications in NLP tasks.
Findings
High-dimensional random vectors are nearly orthogonal, so a set of them can be "memorized" by simply adding them.
A probabilistic approach effectively solves set membership problems.
Applications include word embeddings, document similarity, and spam filtering.
Abstract
In this paper we explore the "vector semantics" problem from the perspective of the "almost orthogonal" property of high-dimensional random vectors. We show that this intriguing property can be used to "memorize" random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. We also discuss several applications to word context vector embeddings, document sentence similarity, and spam filtering.
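The idea of memorizing vectors by adding them can be illustrated with a minimal sketch. This is not the paper's construction: the choice of i.i.d. ±1 entries, the dimension, the 0.5 threshold, and all function names below are assumptions made for illustration only. The sketch sums a few random vectors into one "memory" vector and tests membership with a normalized dot product, which is close to 1 for stored vectors and close to 0 for unrelated ones.

```python
import random

def random_vector(d, rng):
    """Draw a d-dimensional vector with i.i.d. +/-1 entries (assumed distribution)."""
    return [rng.choice((-1, 1)) for _ in range(d)]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def memorize(vectors):
    """'Memorize' a set of vectors by adding them coordinate-wise."""
    return [sum(column) for column in zip(*vectors)]

def membership_score(memory, query):
    """Dot product with the memory, divided by the dimension.

    For +/-1 vectors, a stored query contributes exactly 1; every other
    stored vector contributes a small near-orthogonal cross term.
    """
    return dot(memory, query) / len(query)

def is_member(memory, query, threshold=0.5):
    """Probabilistic membership test; the threshold is an illustrative choice."""
    return membership_score(memory, query) > threshold

rng = random.Random(0)          # fixed seed so the run is reproducible
d, n = 2000, 10                 # dimension and number of stored vectors
stored = [random_vector(d, rng) for _ in range(n)]
outsiders = [random_vector(d, rng) for _ in range(n)]
memory = memorize(stored)

# Cosine similarity between two distinct stored vectors (each has norm sqrt(d)).
cosine_01 = dot(stored[0], stored[1]) / d

stored_scores = [membership_score(memory, v) for v in stored]
outsider_scores = [membership_score(memory, v) for v in outsiders]
```

Under these assumptions the cross terms have standard deviation of roughly sqrt(n/d), so the two groups of scores separate cleanly when d is much larger than n; as n grows toward d, the test becomes unreliable, which is why the solution is probabilistic rather than exact.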
