Random vector generation of a semantic space
Jean-François Delpech, Sabine Ploux

TL;DR
This paper presents a method for constructing a Euclidean semantic space from a French synonym dictionary using random vectors and random projection, enabling efficient clustering and disambiguation of terms.
Contribution
It introduces a novel approach combining random vector techniques with orthogonalization to build semantic spaces applicable across languages.
Findings
Effective clustering of semantically related terms
Feasibility of separating homonyms with orthogonalization
Fast, real-time updates of semantic space
Abstract
We show how random vectors and random projection can be implemented in the usual vector space model to construct a Euclidean semantic space from a French synonym dictionary. We evaluate the resulting noise theoretically and show the experimental distribution of the similarities of terms in a neighborhood according to the choice of parameters. We also show that the Schmidt orthogonalization process is applicable and can be used to separate homonyms with distinct meanings. Neighboring terms are easily arranged into semantically significant clusters, which are well suited to generating realistic lists of synonyms and to applications such as word selection for automatic text generation. This process is applicable to any language, can easily be extended to collocations, is extremely fast, and can be updated in real time whenever new synonyms are proposed.
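The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea under stated assumptions: each term receives a Gaussian random vector, a term's position in the space is the normalized sum of its own vector and those of its listed synonyms, and one Schmidt orthogonalization step removes a chosen sense direction from a homonym. The toy English dictionary, the dimension `DIM`, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

DIM = 512  # dimension of the random space (assumed value, not from the paper)

# Toy synonym dictionary; the entries are invented for illustration only.
SYNONYMS = {
    "bank": ["shore", "riverside", "lender", "treasury"],
    "shore": ["bank", "riverside", "coast"],
    "riverside": ["bank", "shore"],
    "coast": ["shore"],
    "lender": ["bank", "treasury"],
    "treasury": ["bank", "lender"],
}


def random_vector(rng, dim=DIM):
    """Draw a Gaussian random vector; two such vectors are nearly orthogonal when dim is large."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Inner product; equals the cosine similarity when both vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))


def build_space(synonyms, seed=0):
    """Map each term to the normalized sum of its own random vector and its synonyms' vectors."""
    rng = random.Random(seed)
    index = {term: random_vector(rng) for term in synonyms}
    space = {}
    for term, syns in synonyms.items():
        acc = list(index[term])
        for s in syns:
            acc = [a + b for a, b in zip(acc, index[s])]
        space[term] = normalize(acc)
    return space


def remove_component(v, direction):
    """One Schmidt step: subtract the projection of v onto direction, then renormalize."""
    d = normalize(direction)
    p = dot(v, d)
    return normalize([a - p * b for a, b in zip(v, d)])


space = build_space(SYNONYMS)

# Terms sharing synonyms end up close; terms sharing none differ only by random noise.
sim_related = dot(space["shore"], space["riverside"])
sim_unrelated = dot(space["coast"], space["treasury"])

# Strip the "shore" sense from "bank", leaving a vector orthogonal to "shore".
bank_financial = remove_component(space["bank"], space["shore"])
```

In this sketch the residual similarity between unrelated terms is the noise introduced by the random vectors, and it shrinks as `DIM` grows. Adding a new synonym pair only requires adding two vectors into the affected sums, which is one plausible reading of why such a space can be updated incrementally.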
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
