Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence and Vector Cosine in VSMs
Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci, Chu-Ren Huang

TL;DR
This paper introduces APSyn, an unsupervised word similarity measure based on the intersection of the most mutually dependent contexts of two words. On the ESL synonym test set, it outperforms both vector cosine and co-occurrence.
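The TL;DR describes the measure only in words. The formula below is not given on this page; it assumes the form commonly reported for APSyn. Here F_i is the list of contexts of word w_i sorted by an association weight such as Local Mutual Information, N(F_i) keeps the top N entries of that list, and rank_i(f) is the position of context f in it:

```latex
\mathrm{APSyn}(w_1, w_2) = \sum_{f \in N(F_1) \cap N(F_2)} \frac{1}{\bigl(\mathrm{rank}_1(f) + \mathrm{rank}_2(f)\bigr)/2}
```

Under this form, a context ranked highly for both words contributes the most. A larger N admits more shared contexts, but they are lower-ranked and each one adds less.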
Contribution
The paper proposes APSyn, a novel unsupervised measure that surpasses cosine and co-occurrence in word similarity evaluation without requiring optimization.
Findings
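APSyn outperforms both vector cosine and co-occurrence on the ESL test set, with improvements between +9.00% and +17.98% depending on the number of top contexts considered.
The improvement holds for every reported number of top contexts, not only the best setting.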
The method is effective without parameter optimization.
Abstract
In this paper, we claim that vector cosine, which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.
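The abstract describes APSyn only in words. The following is a minimal sketch that assumes the inverse-average-rank form reported for APSyn elsewhere: for each word, rank its contexts by a precomputed association weight (for example Local Mutual Information) and keep the top N. Then add 1 / average rank for every context that the two top-N lists share. The ESL task is modelled as picking, for a target word, the candidate with the highest score. The helper names (`top_contexts`, `apsyn`, `cosine`, `answer_esl`) and the `TOY_VECTORS` weights are hypothetical illustrations, not code or data from the paper.

```python
from math import sqrt
from typing import Callable, Dict, List

# Assumption: each word maps context -> precomputed association weight (e.g. LMI).
Vector = Dict[str, float]

# Hypothetical toy weights, invented for illustration only (not from the paper).
TOY_VECTORS: Dict[str, Vector] = {
    "big":   {"house": 5.0, "dog": 4.0, "city": 3.0, "idea": 1.0},
    "large": {"city": 5.0, "house": 4.0, "dog": 2.0, "number": 1.0},
    "tiny":  {"dog": 5.0, "insect": 4.0, "house": 1.0},
    "loud":  {"noise": 5.0, "music": 4.0, "dog": 1.0},
}


def top_contexts(vec: Vector, n: int) -> Dict[str, int]:
    """Return the n highest-weighted contexts mapped to their rank (1 = strongest)."""
    # Ties in weight are broken alphabetically so the ranking is deterministic.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}


def apsyn(v1: Vector, v2: Vector, n: int) -> float:
    """Assumed APSyn form: sum of 1 / average rank over the shared top-n contexts."""
    r1, r2 = top_contexts(v1, n), top_contexts(v2, n)
    shared = r1.keys() & r2.keys()  # contexts in both top-n lists
    return sum(1.0 / ((r1[f] + r2[f]) / 2.0) for f in shared)


def cosine(v1: Vector, v2: Vector) -> float:
    """Vector cosine baseline over the full sparse context vectors."""
    dot = sum(w * v2.get(c, 0.0) for c, w in v1.items())
    norm1 = sqrt(sum(w * w for w in v1.values()))
    norm2 = sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def answer_esl(target: str, candidates: List[str], vectors: Dict[str, Vector],
               measure: Callable[[Vector, Vector], float]) -> str:
    """ESL-style multiple choice: pick the candidate scoring highest against the target."""
    return max(candidates, key=lambda c: measure(vectors[target], vectors[c]))


if __name__ == "__main__":
    choices = ["large", "tiny", "loud"]
    print(answer_esl("big", choices, TOY_VECTORS, lambda a, b: apsyn(a, b, n=3)))  # large
    print(answer_esl("big", choices, TOY_VECTORS, cosine))  # large
```

The parameter `n` plays the role of the "number of chosen top contexts" mentioned in the abstract. The toy data only illustrates the mechanics. It says nothing about the reported +9.00% to +17.98% gains, and both measures happen to pick the same answer here.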
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
