Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence and Vector Cosine in VSMs

Enrico Santus; Tin-Shing Chiu; Qin Lu; Alessandro Lenci; Chu-Ren Huang

arXiv:1603.09054 · cs.CL · March 31, 2016 · 2 cites

TL;DR

This paper introduces APSyn, an unsupervised measure based on the intersection of the target words' most mutually dependent contexts, which outperforms vector cosine and co-occurrence on a word similarity task.

Contribution

The paper proposes APSyn, a novel unsupervised measure that surpasses cosine and co-occurrence in word similarity evaluation without requiring optimization.

Findings

01

APSyn outperforms vector cosine and co-occurrence by up to +17.98% on the standard ESL test set.

02

The size of the improvement, between +9.00% and +17.98%, depends on the number of top contexts chosen.

03

The method is effective without parameter optimization.

Abstract

In this paper, we claim that vector cosine, which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.
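The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the formula. The sketch assumes that each word's contexts are ranked by some association score (for example PPMI), that only the top N contexts of each word are kept, and that each shared context contributes the inverse of its average rank in the two lists. The function names, the toy scores, and the value of `n` are all hypothetical.

```python
from typing import Dict


def top_contexts(scores: Dict[str, float], n: int) -> Dict[str, int]:
    """Map the n highest-scoring contexts to their 1-based rank.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}


def apsyn(scores_a: Dict[str, float], scores_b: Dict[str, float], n: int) -> float:
    """Assumed APSyn-style score: sum over shared top-n contexts of
    1 / (average of the context's rank in each word's list)."""
    ranks_a = top_contexts(scores_a, n)
    ranks_b = top_contexts(scores_b, n)
    shared = ranks_a.keys() & ranks_b.keys()
    return sum((1.0 / ((ranks_a[c] + ranks_b[c]) / 2.0) for c in shared), 0.0)


# Hypothetical association scores (context -> score) for three target words.
car = {"drive": 5.0, "road": 4.0, "engine": 3.0, "wheel": 2.0}
automobile = {"drive": 4.5, "engine": 4.0, "road": 2.5, "dealer": 1.0}
banana = {"eat": 5.0, "fruit": 4.0, "yellow": 3.0, "road": 0.5}

similar = apsyn(car, automobile, n=3)    # all three top contexts are shared
dissimilar = apsyn(car, banana, n=3)     # no top context is shared
```

Under these assumptions, words that share many highly ranked contexts get a high score, and a shared context counts for more when it ranks near the top of both lists. The only setting is `n`, the number of top contexts, which matches the abstract's note that the improvement depends on that choice.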

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis