Correlation-based Intrinsic Evaluation of Word Vector Representations

Yulia Tsvetkov; Manaal Faruqui; Chris Dyer

arXiv:1606.06710 · cs.CL · June 22, 2016

TL;DR

This paper presents QVEC-CCA, an intrinsic evaluation metric for word vectors that correlates learned representations with features extracted from linguistic resources. Its scores track downstream task performance more closely and more consistently than intrinsic evaluations based on word similarity.

Contribution

The paper introduces QVEC-CCA, a correlation-based intrinsic evaluation method for word vectors whose scores correlate more strongly and more consistently with extrinsic task performance than word-similarity-based evaluations do.

Findings

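01. QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks.

02. QVEC-CCA obtains higher correlations with downstream tasks than existing intrinsic evaluations based on word similarity.

03. Its correlations with downstream tasks are also more consistent across tasks, which makes it a more reliable indicator of word vector quality.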
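
The findings above compare intrinsic scores against downstream task performance across several word vector models. A minimal sketch of that comparison follows, assuming Pearson's r as the correlation measure. The function name `pearson_r` and the numbers in the demo are hypothetical placeholders for illustration, not results from the paper.

```python
import numpy as np


def pearson_r(intrinsic_scores, extrinsic_scores):
    """Pearson correlation between two equal-length score lists.

    Each position corresponds to one word vector model: its intrinsic
    score (e.g. from an intrinsic metric) and its downstream task score.
    """
    a = np.asarray(intrinsic_scores, dtype=float)
    b = np.asarray(extrinsic_scores, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("expected two 1-D sequences of equal length")
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt((a_centered @ a_centered) * (b_centered @ b_centered))
    return float((a_centered @ b_centered) / denom)


if __name__ == "__main__":
    # Synthetic placeholder values, one entry per hypothetical model.
    intrinsic = [0.31, 0.42, 0.38, 0.55, 0.47]
    downstream = [71.0, 76.5, 74.0, 81.2, 78.9]
    print(f"r = {pearson_r(intrinsic, downstream):.3f}")
```

An intrinsic metric is a useful proxy when this coefficient is high for many different downstream tasks, not only for one of them.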

Abstract

We introduce QVEC-CCA, an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks, compared to existing approaches to intrinsic evaluation of word vectors that are based on word similarity.
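
The abstract describes the metric only as a correlation between learned vectors and linguistic features. The sketch below shows one way such a score could be computed, assuming that "CCA" stands for canonical correlation analysis and that the score is the first canonical correlation between a word vector matrix and a linguistic feature matrix. The function name, the `reg` parameter, and the row alignment (one row per word present in both matrices) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def first_canonical_correlation(X, S, reg=1e-8):
    """First canonical correlation between two row-aligned matrices.

    X: (n_words, d_x) word vectors.
    S: (n_words, d_s) linguistic feature vectors for the same words.
    reg: small ridge term (an assumption) that keeps the covariance
         matrices positive definite.
    """
    X = np.asarray(X, dtype=float)
    S = np.asarray(S, dtype=float)
    if X.shape[0] != S.shape[0]:
        raise ValueError("X and S must have one row per shared word")

    # Center each column.
    X = X - X.mean(axis=0)
    S = S - S.mean(axis=0)
    n = X.shape[0]

    # Covariance and cross-covariance estimates.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Css = S.T @ S / (n - 1) + reg * np.eye(S.shape[1])
    Cxs = X.T @ S / (n - 1)

    # Whiten both sides with Cholesky factors: T = Lx^{-1} Cxs Ls^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ls = np.linalg.cholesky(Css)
    T = np.linalg.solve(Lx, Cxs)
    T = np.linalg.solve(Ls, T.T).T

    # Singular values of T are the canonical correlations.
    singular_values = np.linalg.svd(T, compute_uv=False)
    return float(min(singular_values[0], 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(500, 10))      # synthetic "word vectors"
    features = vectors[:, :3] + rng.normal(size=(500, 3))  # noisy features
    print(first_canonical_correlation(vectors, features))
```

Because the score is a single correlation in the range [0, 1], it can be compared directly across models with different vector dimensionalities, provided the same set of words is used for each.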
