A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?

Minh Le

arXiv:1707.03819 · cs.CL · July 14, 2017

Open Access

TL;DR

This paper examines the sanity check that Batchkarov et al. (2016) proposed for evaluating word similarity datasets, argues that it is unstable and adds no insight, and concludes that it needs major revision before it can serve its purported goal.

Contribution

The paper provides a critical analysis of the sanity check, identifies its limitations, and argues that the test must be substantially revised before it can reliably evaluate lexical semantic datasets.

Findings

01

The sanity check is unstable and unreliable.

02

The test offers no meaningful insight into dataset quality.

03

The sanity check needs major revision to fulfill its purported goal.

Abstract

Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN, of which the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal.
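As background on how datasets such as MC, RG and MEN are commonly used: a model assigns a similarity score to each word pair (often the cosine similarity of the two word vectors), and those scores are compared with the human ratings using Spearman's rank correlation. The sketch below illustrates only that generic evaluation protocol, assuming pure-Python vectors and a skip-if-out-of-vocabulary policy. It is not the sanity check of Batchkarov et al. (2016) nor the analysis in this report, and the function names, toy vectors and toy word pairs are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(xs):
    """1-based ranks of xs; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes len >= 2 and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(pairs, vectors):
    """pairs: (word1, word2, gold_score) triples; OOV pairs are skipped."""
    model_scores, gold_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores)


# Hypothetical toy data (not taken from MC, RG or MEN).
TOY_VECTORS = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "fruit": [0.0, 1.0],
    "apple": [0.2, 0.8],
    "journey": [0.7, 0.7],
}
TOY_PAIRS = [
    ("car", "automobile", 3.9),
    ("apple", "fruit", 3.5),
    ("car", "journey", 1.2),
    ("car", "fruit", 0.1),
    ("car", "unicorn", 2.0),  # "unicorn" has no vector, so this pair is skipped
]

if __name__ == "__main__":
    print(f"Spearman rho: {evaluate(TOY_PAIRS, TOY_VECTORS):.3f}")
```

On this toy data the cosine scores rank the four in-vocabulary pairs in the same order as the gold ratings, so rho is 1.0. With only a handful of pairs, swapping the order of a single pair moves rho substantially, which is why dataset size matters when interpreting such correlations.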

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling