A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
Minh Le

TL;DR
This paper critically examines the sanity check proposed by Batchkarov et al. (2016) for evaluating word similarity datasets. It argues that the test is unstable and uninformative, and that it needs major revision to serve its intended purpose.
Contribution
The paper critically analyzes the sanity check and highlights its limitations. It argues that major revisions are needed before the test can reliably evaluate lexical semantic datasets.
Findings
The sanity check is unstable and unreliable.
The test offers no meaningful insight into dataset quality.
Major revisions are necessary for the sanity check method.
Abstract
Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN -- the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal.
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
