Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
George Kour, Samuel Ackerman, Orna Raz, Eitan Farchi, Boaz Carmeli, Ateret Anaby-Tavor

TL;DR
This paper introduces automatic, interpretable evaluation measures for semantic similarity metrics at the corpus level, enabling better comparison and understanding of their behavior in NLP applications.
Contribution
It proposes a novel set of evaluation measures for semantic similarity metrics, facilitating meaningful comparison and analysis of their characteristics.
Findings
Recently developed metrics are better at identifying semantic distributional mismatch
Classical metrics are more sensitive to surface text perturbations
Evaluation measures effectively capture fundamental metric characteristics
Abstract
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these similarity metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by evaluating them on a collection of classical and state-of-the-art metrics. Our measures revealed that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface text level.
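To make the idea of evaluating a corpus-level metric concrete, here is a minimal sketch. It is not the paper's method: the bag-of-words cosine distance is only a stand-in for a corpus-level similarity metric, and the `mix` helper, the toy corpora, and the monotonicity check are hypothetical illustrations. The property checked is one plausible sanity test for distributional mismatch: the distance to a reference corpus should grow as a larger share of that corpus is replaced with text from a different domain.

```python
import math
from collections import Counter


def corpus_vector(corpus):
    """Aggregate lowercase token counts over all documents in a corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc.lower().split())
    return counts


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def corpus_distance(c1, c2):
    """Stand-in corpus-level metric (an assumption, not from the paper)."""
    return cosine_distance(corpus_vector(c1), corpus_vector(c2))


def mix(source, other, ratio):
    """Hypothetical helper: replace the first `ratio` share of `source` with `other`."""
    k = round(ratio * len(source))
    return other[:k] + source[k:]


# Toy corpora from two domains with disjoint vocabularies (illustrative only).
sports = ["team won match", "striker scored goal",
          "coach praised team", "fans cheered goal"]
finance = ["bank raised rates", "investors sold shares",
           "market closed lower", "shares fell sharply"]

ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
distances = [corpus_distance(sports, mix(sports, finance, r)) for r in ratios]

# A metric that tracks distributional mismatch should give increasing distances.
is_monotonic = all(x < y for x, y in zip(distances, distances[1:]))
print(is_monotonic)
```

A purely lexical metric like this one passes the check on these toy corpora only because the two domains share no vocabulary; it would not recognize paraphrases as similar.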
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
