TL;DR
This study compares unsupervised hypernymy prediction methods on English and German datasets and finds that three of them (WeedsPrec, invCL, SLQS Row) largely track word frequency, while the second-order method SLQS is less accurate overall but correct on pairs the others get wrong.
Contribution
It shows that the predictions of several existing unsupervised hypernymy prediction methods are strongly correlated with word frequency, and it argues that such methods should be checked for frequency bias.
Findings
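The predictions of WeedsPrec, invCL, and SLQS Row strongly overlap and are highly correlated with frequency-based predictions, on both English and German datasets.
The second-order method SLQS has lower overall accuracy but makes correct predictions where the other methods fail.
Checking a method for frequency bias is needed to separate frequency-related from frequency-unrelated effects.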
Abstract
This paper presents a comparison of unsupervised methods of hypernymy prediction (i.e., to predict which word in a pair of words such as fish-cod is the hypernym and which the hyponym). Most importantly, we demonstrate across datasets for English and for German that the predictions of three methods (WeedsPrec, invCL, SLQS Row) strongly overlap and are highly correlated with frequency-based predictions. In contrast, the second-order method SLQS shows an overall lower accuracy but makes correct predictions where the others go wrong. Our study once more confirms the general need to check the frequency bias of a computational method in order to identify frequency-(un)related effects.
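To make the comparison described in the abstract concrete, here is a minimal Python sketch of how one might measure the overlap between a distributional method and a frequency baseline. It assumes the commonly used definition of WeedsPrec (the share of a word's context weight that falls on contexts also seen with the other word) and a baseline that picks the more frequent word as the hypernym. The function names, context vectors, and frequency counts are all hypothetical toy values chosen for illustration; they are not taken from the paper's data or code.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # context feature -> association weight


def weeds_prec(u: Vector, v: Vector) -> float:
    """Share of u's total context weight on contexts that v also has."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    shared = sum(w for f, w in u.items() if v.get(f, 0.0) > 0.0)
    return shared / total


def predict_by_weeds_prec(pair: Tuple[str, str], vectors: Dict[str, Vector]) -> str:
    """Return the predicted hypernym: the word whose contexts include the other's."""
    a, b = pair
    # If a's contexts are better covered by b than the reverse, b is the hypernym.
    if weeds_prec(vectors[a], vectors[b]) > weeds_prec(vectors[b], vectors[a]):
        return b
    return a


def predict_by_frequency(pair: Tuple[str, str], freq: Dict[str, int]) -> str:
    """Frequency baseline: the more frequent word is predicted as the hypernym."""
    a, b = pair
    return a if freq[a] >= freq[b] else b


def agreement(xs: List[str], ys: List[str]) -> float:
    """Fraction of pairs on which two methods make the same prediction."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data (assumption, not from the paper).
vectors = {
    "fish": {"swim": 5.0, "water": 4.0, "eat": 3.0, "sea": 2.0},
    "cod": {"swim": 2.0, "sea": 1.0, "fillet": 1.0},
    "mammal": {"fur": 2.0, "milk": 2.0, "species": 1.0},
    "dog": {"fur": 3.0, "bark": 4.0, "milk": 1.0, "leash": 2.0},
}
freq = {"fish": 900, "cod": 50, "mammal": 40, "dog": 1200}
pairs = [("fish", "cod"), ("mammal", "dog")]  # gold hypernym listed first

wp_preds = [predict_by_weeds_prec(p, vectors) for p in pairs]
fq_preds = [predict_by_frequency(p, freq) for p in pairs]

print("WeedsPrec:", wp_preds)
print("Frequency:", fq_preds)
print("Agreement:", agreement(wp_preds, fq_preds))
```

In this toy setup both methods pick the more frequent word in each pair, including the pair where that choice is wrong (mammal vs. dog), which is the kind of overlap the abstract describes. Running the same agreement check on real pairs and real co-occurrence vectors would be one way to test a method for frequency bias.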
