Moving Other Way: Exploring Word Mover Distance Extensions
Ilya Smirnov, Ivan P. Yamshchikov

TL;DR
This paper explores extensions to the Word Mover's Distance metric, incorporating word frequency and vector space geometry, and evaluates their effectiveness on document classification tasks.
Contribution
The paper introduces and empirically tests several extensions of WMD, some of which achieve a lower k-nearest neighbor classification error than the original metric.
Findings
Some extensions achieve a lower k-nearest neighbor classification error than WMD.
Weighting words by their corpus frequency can improve the similarity measurement.
Modifications based on the geometry of the word vector space can improve WMD performance.
Abstract
The word mover's distance (WMD) is a popular semantic similarity metric for two texts. This position paper studies several possible extensions of WMD. We experiment with the frequency of words in the corpus as a weighting factor and the geometry of the word vector space. We validate possible extensions of WMD on six document classification datasets. Some proposed extensions show better results in terms of the k-nearest neighbor classification error than WMD.
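To make the abstract's setup concrete, here is a minimal sketch of WMD as an optimal transport problem, with an optional per-word weight applied to the word masses. The abstract does not specify how the paper's frequency weighting or geometry-based variants are defined, so the `word_weight` argument, the function names, and the toy vectors below are illustrative assumptions, not the authors' method.

```python
import numpy as np
from scipy.optimize import linprog


def transport_cost(mass_a, mass_b, cost):
    """Minimum cost of moving mass_a onto mass_b under the given cost matrix."""
    n, m = cost.shape
    # One equality constraint per source word (row sums) and per target word (column sums).
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        a_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mass_a, mass_b])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return result.fun


def wmd(doc_a, doc_b, vectors, word_weight=None):
    """Word mover's distance between two tokenized documents.

    word_weight is a hypothetical dict mapping word -> weight (for example an
    IDF-style score); it is an assumption made for this sketch.
    """
    def word_mass(doc):
        words = sorted(set(doc))
        mass = np.array([doc.count(w) for w in words], dtype=float)
        if word_weight is not None:
            mass *= np.array([word_weight[w] for w in words], dtype=float)
        return words, mass / mass.sum()

    words_a, mass_a = word_mass(doc_a)
    words_b, mass_b = word_mass(doc_b)
    vecs_a = np.array([vectors[w] for w in words_a], dtype=float)
    vecs_b = np.array([vectors[w] for w in words_b], dtype=float)
    # Pairwise Euclidean distances between word vectors.
    cost = np.linalg.norm(vecs_a[:, None, :] - vecs_b[None, :, :], axis=2)
    return transport_cost(mass_a, mass_b, cost)


# Toy 2-D embeddings, made up for illustration only.
toy_vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
plain = wmd(["a", "b"], ["a", "c"], toy_vectors)
weighted = wmd(["a", "b"], ["a", "c"], toy_vectors, word_weight={"a": 1.0, "b": 3.0, "c": 3.0})
```

In this toy case the shared word "a" costs nothing to move, so raising the weight of the non-shared words "b" and "c" increases the distance. For k-nearest neighbor classification, such a distance would be computed between a test document and each training document.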
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
