Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second-Order Vectors
Bridget T. McInnes, Ted Pedersen

TL;DR
This paper enhances vector space models of semantic similarity by integrating human-curated taxonomy information into second-order vectors, improving correlation with human judgments in biomedical contexts.
Contribution
It introduces a novel method combining corpus-based vectors with semantic knowledge from ontologies, advancing semantic similarity measurement techniques.
Findings
Improved correlation with human judgments for similarity and relatedness.
Outperforms several recent word embedding methods on benchmark standards.
Effective integration of semantic knowledge enhances vector space models.
Abstract
Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co-occurrence frequencies or statistical measures of association to weight the importance of particular co-occurrences. In this paper, we extend these methods by incorporating a measure of semantic similarity based on a human-curated taxonomy into a second-order vector representation. This results in a measure of semantic relatedness that combines the contextual information available in a corpus-based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that incorporating semantic similarity into second-order co-occurrence matrices improves correlation with human judgments for both similarity and relatedness, and that our method compares favorably to various word embedding methods that have…
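The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, the co-occurrence counts, the similarity scores, and the function names (`similarity`, `reweight`, `second_order_vector`, `cosine`) are all hypothetical, and the integration rule used here (replace each nonzero co-occurrence count with a taxonomy-based similarity score) is an assumption made for illustration only.

```python
import math

# Hypothetical toy vocabulary and first-order co-occurrence counts
# (row word x column word). Real counts would come from a corpus.
VOCAB = ["heart", "artery", "bone", "fracture"]
COOC = [
    [0, 5, 1, 0],
    [5, 0, 0, 1],
    [1, 0, 0, 6],
    [0, 1, 6, 0],
]

# Hypothetical similarity scores in [0, 1], standing in for a measure
# computed from a human-curated taxonomy.
SIM = {
    frozenset(["heart", "artery"]): 0.8,
    frozenset(["bone", "fracture"]): 0.7,
    frozenset(["heart", "bone"]): 0.2,
    frozenset(["artery", "fracture"]): 0.1,
}


def similarity(a, b):
    """Look up the toy similarity score for a word pair."""
    if a == b:
        return 1.0
    return SIM.get(frozenset([a, b]), 0.0)


def reweight(cooc):
    """Assumed integration step: keep a cell only if the pair co-occurs,
    and weight it by semantic similarity instead of by frequency."""
    n = len(VOCAB)
    return [
        [similarity(VOCAB[i], VOCAB[j]) if cooc[i][j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def second_order_vector(context_words, matrix):
    """Average the first-order rows of the words describing a term."""
    rows = [matrix[VOCAB.index(w)] for w in context_words]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


weighted = reweight(COOC)
term_a = second_order_vector(["heart", "artery"], weighted)
term_b = second_order_vector(["bone", "fracture"], weighted)
relatedness = cosine(term_a, term_b)
```

Here each term is represented by the averaged rows of the words that describe it, and relatedness is the cosine between two such vectors. Swapping `weighted` for `COOC` in the last three lines gives the purely frequency-based second-order measure for comparison.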
