A Novel Comprehensive Approach for Estimating Concept Semantic Similarity in WordNet
Xiao-gang Zhang, Shou-qian Sun, Ke-jun Zhang

TL;DR
This paper introduces a new hybrid information content (IC) computation method and a comprehensive semantic similarity measure for WordNet concepts; on the R&G benchmark, the authors report higher accuracy than existing measures.
Contribution
It proposes a novel hybrid IC computing method and a comprehensive semantic similarity measure based on topological parameters, enhancing similarity estimation accuracy.
Findings
The new measure correlates more closely with the R&G benchmark data than previous measures do.
Experimental results on WordNet demonstrate improved similarity accuracy.
The approach effectively utilizes topological parameters for better semantic similarity estimation.
Abstract
Computing the semantic similarity between concepts is an important foundation for many research works. This paper focuses on IC computing methods and IC measures, which estimate the semantic similarity between concepts by exploiting the topological parameters of the taxonomy. Based on an analysis of representative IC computing methods and typical semantic similarity measures, we propose a new hybrid IC computing method. By adopting the parameters dhyp and lch, we use the new IC computing method to propose a novel comprehensive measure of semantic similarity between concepts. An experiment based on the WordNet "is a" taxonomy was designed to test representative measures and our measure on the benchmark dataset R&G, and the results show that our measure clearly improves similarity accuracy. We evaluate the proposed approach by comparing the correlation coefficients between…
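To make the idea of an IC measure driven by taxonomy topology concrete, here is a minimal sketch. It is not the paper's hybrid method: the abstract does not define dhyp or lch, so neither is used. As stand-ins, the sketch uses the intrinsic IC of Seco et al. (IC computed from hyponym counts) and Lin's similarity measure. The toy taxonomy and all function names are hypothetical.

```python
import math

# Hypothetical toy "is a" taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Count the concepts that lie strictly below `concept` in the taxonomy."""
    return sum(1 for n in PARENT if n != concept and concept in ancestors(n))


def intrinsic_ic(concept):
    """Seco-style intrinsic IC: 1 - log(hypo(c) + 1) / log(total concepts).

    Leaves get IC = 1 (most specific); the root gets IC = 0.
    """
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(lcs) / (IC(a) + IC(b)).

    The lcs is taken as the most informative common subsumer.
    """
    if a == b:
        return 1.0
    common = set(ancestors(a)) & set(ancestors(b))
    lcs_ic = max(intrinsic_ic(c) for c in common)
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * lcs_ic / denom if denom > 0 else 0.0


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(lin_similarity(*pair), 4))
```

In this toy taxonomy, "dog" and "cat" share the subsumer "animal" and so score higher than "dog" and "car", whose only common subsumer is the root. An evaluation like the one the abstract describes would compare such scores with benchmark ratings through a correlation coefficient.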
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies
