Normalized Google Distance of Multisets with Applications
Andrew R. Cohen (Dept Electrical, Comput. Engin., Drexel Univ.), P.M.B. Vitanyi (CWI, Comput. Sci., Univ. Amsterdam)

TL;DR
This paper extends the normalized Google distance (NGD) from pairs of search terms to finite multisets of search terms, which the authors argue is better for many applications. The derivation is based on Kolmogorov complexity, and the multiset results are compared with those of the pairwise NGD.
Contribution
The paper proposes an NGD for finite multisets of search terms, giving the relative semantics shared by the whole multiset rather than by a single pair, and demonstrates it in applications.
Findings
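The pairwise NGD is not sufficient for all applications; the multiset NGD gives a relative semantics shared by a whole multiset of search terms.
The authors compare multiset NGD results with pairwise NGD results and report that the multiset version is better for many applications.
Applications of the multiset NGD are presented alongside this comparison.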
Abstract
Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between pairs of search terms (including phrases) is not sufficient for all applications. We propose an NGD of finite multisets of search terms that is better for many applications. This gives a relative semantics shared by a multiset of search terms. We give applications and compare the results with those obtained using the pairwise NGD. The derivation of the NGD method is based on Kolmogorov complexity.
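To make the page-count idea concrete, here is a minimal Python sketch. It is not taken from this summary: `ngd_pair` uses the commonly stated pairwise NGD formula from the earlier NGD work, and `ngd_multiset` uses the leave-one-out generalization as we understand it from the paper, so both should be checked against the paper before use. The names `ngd_pair`, `ngd_multiset`, `toy_count`, `TOY_COUNTS`, and `N`, and all the counts, are hypothetical illustration values, not real search-engine data.

```python
import math

def ngd_pair(fx, fy, fxy, n):
    """Pairwise NGD from page counts f(x), f(y), f(x,y) and index size n.

    Assumes the commonly stated form:
    (max(log f(x), log f(y)) - log f(x,y)) / (log n - min(log f(x), log f(y))).
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def ngd_multiset(terms, count, n):
    """Multiset NGD sketch (assumed form, see lead-in).

    `count` maps a tuple of terms to the number of pages containing all of them.
    Numerator: largest single-term log count minus the log count of all terms.
    Denominator: log n minus the smallest log count over leave-one-out subsets.
    Zero page counts are not handled here.
    """
    terms = list(terms)
    log_all = math.log(count(tuple(terms)))
    max_single = max(math.log(count((t,))) for t in terms)
    min_leave_one_out = min(
        math.log(count(tuple(terms[:i] + terms[i + 1:])))
        for i in range(len(terms))
    )
    return (max_single - log_all) / (math.log(n) - min_leave_one_out)

# Hypothetical page counts for three made-up terms.
N = 1_000_000
TOY_COUNTS = {
    frozenset({"a"}): 1000,
    frozenset({"b"}): 2000,
    frozenset({"c"}): 4000,
    frozenset({"a", "b"}): 500,
    frozenset({"a", "c"}): 400,
    frozenset({"b", "c"}): 800,
    frozenset({"a", "b", "c"}): 200,
}

def toy_count(terms):
    # A conjunction query ignores order and repeats, so look up by set.
    return TOY_COUNTS[frozenset(terms)]

pair_value = ngd_pair(1000, 2000, 500, N)
multiset_pair_value = ngd_multiset(["a", "b"], toy_count, N)
multiset_triple_value = ngd_multiset(["a", "b", "c"], toy_count, N)
```

Under these assumptions the multiset function reduces to the pairwise one when the multiset has two elements, because each leave-one-out subset is then a single term. That is why `pair_value` and `multiset_pair_value` agree on the toy counts.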
Taxonomy
Topics: Computability, Logic, AI Algorithms · Algorithms and Data Compression · DNA and Biological Computing
