Normalized Google Distance of Multisets with Applications

Andrew R. Cohen (Dept. Electrical, Comput. Engin., Drexel Univ.), P.M.B. Vitanyi (CWI, Comput. Sci., Univ. Amsterdam)

arXiv:1308.3177 · cs.IR · August 15, 2013 · 5 cites

TL;DR

This paper introduces a normalized Google distance (NGD) for multisets of search terms. Derived from Kolmogorov complexity, it measures the semantics shared by all terms in a multiset and is better suited to many applications than the earlier pairwise NGD, with whose results it is compared.

Contribution

It proposes a novel NGD for finite multisets of terms, extending semantic analysis beyond pairwise comparisons, and demonstrates and evaluates it in applications.
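A minimal Python sketch of the pairwise and multiset distances follows. It assumes the pairwise form (max log f(x) − log f(x,y)) / (log N − min log f(x)) from the earlier NGD work, and a multiset form obtained by substituting the code length log(N/f(X)) for Kolmogorov complexity; the exact multiset definition should be checked against the paper. The page-count oracle `f`, the table `TOY_COUNTS` and the index size `TOY_N` are hypothetical, made-up stand-ins, not a real search engine API or data from the paper.

```python
import math
from typing import Callable, Mapping, Sequence

# A page-count oracle: given a set of distinct search terms, return the number of
# pages containing all of them. Hypothetical; a real search engine would go here.
PageCount = Callable[[frozenset], int]


def _log_count(f: PageCount, terms: Sequence[str]) -> float:
    """Natural log of the page count for `terms`; duplicate terms collapse into one set."""
    count = f(frozenset(terms))
    if count <= 0:
        raise ValueError(f"page count for {sorted(set(terms))} must be positive")
    return math.log(count)


def ngd_pair(a: str, b: str, f: PageCount, n: float) -> float:
    """Pairwise NGD: (max log f(a), log f(b) - log f(a,b)) / (log N - min log f(a), log f(b))."""
    la, lb = _log_count(f, [a]), _log_count(f, [b])
    lab = _log_count(f, [a, b])
    return (max(la, lb) - lab) / (math.log(n) - min(la, lb))


def ngd_multiset(terms: Sequence[str], f: PageCount, n: float) -> float:
    """Assumed multiset NGD:
    (max_x log f(x) - log f(X)) / (log N - min_x log f(X minus one copy of x)).
    """
    xs = list(terms)
    if len(xs) < 2:
        raise ValueError("a multiset NGD needs at least two terms")
    numerator = max(_log_count(f, [x]) for x in xs) - _log_count(f, xs)
    # Drop one occurrence of each element in turn; with len(xs) >= 2 the rest is never empty.
    min_log_rest = min(_log_count(f, xs[:i] + xs[i + 1:]) for i in range(len(xs)))
    denominator = math.log(n) - min_log_rest
    if denominator <= 0:
        raise ValueError("N must exceed every page count f(X minus x)")
    return numerator / denominator


# Hypothetical toy page counts (invented for illustration) and index size N.
TOY_COUNTS: Mapping[frozenset, int] = {
    frozenset({"red"}): 1000,
    frozenset({"green"}): 800,
    frozenset({"blue"}): 1200,
    frozenset({"red", "green"}): 300,
    frozenset({"red", "blue"}): 350,
    frozenset({"green", "blue"}): 280,
    frozenset({"red", "green", "blue"}): 150,
}
TOY_N = 1_000_000


def toy_f(terms: frozenset) -> int:
    """Look up a toy page count; unknown combinations count as zero pages."""
    return TOY_COUNTS.get(terms, 0)


if __name__ == "__main__":
    print(ngd_pair("red", "green", toy_f, TOY_N))
    print(ngd_multiset(["red", "green", "blue"], toy_f, TOY_N))
```

Natural logarithms are used because the ratio does not depend on the log base. With exactly two terms, `ngd_multiset` reduces to `ngd_pair`, and a multiset that repeats a single term gets distance 0, since `f` only sees the distinct terms.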

Findings

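1. The multiset NGD measures the semantic relatedness shared by a whole multiset of search terms, improving on the pairwise NGD for many applications.

2. Results of the multiset NGD are compared with those of the pairwise NGD, with the multiset version performing better in the applications considered.

3. The method is demonstrated on several semantic analysis tasks.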

Abstract

Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between pairs of search terms (including phrases) is not sufficient for all applications. We propose an NGD of finite multisets of search terms that is better for many applications. This gives a relative semantics shared by a multiset of search terms. We give applications and compare the results with those obtained using the pairwise NGD. The derivation of the NGD method is based on Kolmogorov complexity.
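A hedged sketch of how a Kolmogorov-complexity derivation can lead to a multiset NGD. It assumes the normalized information distance of a multiset X has the form (K(X) − min K(x)) / max K(X \ {x}), and replaces K with the code length G(X) = log(N/f(X)), where f(X) is the number of pages containing every term of X and N is a normalizing constant. This is a reconstruction under those assumptions, not a quotation of the paper's definitions. The last line checks that the two-element case recovers the pairwise NGD.

```latex
\begin{aligned}
G(X) &= \log \frac{N}{f(X)}, \\
\mathrm{NGD}(X) &= \frac{G(X) - \min_{x \in X} G(x)}{\max_{x \in X} G(X \setminus \{x\})}
  = \frac{\max_{x \in X} \log f(x) - \log f(X)}{\log N - \min_{x \in X} \log f(X \setminus \{x\})}, \\
\mathrm{NGD}(\{x, y\}) &= \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}.
\end{aligned}
```

The log N terms cancel in the numerator, and for X = {x, y} removing one element leaves a single term, which is why the denominator collapses to log N − min{log f(x), log f(y)}.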

Taxonomy

Topics: Computability, Logic, AI Algorithms · Algorithms and Data Compression · DNA and Biological Computing