Similarity of Objects and the Meaning of Words

Rudi Cilibrasi (CWI), Paul Vitanyi (CWI, University of Amsterdam)

arXiv:cs/0602065·cs.CV·May 23, 2007·31 cites

TL;DR

This paper introduces universal, compression-based and web-based similarity measures for objects and their names, demonstrating their effectiveness in data mining and semantic analysis through large-scale experiments.

Contribution

It presents novel universal similarity distances for both literal objects and object names, unifying various measures using compression and web data.
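
The following is a minimal Python sketch of the compression-based side. It is not the authors' code. It assumes the normalized compression distance NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(s) is the length of s after compression and xy is the concatenation of x and y. Using zlib as the compressor C is an assumption made for illustration, and any real-world compressor can be substituted. Values near 0 indicate similar inputs, and values near 1 indicate unrelated ones.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """C(s): length in bytes of `data` after zlib compression at level 9 (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


# Example inputs: a highly repetitive text and 960 pseudo-random bytes.
text = b"the quick brown fox jumps over the lazy dog " * 20
noise = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(30))

print(f"NCD(text, text)  = {ncd(text, text):.3f}")   # small: text adds nothing new to itself
print(f"NCD(text, noise) = {ncd(text, noise):.3f}")  # near 1: no shared regularities
```

The compressor only sees raw bytes, so the same function applies unchanged to genomes, books, or other literal objects. How good the distance is depends on how well the chosen compressor captures the regularities that x and y share.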

Findings

01. Universal distance based on compression effectively measures similarity between literal objects.

02. Web-based similarity using Google page counts correlates with semantic relations.

03. Large-scale experiments support the viability of both approaches.
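
On the web-based side, the normalized Google distance can be computed directly from page counts: NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}). Here f(x) is the number of pages containing x, f(x, y) is the number of pages containing both terms, and N is a normalizing constant. The sketch below takes the counts as plain inputs and does not query any search engine. The counts in the usage lines are hypothetical numbers, chosen only so the arithmetic is easy to check. Returning infinity when the terms never co-occur is also an assumption of this sketch.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance computed from page counts.

    fx, fy : number of pages containing term x (respectively y)
    fxy    : number of pages containing both x and y
    n      : normalizing constant (on the order of the number of indexed pages)
    """
    if min(fx, fy) <= 0:
        raise ValueError("each term must occur on at least one page")
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical page counts, chosen only so the arithmetic is easy to verify.
print(f"{ngd(1000, 1000, 1000, 10**6):.3f}")  # terms always co-occur -> 0.000
print(f"{ngd(1000, 1000, 10, 10**6):.3f}")    # (3 - 1) / (6 - 3) in log10 terms -> 0.667
```

The logarithm base cancels in the ratio, so `math.log` (natural log) gives the same value as base 10 or base 2.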

Abstract

We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like "red" or "christianity." For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according…
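
The universality condition in the abstract can be written out as follows. This reading treats "up to the stated precision" as an additive slack ε, which is an assumption because the abstract does not fix the form of the precision term. Ω denotes the set of objects and 𝓕 the family of distances; both labels are introduced only for this illustration.

```latex
D \text{ is universal for } \mathcal{F} \text{ up to precision } \varepsilon
\quad\Longleftrightarrow\quad
\forall d \in \mathcal{F},\ \forall x, y \in \Omega :\; D(x, y) \le d(x, y) + \varepsilon
```

D itself need not belong to 𝓕, which matches the parenthetical remark in the abstract.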

Taxonomy

Topics: linguistics and terminology studies