Universal Similarity

Paul Vitanyi (CWI, University of Amsterdam, and National ICT, Australia)

arXiv:cs/0504089 · cs.IR · May 23, 2007

TL;DR

This paper introduces universal, parameter-free similarity measures for objects and names, based on compression and web data, demonstrating their effectiveness in data analysis and semantic tasks.

Contribution

It proposes universal similarity distances that encompass all family-specific measures, using compression for literal objects and web data for semantic names.

Findings

01 Universal distances outperform individual measures in experiments.

02 Compression-based similarity effectively compares literal objects.

03 Web-based similarity captures semantic relationships accurately.
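
The compression-based comparison of literal objects can be illustrated with a minimal sketch. The summary above does not spell out a formula, so this sketch assumes the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length under some real-world compressor. The use of zlib as that compressor and the helper names `compressed_size` and `ncd` are assumptions made for illustration, not details taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little. With a real compressor
    the result is only an approximation and can slightly exceed 1.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Calling `ncd(a, b)` on two texts that share most of their content should give a noticeably smaller value than calling it on a text and unrelated data. The exact numbers depend on the compressor and on input length, so a stronger compressor may be preferable for larger objects.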

Abstract

We survey a new area of parameter-free similarity distance measures useful in data mining, pattern recognition, learning, and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like "red" or "Christianity." For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features…
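
The universality condition stated in the abstract can be written compactly. In the sketch below, the symbols are assumptions introduced for illustration: $\mathcal{F}$ is the family of distances, $\Omega$ the set of objects, $D$ the candidate universal distance, and $\varepsilon$ the precision, taken here to be an additive term (the abstract does not say which form the precision takes).

```latex
D \text{ is universal for } \mathcal{F} \text{ up to precision } \varepsilon
\iff
D(x, y) \le d(x, y) + \varepsilon
\quad \text{for all } d \in \mathcal{F} \text{ and all } x, y \in \Omega .
```

Read this way, whenever any single distance in the family judges two objects to be close, $D$ judges them to be at least as close, up to $\varepsilon$. As the abstract notes, $D$ itself need not belong to $\mathcal{F}$.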
