Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530

Xiuli Wang

arXiv:0903.3669 · cs.AI · March 24, 2009

TL;DR

This paper discusses the nature of encoding as a semantic mapping from the universe to strings, emphasizing that the concept of a language tree is about similarity rather than historical lineage.

Contribution

It clarifies that language trees should be read as measures of similarity rather than as traditional phylogenetic trees, and explores the implications of encoding and semantic information.

Findings

01

Encoding reflects semantic information of objects.

02

Distance between strings can represent model similarity.

03

Language trees are about similarity, not history.

Abstract

Every encoding carries a priori information if the encoding represents any semantic information about the universe or an object. Encoding means a mapping from the universe to a string or strings of digits. "Semantic" is used here in the model-theoretic sense, that is, the denotation of the object. If the encoding (the string of symbols) is an adequate and true mapping of the model or object, and the mapping is recursive, or computable, then the distance between two strings (texts) reflects the distance between the corresponding models. We are then able to measure the distance between models by computing the distance between the two strings. Otherwise, the measurement may be misleading. A "language tree" may not be a family tree in the sense of historical linguistics; rather, it expresses only similarity.
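
To make the abstract's central claim concrete, here is a minimal sketch under stated assumptions. It measures the "distance between two strings" with the normalized compression distance (NCD) computed with zlib, then builds a tree by average-linkage clustering. NCD is a swapped-in, compression-based measure, not the procedure of the commented paper or of this comment; the names `ncd`, `similarity_tree`, and the toy corpus are hypothetical. The sketch simply assumes the texts are adequate encodings of whatever they describe; if they are not, the resulting tree can mislead, which is the abstract's caveat. Because the tree is produced only by merging the closest clusters, its shape records similarity and says nothing about historical descent.

```python
# Hypothetical sketch: zlib-based NCD stands in for "distance between strings";
# it is not the author's method or the commented paper's exact procedure.
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the
    compressed length. Small values mean the compressor finds much shared
    structure; values near 1 mean it finds little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def average_distance(leaves_a, leaves_b, dist):
    """Mean pairwise distance between two groups of leaf names."""
    total = sum(dist[(a, b)] for a in leaves_a for b in leaves_b)
    return total / (len(leaves_a) * len(leaves_b))


def similarity_tree(texts):
    """Average-linkage agglomerative tree over a dict {name: text}.

    Clusters are merged purely by closeness, so the nesting records
    similarity only; nothing in it encodes descent or chronology.
    """
    names = list(texts)
    dist = {}
    for a, b in combinations(names, 2):
        d = ncd(texts[a], texts[b])
        dist[(a, b)] = dist[(b, a)] = d  # store symmetrically for lookup
    # Each cluster is (subtree, leaf names); a subtree is a name or a nested pair.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]][1], clusters[p[1]][1], dist),
        )
        (ti, li), (tj, lj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), li + lj))
    return clusters[0][0]


if __name__ == "__main__":
    corpus = {
        "a": "the quick brown fox jumps over the lazy dog " * 20,
        "b": "the quick brown fox jumps over the lazy cat " * 20,
        "c": "lorem ipsum dolor sit amet consectetur adipiscing " * 20,
    }
    print(similarity_tree(corpus))
```

Run as a script, it prints a nested tuple in which the two near-identical fox sentences should be grouped first. Average linkage was chosen only because it is simple; any clustering rule would still yield a tree of similarity, not a family tree.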

Taxonomy

Topics: Natural Language Processing Techniques