Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530
Xiuli Wang

TL;DR
This paper discusses the nature of encoding as a semantic mapping from the universe to strings, emphasizing that the concept of a language tree is about similarity rather than historical lineage.
Contribution
It clarifies the interpretation of language trees as measures of similarity rather than traditional phylogenetic trees, and explores the implications of encoding and semantic information.
Findings
Encoding reflects semantic information of objects.
Distance between strings can represent the distance between models, provided the encoding is an adequate and computable mapping.
Language trees are about similarity, not history.
Abstract
Every encoding carries a priori information if it represents any semantic information about the universe or an object. Encoding means mapping from the universe to a string or strings of digits. "Semantic" is used here in the model-theoretic sense, that is, the denotation of the object. If the encoding, or string of symbols, is an adequate and true mapping of the model or object, and the mapping is recursive or computable, then the distance between two strings (texts) reflects the distance between the models. We can then measure the distance between models by computing the distance between the two strings. Otherwise, we may take a misleading course. A "language tree" may not be a family tree in the sense of historical linguistics. Rather, it just expresses similarity.
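As an illustration of what "computing the distance between the two strings" can look like in practice, here is a minimal sketch using the normalized compression distance (NCD) with Python's standard `zlib` module. This is an assumption chosen for illustration: it is neither the author's method nor the exact zipping procedure of the paper being commented on. The function names `compressed_size` and `ncd` and the sample texts are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Smaller values mean the strings share more compressible structure.
    This is an illustrative stand-in, not the commented paper's measure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical sample texts, repeated so the compressor has material to work with.
    english = b"the quick brown fox jumps over the lazy dog " * 20
    latin = b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print(ncd(english, english))  # a text compared with itself
    print(ncd(english, latin))    # two unrelated texts
```

A number like this measures similarity between the strings only. Whether it says anything about the objects the strings encode depends on the condition stated in the abstract: the encoding must be an adequate and computable mapping of those objects.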
Taxonomy
Topics: Natural Language Processing Techniques
