On J. Goodman's comment to "Language Trees and Zipping"
D. Benedetto, E. Caglioti, V. Loreto

TL;DR
This paper defends the effectiveness of data compression methods in linguistic analysis against J. Goodman's critique, demonstrating consistent results across multiple compression schemes and emphasizing their broad applicability.
Contribution
It rebuts Goodman's claims by repeating his experiment with three different data compression schemes, confirming the validity of the original approach and highlighting its generality.
Findings
All three compression schemes match the efficiency Goodman obtained with Naive Bayesian methods, rather than the threefold lower efficiency he claimed for zipping.
Data compression techniques remain effective for language analysis.
The approach has wide-ranging potential applications.
Abstract
Motivated by the recent submission to cond-mat archives by J. Goodman (cond-mat/0202383) whose results apparently discredit the approach we have proposed in a recent paper (Phys. Rev. Lett., 88, 048702 (2002), cond-mat/0108530), we report the results of the same experiment performed by Goodman using three different data compression schemes. As a matter of fact the three zippers display the same efficiency Goodman obtained using Naive Bayesian Methods and not, as Goodman claimed, an efficiency three times smaller. We point out the question of the extreme generality of approaches based on data compression techniques and we list a large range of potential applications, including those of interest for the physics community.
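To make the idea of classifying text with a compressor concrete, here is a minimal sketch. It assumes one common formulation: append the unknown fragment to each reference text and pick the reference whose compressed size grows the least. That formulation, the choice of Python's `zlib`, `bz2` and `lzma` as the three schemes, and the names `extra_bytes` and `classify` are illustrative assumptions, not details taken from the abstract or from Goodman's experiment.

```python
import bz2
import lzma
import zlib

# Three standard-library compressors stand in for "three different data
# compression schemes". Assumption: the abstract does not name the zippers used.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def extra_bytes(reference: bytes, fragment: bytes, compress) -> int:
    """Growth in compressed size when `fragment` is appended to `reference`."""
    return len(compress(reference + fragment)) - len(compress(reference))


def classify(fragment: bytes, references: dict, compress) -> str:
    """Return the label whose reference text absorbs the fragment most cheaply."""
    return min(
        references,
        key=lambda label: extra_bytes(references[label], fragment, compress),
    )


if __name__ == "__main__":
    # Toy reference texts (hypothetical data, far smaller than a real corpus).
    references = {
        "english": b"the quick brown fox jumps over the lazy dog and runs far away from the hunter " * 20,
        "italian": b"la volpe veloce salta sopra il cane pigro e corre lontano dal cacciatore " * 20,
    }
    fragment = b"the quick brown fox jumps over the lazy dog"
    for name, compress in COMPRESSORS.items():
        print(name, classify(fragment, references, compress))
```

On real data the reference texts would be much longer than the fragment, and the relative behaviour of the three compressors would have to be measured rather than assumed.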
Taxonomy
Topics: Time Series Analysis and Forecasting · Algorithms and Data Compression · Neural Networks and Applications
