On an Application of Relative Entropy

Dmitry V. Khmelev; William J. Teahan

arXiv:cond-mat/0205521 · cond-mat.stat-mech · May 23, 2007 · 3 cites

TL;DR

This paper presents a method for classifying character sequences like texts and DNA using relative entropy estimated through compression and Markov Chains, demonstrating its effectiveness and comparing it to previous approaches.

Contribution

The paper introduces a simple, computationally efficient approach using first-order Markov Chains for estimating relative entropy in sequence classification tasks.
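The idea of classifying by a first-order Markov chain can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`train_bigram_counts`, `cross_entropy`, `classify`) and the additive smoothing parameter `alpha` are assumptions introduced here. The sketch ranks candidate classes by the cross-entropy of the sample under each class model; because relative entropy equals cross-entropy minus the sample's own entropy, which is the same for every candidate, this ranking matches a ranking by relative entropy.

```python
import math
from collections import Counter


def train_bigram_counts(text):
    """Count adjacent character pairs and how often each character starts a pair."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts


def cross_entropy(sample, model, alphabet_size, alpha=1.0):
    """Average bits per transition of `sample` under a first-order model.

    Additive smoothing with `alpha` is an assumption of this sketch; it keeps
    unseen transitions from receiving zero probability.
    """
    pair_counts, context_counts = model
    transitions = list(zip(sample, sample[1:]))
    if not transitions:
        raise ValueError("sample needs at least two characters")
    total_bits = 0.0
    for prev, cur in transitions:
        p = (pair_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * alphabet_size
        )
        total_bits -= math.log2(p)
    return total_bits / len(transitions)


def classify(sample, training_texts, alpha=1.0):
    """Return the label whose model gives the lowest cross-entropy, plus all scores."""
    alphabet = set(sample)
    for text in training_texts.values():
        alphabet |= set(text)
    scores = {
        label: cross_entropy(sample, train_bigram_counts(text), len(alphabet), alpha)
        for label, text in training_texts.items()
    }
    return min(scores, key=scores.get), scores
```

Training is a single pass over each class text and scoring is a single pass over the sample, which is consistent with the efficiency claim above.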

Findings

01

The Markov chain-based method is precise enough for classifying character sequences such as texts and DNA

02

The first-order Markov chain approach easily surpasses the relative entropy estimation method described by D. Benedetto et al. (cond-mat/0108530)

03

The first-order Markov chain approach is simple and computationally efficient

Abstract

We describe a general approach to the classification of character sequences (texts, DNA) using relative entropy estimated by off-the-shelf compression and by Markov chains, and find these estimates precise enough. We also note that the method for estimating relative entropy described in the paper cond-mat/0108530 "Language Trees..." by D. Benedetto et al. had been considered earlier and was found to be easily surpassed by the simple and computationally efficient first-order Markov chain approach.
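A compression-based estimate of the kind mentioned in the abstract can be sketched with Python's standard `zlib` module. This is an illustration under stated assumptions, not the exact procedure of this paper or of Benedetto et al.: the helper names (`compressed_size`, `compression_cost`, `classify_by_compression`) are hypothetical, and `zlib` stands in for whichever off-the-shelf compressor is used. The sketch measures how many extra compressed bytes a sample costs when appended to a reference text; a sample that resembles the reference costs fewer.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def compression_cost(reference: bytes, sample: bytes) -> int:
    """Extra compressed bytes needed for `sample` when it follows `reference`."""
    return compressed_size(reference + sample) - compressed_size(reference)


def classify_by_compression(sample: bytes, references: dict):
    """Return the label of the reference that makes `sample` cheapest to encode."""
    costs = {
        label: compression_cost(reference, sample)
        for label, reference in references.items()
    }
    return min(costs, key=costs.get), costs
```

One caveat with this particular stand-in: DEFLATE, the algorithm behind `zlib`, looks back only 32 KiB, so only the tail of a long reference text influences the cost of the appended sample.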

Taxonomy

Topics: Machine Learning in Bioinformatics · Fractal and DNA sequence analysis · Authorship Attribution and Profiling