Telling apart <I>Felidae</I> and <I>Ursidae</I> from the distribution of nucleotides in mitochondrial DNA
Andrij Rovenchak

TL;DR
This study introduces a method that uses the distribution of nucleotide sequences in mitochondrial DNA to distinguish Felidae from Ursidae species. From the numerical values alone, giant pandas group with bears while koalas do not.
Contribution
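It presents a numerical approach based on the entropy and mean length of nucleotide sequences, explains the observed linear relation between these two parameters with a simple probabilistic model, and analyzes the frequency spectra using a nonadditive generalization of the Bose-distribution, which separates the two families less sharply.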
Findings
Entropy and mean length discriminate Felidae and Ursidae
Giant pandas identified as bears, koalas not
Felidae distributions have, on average, longer tails
Abstract
Rank-frequency distributions of nucleotide sequences in mitochondrial DNA are defined in a way analogous to the linguistic approach, with the most frequent nucleobase serving as a whitespace. For such sequences, entropy and mean length are calculated. These parameters are shown to discriminate the species of the <I>Felidae</I> (cats) and <I>Ursidae</I> (bears) families. From purely numerical values we are able to see, in particular, that giant pandas are bears while koalas are not. The observed linear relation between the parameters is explained using a simple probabilistic model. The approach based on the nonadditive generalization of the Bose-distribution is used to analyze the frequency spectra of the nucleotide sequences. In this case, the separation of families is not very sharp. Nevertheless, the distributions for <I>Felidae</I> have on average longer tails compared to…
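The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, and several details are assumptions that the abstract does not settle, namely that empty "words" between adjacent separators are dropped by default, that entropy is the Shannon entropy (natural logarithm) of the relative word frequencies, and that mean length is averaged over all word occurrences.

```python
from collections import Counter
from math import log


def split_into_words(seq, keep_empty=False):
    """Split a nucleotide string on its most frequent base.

    The most frequent base plays the role of a whitespace, as in the
    abstract. Dropping empty words (adjacent separators) is an
    assumption of this sketch; pass keep_empty=True to retain them.
    """
    seq = seq.upper()
    separator = Counter(seq).most_common(1)[0][0]
    words = seq.split(separator)
    if not keep_empty:
        words = [w for w in words if w]
    return separator, words


def rank_frequency(words):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(words).most_common()


def entropy(words):
    """Shannon entropy (natural log) of the word frequency distribution.

    Assumption: the paper's exact entropy definition may differ.
    """
    total = len(words)
    return -sum((n / total) * log(n / total) for n in Counter(words).values())


def mean_length(words):
    """Average word length over all occurrences (an assumed convention)."""
    return sum(len(w) for w in words) / len(words)


# Toy input only; real use would read a full mitochondrial genome.
toy_sequence = "ACGATTACAAGC"
separator, words = split_into_words(toy_sequence)
print(separator, rank_frequency(words), entropy(words), mean_length(words))
```

Applied to complete mitochondrial genomes, each species would yield one (entropy, mean length) pair, which is the kind of two-parameter description the abstract uses to separate the two families.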
