A Theory of Taxonomy

Guido D'Amico; Raul Rabadan; Matthew Kleban

arXiv:1611.03890 · physics.soc-ph · November 15, 2016

Open Access

TL;DR

This paper introduces a universal branching model that explains the distribution of items across categories in large taxonomies, with applications spanning ecology, computer science, and library science.

Contribution

It proposes a simple, non-parametric model that reproduces observed abundance distributions in diverse real-world datasets, revealing underlying commonalities.

Findings

1. The model accurately fits data on items lost and found on the New York City transit system, library books, and a bacterial microbiome.

2. It predicts unrepresented categories in finite samples.

3. A universal pattern in taxonomic abundance distributions is identified.

Abstract

A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets -- including items lost and found on the New York City transit system, library books, and a bacterial microbiome -- and discover…
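
The quantity the abstract refers to, the number of items found in each category of a sample, can be tabulated directly from a list of category labels. The sketch below is only an illustration of that bookkeeping and is not the paper's model; the function name `abundance_distribution` and the toy lost-and-found sample are hypothetical and not taken from the paper.

```python
from collections import Counter


def abundance_distribution(labels):
    """Tabulate a sample of category labels (hypothetical helper).

    Returns two counters:
      items_per_category: category -> number of items in that category
      categories_per_abundance: n -> number of categories holding exactly n items
    """
    items_per_category = Counter(labels)
    categories_per_abundance = Counter(items_per_category.values())
    return items_per_category, categories_per_abundance


# Toy sample, invented for illustration (not data from the paper).
sample = ["umbrella", "wallet", "umbrella", "phone", "wallet", "umbrella", "keys"]

per_category, per_abundance = abundance_distribution(sample)

# Print how many categories contain exactly n items, for each observed n.
for n in sorted(per_abundance):
    print(f"{per_abundance[n]} categories contain {n} item(s)")
```

Categories that exist in the taxonomy but do not occur in the sample never appear in either counter, which is why estimating unrepresented categories requires a model rather than a direct count.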

Taxonomy

Topics: Complex Network Analysis Techniques · Plant and animal studies · Genomics and Phylogenetic Studies