Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data
Vladimir Trifonov, Laura Pasqualucci, Riccardo Dalla-Favera, Raul Rabadan

TL;DR
This paper introduces fractal-like distributions over rational numbers that emerge in high-throughput biological and clinical data analysis, revealing self-similar structures useful for understanding sequencing errors, tumor genomics, disease prevalence, and viral diversity.
Contribution
It identifies and characterizes a new class of distributions with fractal properties that appear across various biological and clinical data analyses.
Findings
The distributions are discontinuous at the rational numbers but continuous at the irrational numbers.
Self-similar, fractal-like structure observed in sequencing error rates.
Applications include tumor genomics, disease prevalence, and viral diversity analysis.
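The discontinuity property in the findings can be checked exactly on a small example. The sketch below assumes a hypothetical toy model that is not taken from the paper: a count n is drawn uniformly from {1, ..., max_n}, then k is drawn uniformly from {0, ..., n}, and the observed value is X = k/n. Python's `fractions.Fraction` keeps each ratio in lowest terms, so every atom of X sits at a rational p/q, and the jump of the cumulative distribution function F at x equals P(X = x). The helper names `ratio_pmf` and `cdf` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

def ratio_pmf(max_n):
    """Exact law of X = k/n under a hypothetical toy model:
    n is uniform on {1, ..., max_n}, then k is uniform on {0, ..., n}."""
    pmf = defaultdict(Fraction)  # Fraction() == 0, so new atoms start at zero mass
    for n in range(1, max_n + 1):
        for k in range(n + 1):
            # Fraction(k, n) is stored in lowest terms, so 1/2, 2/4, 3/6, ...
            # all add their probability to the same atom.
            pmf[Fraction(k, n)] += Fraction(1, max_n) * Fraction(1, n + 1)
    return dict(pmf)

def cdf(pmf, x, left_limit=False):
    """F(x) = P(X <= x), or the left limit F(x-) = P(X < x) when left_limit=True."""
    return sum((m for v, m in pmf.items() if (v < x if left_limit else v <= x)),
               Fraction(0))

pmf = ratio_pmf(20)
for x in [Fraction(1, 2), Fraction(1, 7), Fraction(3, 7), Fraction(7, 19), Fraction(1, 23)]:
    jump = cdf(pmf, x) - cdf(pmf, x, left_limit=True)  # equals the atom mass at x
    print(f"{x}: jump of F = {float(jump):.5f}")
```

In this toy model the mass at p/q depends only on the denominator q (it collects 1/(max_n · (mq + 1)) for every multiple mq ≤ max_n), so 1/7 and 3/7 carry identical jumps, and 1/23 carries none because no n ≤ 20 is a multiple of 23. If the distribution of n instead gave positive probability to every positive integer, every rational in [0, 1] would be an atom while no irrational could ever be one; since a CDF jumps exactly at its atoms, F would then be discontinuous at every rational in [0, 1] and continuous at every irrational, matching the property stated above.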
Abstract
Recent developments in extracting and processing biological and clinical data are enabling quantitative approaches to the study of living systems. High-throughput sequencing, expression profiling, proteomics, and electronic health records are some examples of such technologies. Extracting meaningful information from them requires careful analysis of the large volumes of data they produce. In this note, we present a set of distributions that commonly appear in the analysis of such data. These distributions have some interesting features: they are discontinuous at the rational numbers but continuous at the irrational numbers, and they possess a certain self-similar (fractal-like) structure. The first set of examples we present here is drawn from a high-throughput sequencing experiment. Here, the self-similar distributions appear as part of the evaluation of the error rate…
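The abstract's sequencing example is cut off above, so the following is only a hypothetical illustration of how spiky, self-similar histograms can arise when an error rate is estimated as a ratio of read counts. It assumes, without support from the paper, that each genomic position has a coverage n drawn uniformly from 10 to 60, that the number of mismatching reads is k ~ Binomial(n, p_err) with p_err = 0.25, and that the reported rate is k/n. The function names `simulate_error_rates` and `fine_histogram` and all parameter values are made up for the example.

```python
import random
from collections import Counter
from fractions import Fraction

def simulate_error_rates(num_positions, min_cov, max_cov, p_err, seed=0):
    """Hypothetical toy model of per-position error rates: coverage n is
    uniform on [min_cov, max_cov] and the number of mismatching reads k is
    Binomial(n, p_err); the observed rate is the exact ratio k/n."""
    rng = random.Random(seed)
    rates = []
    for _ in range(num_positions):
        n = rng.randint(min_cov, max_cov)                # coverage at this position
        k = sum(rng.random() < p_err for _ in range(n))  # one Binomial(n, p_err) draw
        rates.append(Fraction(k, n))                     # exact ratio in lowest terms
    return rates

def fine_histogram(rates, num_bins):
    """Counts in num_bins equal-width bins on [0, 1]; the value 1 goes in the last bin."""
    counts = Counter(min(int(r * num_bins), num_bins - 1) for r in rates)
    return [counts[b] for b in range(num_bins)]  # Counter returns 0 for empty bins

rates = simulate_error_rates(num_positions=20_000, min_cov=10, max_cov=60, p_err=0.25)
hist = fine_histogram(rates, num_bins=400)
# Bins 99, 100, 101 cover [0.2475, 0.2500), [0.2500, 0.2525), [0.2525, 0.2550).
for b in (99, 100, 101):
    print(f"[{b / 400:.4f}, {(b + 1) / 400:.4f}): {hist[b]}")
```

Any other ratio k/n with n ≤ 60 differs from 1/4 by |4k - n|/(4n) ≥ 1/240 ≈ 0.0042, so bin 99 is always empty and bin 100 contains only the value 1/4, which collects counts from every coverage divisible by 4. The same argument leaves a gap of at least 1/(q · 60) on each side of any fraction p/q, so values with small denominators stand out as tall, isolated spikes separated by empty stretches, which is one way to see where the fractal-like look of such histograms comes from.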
Taxonomy
Topics: Fractal and DNA sequence analysis · Evolution and Genetic Dynamics · Complex Systems and Time Series Analysis
