Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences
Hector Zenil, Peter Minary

TL;DR
This paper presents training-free, information-theoretic measures based on algorithmic complexity that identify high nucleosome occupancy in DNA sequences, outperforming the Kaplan model for high-occupancy predictions.
Contribution
It introduces novel training-free complexity measures for predicting nucleosome binding sites, demonstrating their effectiveness and complementarity to existing models.
Findings
Complexity indices are informative of nucleosome occupancy.
The measures reveal known in vivo versus in vitro discrepancies.
Complexity-based scores outperform the Kaplan model for high occupancy predictions.
Abstract
We introduce and study a set of training-free methods, information-theoretic and algorithmic-complexity based, applied to DNA sequences to assess their potential to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and show potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results, with the main difference being that our sequence complexity approach is training-free. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding…
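To make the idea of a training-free complexity score concrete, here is a minimal sketch that computes Shannon entropy over a sliding window along a DNA sequence. It is an illustration only: the abstract above does not list the exact measures used, so k-mer Shannon entropy stands in as one simple information-theoretic index. The function names, the 147 bp default window (the approximate length of DNA wrapped around a nucleosome), and the `k` parameter are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    # Overlapping k-mers; k=1 gives single-nucleotide frequencies.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(seq: str, window: int = 147, step: int = 1, k: int = 1) -> list[float]:
    """Entropy of each window of length `window`, advancing by `step`.

    The 147 bp default is an assumption chosen to match the nucleosome
    length; no parameters are fitted to occupancy data (training-free).
    """
    return [
        shannon_entropy(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ACGT" * 50 + "A" * 150  # hypothetical sequence, not real genomic data
    scores = sliding_entropy(toy)
    print(len(scores), round(min(scores), 3), round(max(scores), 3))
```

The resulting per-position scores could then be compared against an experimental occupancy track. How the paper maps complexity values to occupancy calls is not stated in this summary, so that step is left out here.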
