Lognormality and oscillations in the coverage of high-throughput transcriptomic data towards gene ends
Nicolas Innocenti, Erik Aurell

TL;DR
This paper investigates oscillations in high-throughput transcriptomic data at gene ends, modeling them with Kolmogorov's broken stick model, and explores their potential to improve gene end predictions, revealing subtle non-biological effects.
Contribution
It demonstrates that read count oscillations at gene ends can be modeled with the broken stick model and assesses their utility for predicting gene ends.
Findings
Oscillations are well described by the broken stick model.
Using the model improves gene end (3' transcript end) predictions only marginally with present data.
Subtle non-biological effects influence high-throughput transcriptomic data.
Abstract
High-throughput transcriptomics experiments have reached the stage where the number of reads alignable to a given position can be treated as an almost-continuous signal. This makes it possible to ask questions of a biophysical or biotechnical nature that may still have biological implications. Here we show that when RNA fragments are sequenced from one end, as is the case on most platforms, an oscillation in the read count is observed at the other end. We further show that these oscillations can be well described by Kolmogorov's 1941 broken stick model. We investigate how the model can be used to improve predictions of gene ends (3' transcript ends) but conclude that with present data the improvement is only marginal. The results highlight subtle effects in high-throughput transcriptomics experiments which do not have a biological origin, but which may still be used to obtain…
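As an illustration of why a broken stick process is associated with lognormality, the following minimal sketch simulates a generic fragmentation cascade. It is not the authors' code and does not model read coverage or the oscillations themselves. The binary split, the uniform break fraction, the number of generations, and the function name `broken_stick_lengths` are all assumptions made for this example. Each fragment length is a product of independent random fractions, so its logarithm is a sum of independent terms and tends towards a normal distribution as the number of generations grows.

```python
import math
import random


def broken_stick_lengths(generations, rng):
    """Fragment a unit-length stick.

    In every generation each piece is split in two at a fraction drawn
    uniformly from [0, 1), independently of the piece's size (an assumed
    splitting rule, chosen for simplicity).
    """
    lengths = [1.0]
    for _ in range(generations):
        next_lengths = []
        for piece in lengths:
            u = rng.random()
            next_lengths.append(piece * u)
            next_lengths.append(piece * (1.0 - u))
        lengths = next_lengths
    return lengths


rng = random.Random(0)  # fixed seed so the run is reproducible
lengths = broken_stick_lengths(12, rng)

# Log-lengths are sums of independent log-fractions, hence roughly normal.
log_lengths = [math.log(x) for x in lengths if x > 0.0]
mean_log = math.fsum(log_lengths) / len(log_lengths)
var_log = math.fsum((v - mean_log) ** 2 for v in log_lengths) / len(log_lengths)
print(f"pieces: {len(lengths)}, mean log-length: {mean_log:.2f}, variance: {var_log:.2f}")
```

A histogram of `log_lengths` should look approximately bell-shaped, which is the sense in which the fragment sizes are approximately lognormal under these assumptions.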
