Probabilistic modeling of occurring substitutions in PAR-CLIP data
Monica Golumbeanu, Pejman Mohammadi, and Niko Beerenwinkel

TL;DR
BMix is a probabilistic method that improves the detection of true T-to-C substitutions in PAR-CLIP data by accounting for noise, leading to more accurate identification of RNA-protein interaction sites.
Contribution
Introduces BMix, a novel probabilistic approach that explicitly models noise in PAR-CLIP data to distinguish true from false T-to-C substitutions.
Findings
Outperforms existing methods in speed and accuracy
Effective on both simulated and real datasets
Provides a robust statistical framework for PAR-CLIP analysis
Abstract
Photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) is an experimental method based on next-generation sequencing for identifying the RNA interaction sites of a given protein. The method deliberately inserts T-to-C substitutions at the RNA-protein interaction sites, which provides a second layer of evidence compared to other CLIP methods. However, the experiment includes several sources of noise which cause both low-frequency errors and spurious high-frequency alterations. Therefore, rigorous statistical analysis is required in order to separate true T-to-C base changes, following cross-linking, from noise. So far, most of the existing PAR-CLIP data analysis methods focus on discarding the low-frequency errors and rely on high-frequency substitutions to report binding sites, not taking into account the possibility of high-frequency false positive…
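The excerpt above does not spell out BMix's model, so the following is only a minimal sketch of the general idea of separating low-rate noise from high-rate substitutions. It assumes a generic two-component binomial mixture fitted by expectation-maximization, where each site has a T-to-C count and a read depth. The function names, starting values, and toy data are hypothetical and are not taken from the paper or from the BMix software.

```python
import math


def binom_logpmf(k, n, p):
    """Log-probability of k substitutions out of n reads at rate p."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def clamp(x, eps=1e-9):
    """Keep a probability strictly inside (0, 1) so logs stay finite."""
    return min(max(x, eps), 1.0 - eps)


def fit_two_component_mixture(counts, depths, n_iter=200):
    """EM for a two-component binomial mixture (assumed: noise vs. signal).

    Returns (p_noise, p_signal, weight_signal, responsibilities), where
    responsibilities[i] is the posterior probability that site i belongs
    to the high-rate (signal) component.
    """
    # Illustrative starting values, not estimates from any real dataset.
    p_noise, p_signal, weight_signal = 0.01, 0.5, 0.5
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of the signal component per site.
        resp = []
        for k, n in zip(counts, depths):
            log_noise = math.log(1.0 - weight_signal) + binom_logpmf(k, n, p_noise)
            log_signal = math.log(weight_signal) + binom_logpmf(k, n, p_signal)
            m = max(log_noise, log_signal)  # subtract the max for numerical stability
            num = math.exp(log_signal - m)
            resp.append(num / (num + math.exp(log_noise - m)))

        # M-step: responsibility-weighted substitution rates and mixing weight.
        sig_k = sum(r * k for r, k in zip(resp, counts))
        sig_n = sum(r * n for r, n in zip(resp, depths))
        noi_k = sum((1.0 - r) * k for r, k in zip(resp, counts))
        noi_n = sum((1.0 - r) * n for r, n in zip(resp, depths))
        if sig_n > 0:
            p_signal = clamp(sig_k / sig_n)
        if noi_n > 0:
            p_noise = clamp(noi_k / noi_n)
        weight_signal = clamp(sum(resp) / len(resp))
    return p_noise, p_signal, weight_signal, resp


# Toy data (made up): five low-rate sites followed by three high-rate sites.
toy_counts = [0, 1, 0, 2, 1, 20, 25, 18]
toy_depths = [50, 60, 40, 80, 50, 50, 60, 40]
p_noise, p_signal, weight_signal, resp = fit_two_component_mixture(toy_counts, toy_depths)
```

A site could then be flagged as a candidate by thresholding its responsibility. Note that this two-component sketch only separates low-rate from high-rate sites. It does not capture the spurious high-frequency alterations that the abstract identifies as a separate source of noise, which would need at least one additional component.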
Taxonomy
Topics: RNA Research and Splicing · RNA and protein synthesis mechanisms · RNA modifications and cancer
