Single-base mismatch profiles for NGS samples
Marco Chierici, Giuseppe Jurman, Marco Roncador, Cesare Furlanello

TL;DR
This paper introduces a method to analyze the relationship between single-base mismatch profiles in NGS samples and the biological nature of the samples, using similarity measures to compare these profiles.
Contribution
It presents a novel approach to characterize NGS samples through Single Base Indicator matrices and introduces similarity measures to relate these profiles to biological sample types.
Findings
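Shows a strong relation between SBI profiles and the biological nature of the sample under the same technological conditions
Introduces a similarity measure between SBIs and shows how two measures commonly used in machine learning can help compare them
Frames the SBI as a blueprint of an NGS sample and its preprocessing ingredients (sequencer, alignment software, pipeline parameters)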
Abstract
Within the preprocessing pipeline of a Next Generation Sequencing sample, its set of Single-Base Mismatches is one of the first outcomes, together with the number of correctly aligned reads. The union of these two sets provides a 4x4 matrix (called the Single Base Indicator, or SBI in what follows) representing a blueprint of the sample and its preprocessing ingredients, such as the sequencer, the alignment software, and the pipeline parameters. In this note we show that, under the same technological conditions, there is a strong relation between the SBI and the biological nature of the sample. To reach this goal we need a similarity measure between SBIs; we also show how two measures commonly used in machine learning can help in this context.
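To make the idea of a 4x4 SBI concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the input format (pairs of reference base and read base at aligned positions), the A, C, G, T row and column order, the function names `build_sbi` and `cosine_similarity`, and the choice of cosine similarity are all hypothetical. The abstract does not name the two machine learning measures the paper uses.

```python
import math

# Assumed row/column order; the abstract does not specify one.
BASES = "ACGT"


def build_sbi(pairs):
    """Count (reference base, read base) pairs into a 4x4 matrix.

    Diagonal cells hold matching bases and off-diagonal cells hold
    single-base mismatches. Pairs containing any other symbol
    (for example 'N') are skipped.
    """
    index = {base: i for i, base in enumerate(BASES)}
    sbi = [[0] * len(BASES) for _ in BASES]
    for ref, read in pairs:
        if ref in index and read in index:
            sbi[index[ref]][index[read]] += 1
    return sbi


def cosine_similarity(a, b):
    """Cosine similarity between two flattened 4x4 matrices.

    Cosine similarity is a stand-in chosen for illustration; it is
    not confirmed to be one of the measures used in the paper.
    Returns 0.0 if either matrix is all zeros.
    """
    x = [value for row in a for value in row]
    y = [value for row in b for value in row]
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


# Toy data: two tiny "samples" given as (reference, read) base pairs.
sample_a = build_sbi([("A", "A"), ("A", "C"), ("G", "G"), ("T", "T")])
sample_b = build_sbi([("A", "A"), ("C", "T"), ("G", "G"), ("T", "T")])
similarity = cosine_similarity(sample_a, sample_b)
```

With real data the diagonal counts would dwarf the mismatch counts, so a practical comparison would likely normalize the matrix or look only at the off-diagonal cells first. That choice is also an assumption here, not something stated in the abstract.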
Taxonomy
Topics: RNA modifications and cancer · Genomics and Phylogenetic Studies · Glycosylation and Glycoproteins Research
