Canonical, Stable, General Mapping using Context Schemes
Adam Novak, Yohei Rosen, David Haussler, Benedict Paten

TL;DR
This paper introduces context schemes, a general and stable method for genomic sequence mapping that recognizes reference bases unambiguously, supports the detection of complex rearrangements, scales across query sequence lengths, and adapts to graph-based references.
Contribution
The paper presents context schemes, a novel approach for sequence mapping that ensures unambiguous, stable, and scalable mappings, supporting complex genomic variations and graph-based references.
Findings
High-performance context schemes exist for genomic mapping.
Efficient algorithms for context scheme mapping are developed.
Context schemes support detection of complex rearrangements.
Abstract
Motivation: Sequence mapping is the cornerstone of modern genomics. However, most existing sequence mapping algorithms are insufficiently general. Results: We introduce context schemes: a method that allows the unambiguous recognition of a reference base in a query sequence by testing the query for substrings from an algorithmically defined set. Context schemes only map when there is a unique best mapping, and define this criterion uniformly for all reference bases. Mappings under context schemes can also be made stable, so that extension of the query string (e.g. by increasing read length) will not alter the mapping of previously mapped positions. Context schemes are general in several senses. They natively support the detection of arbitrary complex, novel rearrangements relative to the reference. They can scale over orders of magnitude in query sequence length. Finally, they are…
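As an illustration of the idea of recognizing a reference base by testing the query for substrings from a defined set, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a deliberately simplified scheme in which the contexts are fixed-length k-mers that occur exactly once in the reference; the function names `build_contexts` and `map_query` and the parameter `k` are hypothetical. A query position is mapped only when every matching context agrees on a single reference position; otherwise it is left unmapped. The sketch does not attempt to model the stability property.

```python
from collections import defaultdict
from typing import Optional


def build_contexts(reference: str, k: int) -> dict[str, int]:
    """Toy context set: k-mers occurring exactly once in the reference,
    each keyed to its start position (an assumption of this sketch)."""
    starts = defaultdict(list)
    for i in range(len(reference) - k + 1):
        starts[reference[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in starts.items() if len(pos) == 1}


def map_query(query: str, contexts: dict[str, int], k: int) -> list[Optional[int]]:
    """Map each query position to a reference position, or None.

    A position is mapped only if all contexts covering it point to the
    same reference position; conflicting or absent evidence gives None.
    """
    candidates = [set() for _ in query]
    for j in range(len(query) - k + 1):
        ref_start = contexts.get(query[j:j + k])
        if ref_start is None:
            continue  # this k-mer is absent or repeated in the reference
        for offset in range(k):
            candidates[j + offset].add(ref_start + offset)
    return [next(iter(c)) if len(c) == 1 else None for c in candidates]


reference = "ACGTTGCA"
contexts = build_contexts(reference, k=3)

# A query that matches the reference contiguously.
print(map_query("CGTT", contexts, k=3))    # [1, 2, 3, 4]

# A rearranged query: the end of the reference followed by its start.
# The base at the junction receives conflicting evidence and stays unmapped.
print(map_query("TGCACG", contexts, k=3))  # [4, 5, 6, None, 1, 2]
```

In the second call each side of the breakpoint maps independently, which loosely mirrors how per-base mapping can expose a rearrangement without an explicit alignment model.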
