Orthologs from maxmer sequence context

Kun Gao; Jonathan Miller

arXiv:1509.04412·q-bio.QM·November 24, 2015

Open Access

TL;DR

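This paper introduces a fast method that uses short-range sequence context, down to single maximal matches, to identify orthologs across whole genomes. The underlying genome "intersection" generally takes less than one thirtieth of the computation time of commonly used whole-genome alignment methods, and the method both recovers known orthologs and reports putatively novel ones.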

Contribution

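It presents an approach that treats non-embedded maximal matches (maximal matches that are not embedded in other maximal matches) as candidate orthologs, reducing computation time relative to whole-genome alignment while identifying orthologs with high sensitivity and specificity.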

Findings

01. Recapitulates most exact matches of traditional alignment methods.

02. Recovers high-confidence orthologs with high sensitivity and specificity.

03. Identifies putatively novel orthologs not found in existing databases.

Abstract

Context-dependent identification of orthologs customarily relies on conserved gene order or whole-genome sequence alignment. It is shown here that short-range context--as short as single maximal matches--also provides an effective means to identify orthologs within whole genomes. On pristine (un-repeatmasked) mammalian whole-genome assemblies we perform a genome "intersection" that in general consumes less than one thirtieth of the computation time required by commonly used methods for whole-genome alignment, and we extract "non-embedded maximal matches," maximal matches that are not embedded into other maximal matches, as potential orthologs. An ortholog identified via non-embedded maximal matches is analogous to a "positional ortholog" or a "primary ortholog" as defined in previous literature; such orthologs constitute homologs derived from the same direct ancestor whose ancestral…
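
The idea of a "non-embedded maximal match" can be illustrated with a minimal Python sketch on toy strings. This is not the authors' implementation: the brute-force search, the function names (`maximal_matches`, `non_embedded`), the `min_len` threshold, and the reading of "embedded" as "the match's interval on either sequence lies inside the interval of a longer maximal match" are all assumptions made for illustration. Reverse-complement matches and genome-scale indexing are ignored.

```python
from typing import List, Tuple

# A match is (start in a, start in b, length). Hypothetical representation.
Match = Tuple[int, int, int]


def maximal_matches(a: str, b: str, min_len: int = 4) -> List[Match]:
    """Brute-force maximal exact matches between a and b.

    A match is maximal if it cannot be extended to the left or right.
    Runs in O(len(a) * len(b)) time or worse, so it is only suitable
    for toy inputs.
    """
    matches: List[Match] = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Not left-maximal: this position belongs to a match that
            # starts earlier on the same diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


def _contains(outer_start: int, outer_len: int,
              inner_start: int, inner_len: int) -> bool:
    """True if [inner_start, inner_start+inner_len) lies within the outer interval."""
    return (outer_start <= inner_start
            and inner_start + inner_len <= outer_start + outer_len)


def non_embedded(matches: List[Match]) -> List[Match]:
    """Keep matches not embedded in a longer match on either sequence.

    ASSUMPTION: "embedded" means interval containment in a or in b by a
    strictly longer maximal match; the paper's definition may differ.
    """
    kept: List[Match] = []
    for m in matches:
        i, j, k = m
        embedded = any(
            n[2] > k
            and (_contains(n[0], n[2], i, k) or _contains(n[1], n[2], j, k))
            for n in matches
        )
        if not embedded:
            kept.append(m)
    return kept


a = "GATTACA"
b = "GATTACACCTTACG"
all_matches = maximal_matches(a, b, min_len=4)
candidates = non_embedded(all_matches)
print(all_matches)
print(candidates)
```

In this toy case the full-length `GATTACA` match survives, while the shorter `TTAC` match to the second location in `b` is dropped because its interval on `a` lies inside the longer match. That mirrors, in a simplified way, how context can separate a candidate positional ortholog from a secondary copy.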

Taxonomy

Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms