Non-alignment comparison of human and high primate genomes
V.M. Kirzhner, S. Frenkel, A.B. Korol

TL;DR
This study uses compositional spectra analysis to compare human and primate genomes, revealing conserved synteny and phylogenetic signals in both coding and noncoding regions through k-mer based similarity measures.
Contribution
It introduces a novel approach combining compositional spectra and k-mer analysis to detect conserved genomic regions beyond gene anchors, including noncoding and repetitive DNA.
Findings
High correspondence in whole-genome comparisons
Revealed phylogenetic signals in noncoding regions
Combining GC content with k-mer abundances improves similarity detection
Abstract
Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in comparisons of whole-genome sequences, good similarity was also found after masking gene sequences, indicating that CS analysis reveals a phylogenetic signal in the organization of the noncoding part of the genome sequences, including repetitive DNA and the genome "dark matter". The possibility of revealing parallel ordering depends on the signal of the common ancestor's sequence organization varying locally along the corresponding segments of the compared genomes. We explored two sources…
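The idea of an alignment-free dot plot built from k-mer profiles can be illustrated with a minimal sketch. It is not the authors' implementation: exact k-mer matching, cosine similarity, and the window size, step, and k values are all assumptions chosen for brevity, and the function names (`kmer_spectrum`, `similarity_dotplot`) are hypothetical. The scoring and distance used in the paper may differ.

```python
from collections import Counter
from itertools import product
import math


def kmer_spectrum(seq, k=3):
    """Normalized frequencies of all 4**k k-mers over A, C, G, T (assumed scoring)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def windows(seq, size, step):
    """Split a sequence into fixed-size windows taken every `step` bases."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def similarity_dotplot(seq_a, seq_b, size=200, step=200, k=3):
    """Matrix of window-to-window similarities: rows from seq_a, columns from seq_b."""
    spec_a = [kmer_spectrum(w, k) for w in windows(seq_a, size, step)]
    spec_b = [kmer_spectrum(w, k) for w in windows(seq_b, size, step)]
    return [[cosine(a, b) for b in spec_b] for a in spec_a]
```

In such a matrix, a run of high values along a diagonal would suggest segments with parallel compositional organization, which is the kind of continuous similarity signal the abstract contrasts with chains of discrete anchors.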
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Genetic diversity and population structure
