De novo reconstruction of satellite repeat units from sequence data
Yujie Zhang, Justin Chu, Haoyu Cheng, Heng Li

TL;DR
This paper introduces Satellite Repeat Finder (SRF), a novel algorithm that reconstructs satellite repeat units and high-order repeats from sequence data, aiding genome annotation and evolutionary studies of satellite DNA.
Contribution
SRF is the first method capable of reconstructing satellite repeats and HORs without prior knowledge of repeat structures, even from unassembled reads or incomplete genome assemblies.
Findings
SRF successfully reconstructs known satellite repeats in humans and well-studied model organisms.
Satellite repeats can account for up to 12% of the genome in some species.
Many satellite repeats are underrepresented in current genome assemblies.
Abstract
Satellite DNA consists of long, tandemly repeated sequences in a genome, which may be organized as high-order repeats (HORs). Satellites are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the satellites to be completely assembled or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome content, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help…
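To make "reconstructing a repeat unit from sequence data" concrete, here is a minimal sketch of one generic k-mer idea, offered as an assumption rather than as SRF's documented procedure: in a long tandem array, each k-mer of the unit occurs many times, so the high-abundance ("solid") k-mers link up into a cycle in a de Bruijn-style graph, and spelling that cycle yields one copy of the unit (up to rotation). The function name `repeat_unit_from_cycle` and the parameters `k` and `min_count` are hypothetical. The sketch only follows unambiguous successors, so any branch (for example from sequence variants or HORs built from similar units) makes it return `None`; handling such branching is exactly what a real tool has to do.

```python
from collections import Counter, defaultdict


def repeat_unit_from_cycle(seq, k=5, min_count=3):
    """Hypothetical sketch: recover a tandem repeat unit by walking a k-mer cycle.

    Returns the unit (as some rotation) or None if no clean cycle is found.
    """
    # Count every k-mer in the input sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only high-abundance k-mers, which in a tandem array come from the unit.
    solid = {km for km, c in counts.items() if c >= min_count}
    if not solid:
        return None

    # Link k-mers that overlap by k-1 bases (de Bruijn-style edges).
    succ = defaultdict(list)
    for km in solid:
        for base in "ACGT":
            nxt = km[1:] + base
            if nxt in solid:
                succ[km].append(nxt)

    # Start from the most abundant k-mer (ties broken lexicographically).
    start = max(solid, key=lambda km: (counts[km], km))
    path, seen, cur = [start], {start}, start
    while True:
        nexts = succ[cur]
        if len(nexts) != 1:   # dead end or branch: not a simple cycle
            return None
        cur = nexts[0]
        if cur == start:      # closed the cycle
            break
        if cur in seen:       # looped back somewhere other than the start
            return None
        seen.add(cur)
        path.append(cur)

    # Each k-mer on the cycle contributes its first base to the unit.
    return "".join(km[0] for km in path)


if __name__ == "__main__":
    unit = "ACGTTGCA"
    print(repeat_unit_from_cycle(unit * 10))  # a rotation of ACGTTGCA
```

Because a tandem array has no natural start, the result is only defined up to rotation; comparing it against `unit + unit` (as a substring check) is a simple way to test rotational equivalence.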
Taxonomy
Topics: Chromosomal and Genetic Variations · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
