Assignment of endogenous retrovirus integration sites using a mixture model
David R. Hunter, Le Bao, Mary Poss

TL;DR
This paper introduces a new statistical mixture model to accurately detect endogenous retrovirus integration sites in genomes, improving upon existing methods by accounting for read count variability, and demonstrates its application in studying mule deer genetics.
Contribution
A novel two-component mixture of negative binomial distributions for assigning ERV presence, outperforming traditional threshold-based methods.
Findings
Identified ERV sites in mule deer, a species without existing genomic resources
Revealed patterns of shared ERV sites that track relatedness among animals
Improved accuracy over existing detection methods
Abstract
Structural variation occurs in the genomes of individuals because of the different positions occupied by repetitive genome elements like endogenous retroviruses, or ERVs. The presence or absence of ERVs can be determined by identifying the junction with the host genome using high-throughput sequence technology and a clustering algorithm. The resulting data give the number of sequence reads assigned to each ERV-host junction sequence for each sampled individual. Variability in the number of reads from an individual integration site makes it difficult to determine whether a site is present for low read counts. We present a novel two-component mixture of negative binomial distributions to model these counts and assign a probability that a given ERV is present in a given individual. We explain how our approach is superior to existing alternatives, including another form of two-component…
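To make the idea in the abstract concrete, here is a minimal sketch of how a two-component negative binomial mixture can turn a read count into a probability that an ERV is present. It is an illustration under stated assumptions, not the authors' implementation: the function names, the mean/dispersion parameterization, and all numeric parameter values are hypothetical, and the sketch assumes the mixture parameters are already known rather than estimated from data.

```python
import math


def nb_logpmf(k, r, p):
    """Log pmf of a negative binomial: k failures before r successes,
    success probability p. r may be any positive real."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_logpmf_mean(k, mean, r):
    """Same distribution, parameterized by its mean and dispersion r.
    Uses mean = r * (1 - p) / p, so p = r / (r + mean)."""
    return nb_logpmf(k, r, r / (r + mean))


def posterior_present(count, pi_present, mean_absent, mean_present,
                      r_absent, r_present):
    """Posterior probability that a site is present given its read count,
    under a two-component mixture with mixing weight pi_present.
    All parameters are assumed known here (hypothetical values below)."""
    log_absent = math.log1p(-pi_present) + nb_logpmf_mean(count, mean_absent, r_absent)
    log_present = math.log(pi_present) + nb_logpmf_mean(count, mean_present, r_present)
    # Subtract the max before exponentiating to avoid underflow.
    m = max(log_absent, log_present)
    w_absent = math.exp(log_absent - m)
    w_present = math.exp(log_present - m)
    return w_present / (w_absent + w_present)


if __name__ == "__main__":
    # Hypothetical parameters: background reads average 1, true sites average 50.
    for count in (0, 2, 5, 10, 50):
        prob = posterior_present(count, pi_present=0.3,
                                 mean_absent=1.0, mean_present=50.0,
                                 r_absent=2.0, r_present=2.0)
        print(count, round(prob, 4))
```

Unlike a single threshold count, this yields a graded probability for each read count, so ambiguous low counts are reported as uncertain rather than forced into a present or absent call.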
Taxonomy
Topics: Chromosomal and Genetic Variations · Genomics and Phylogenetic Studies · Genetic diversity and population structure
