Current status of human endogenous retrovirus annotation
Sergei Sinitsyn, Marharyta Klianitskaya, Michelle Vincendeau, Jan Pačes, Dmitrij Frishman

TL;DR
This paper compares three annotation databases for human endogenous retroviruses (DFAM, HERVd, and RepBase) and finds inconsistencies in their annotations, suggesting the need for unified standards.
Contribution
The study provides a detailed comparative analysis of HERV annotation resources and proposes recommendations for harmonized annotation standards.
Findings
HERV annotation databases show significant discrepancies in element counts and genome coverage.
Up to 93% of HERV records can be reconciled using refined matching criteria.
Each database contributes unique elements, highlighting their complementary strengths.
Abstract
Human endogenous retroviruses (HERVs) constitute a significant fraction of the human genome and are increasingly recognized for their roles in both physiological and pathological processes. Despite their biological importance, the annotation of HERV elements remains inconsistent across major public databases. In this study, we present a comprehensive comparative analysis of three key HERV annotation resources: DFAM, Human Endogenous Retroviruses Database (HERVd), and RepBase. We systematically examine their content, classification schemes, and postprocessing workflows and assess the concordance of their annotations based on genomic coordinates. Our analysis reveals substantial discrepancies in element counts, genome coverage, and repeat fragmentation strategies, which we trace back to differences in curation methodologies—ranging from DFAM’s hidden Markov model-based automated detection…
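To make the idea of coordinate-based concordance concrete, here is a minimal sketch of how records from two annotation sets might be matched. It is an illustration only, not the authors' procedure: the reciprocal-overlap criterion, the 0.5 threshold, the half-open (BED-style) coordinates, and the function names `reciprocal_overlap` and `concordant_fraction` are all assumptions introduced here.

```python
from collections import defaultdict


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval's length.

    Intervals are (chrom, start, end) tuples with half-open coordinates
    (an assumption; real databases may use other conventions).
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    # Dividing by the longer interval gives the smaller of the two
    # coverage fractions, i.e. the reciprocal overlap.
    return overlap / max(end_a - start_a, end_b - start_b)


def concordant_fraction(records_a, records_b, min_overlap=0.5):
    """Fraction of records in A with at least one match in B.

    min_overlap is a hypothetical threshold chosen for illustration.
    """
    by_chrom = defaultdict(list)
    for rec in records_b:
        by_chrom[rec[0]].append(rec)
    matched = sum(
        1
        for rec in records_a
        if any(
            reciprocal_overlap(rec, other) >= min_overlap
            for other in by_chrom[rec[0]]
        )
    )
    return matched / len(records_a) if records_a else 0.0


# Toy records, invented for this example (not real HERV loci).
db_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
db_b = [("chr1", 120, 210), ("chr1", 590, 700), ("chr2", 50, 150)]
print(concordant_fraction(db_a, db_b))
```

The pairwise scan is quadratic per chromosome, which is fine for a toy example; for genome-scale annotation sets, a sorted sweep or an interval tree would be the more usual choice. Loosening or tightening `min_overlap` changes how many records count as reconciled, which is the kind of sensitivity that refined matching criteria have to address.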
Taxonomy
Topics: Chromosomal and Genetic Variations · Genome Rearrangement Algorithms · Genetic Mapping and Diversity in Plants and Animals
