# Current status of human endogenous retrovirus annotation

**Authors:** Sergei Sinitsyn, Marharyta Klianitskaya, Michelle Vincendeau, Jan Pačes, Dmitrij Frishman

PMC · DOI: 10.1093/bib/bbag062 · 2026-02-16

## TL;DR

This paper compares three human endogenous retrovirus (HERV) annotation databases (DFAM, HERVd, and RepBase), finds substantial inconsistencies in their annotations, and argues for harmonized annotation standards.

## Contribution

The study provides a detailed comparative analysis of three HERV annotation resources (DFAM, HERVd, and RepBase), offers practical recommendations for their use in HERV research, and argues for harmonized standards in retroelement annotation.

## Key findings

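- The three databases differ substantially in element counts, genome coverage, and repeat fragmentation strategies; the authors trace these differences to curation methodology, from DFAM's automated hidden Markov model-based detection to HERVd's semimanual defragmentation.
- Up to 93% of HERV records can be reconciled across databases using refined coordinate-based matching criteria.
- Each database still contributes a substantial proportion of unique elements, so the resources are complementary rather than interchangeable.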

## Abstract

Human endogenous retroviruses (HERVs) constitute a significant fraction of the human genome and are increasingly recognized for their roles in both physiological and pathological processes. Despite their biological importance, the annotation of HERV elements remains inconsistent across major public databases. In this study, we present a comprehensive comparative analysis of three key HERV annotation resources: DFAM, Human Endogenous Retroviruses Database (HERVd), and RepBase. We systematically examine their content, classification schemes, and postprocessing workflows and assess the concordance of their annotations based on genomic coordinates. Our analysis reveals substantial discrepancies in element counts, genome coverage, and repeat fragmentation strategies, which we trace back to differences in curation methodologies—ranging from DFAM’s hidden Markov model-based automated detection to HERVd’s semimanual defragmentation. Using refined matching criteria, we demonstrate that up to 93% of HERV records can be reconciled across databases, yet each source still contributes a substantial proportion of unique elements. We highlight the complementary strengths of these resources and provide practical recommendations for their usage in HERV research. Our findings underscore the need for harmonized standards in retroelement annotation and may inform future efforts toward unified and comprehensive HERV cataloging, particularly in light of emerging genome assemblies such as T2T-CHM13.
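The abstract says concordance was assessed from genomic coordinates using "refined matching criteria" but does not state those criteria here. As an illustration only, the sketch below shows one common way to compare two annotation sets: reciprocal overlap between intervals. The `Interval` type, the function names, and the 50% threshold are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """One annotated element (hypothetical record layout, BED-style coordinates)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals; 0 if they are on different chromosomes."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_match(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """Reciprocal overlap: shared bases must cover at least min_frac of BOTH records.

    The 0.5 default is an assumed threshold, not the paper's criterion.
    """
    shared = overlap_bp(a, b)
    if shared == 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))


def matched_fraction(query: Iterable[Interval],
                     reference: Iterable[Interval],
                     min_frac: float = 0.5) -> float:
    """Fraction of query records that have at least one matching reference record."""
    # Group reference records by chromosome so each query record is only
    # compared against candidates on the same chromosome.
    by_chrom = defaultdict(list)
    for r in reference:
        by_chrom[r.chrom].append(r)

    query_list: List[Interval] = list(query)
    if not query_list:
        return 0.0
    hits = sum(
        1 for q in query_list
        if any(is_match(q, r, min_frac) for r in by_chrom.get(q.chrom, []))
    )
    return hits / len(query_list)
```

Calling `matched_fraction(db_a, db_b)` and `matched_fraction(db_b, db_a)` gives the two directional concordance values; they generally differ when one database fragments elements more finely than the other. For genome-scale inputs, the pairwise scan per chromosome would normally be replaced by sorted intervals or an interval tree.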

## Full-text entities

- **Genes:** GFER (growth factor, augmenter of liver regeneration) [NCBI Gene 2671] {aka ALR, ERV1, HERV1, HPO, HPO1, HPO2}, ERV3-1 (endogenous retrovirus group 3 member 1, envelope) [NCBI Gene 2086] {aka ERV-R, ERV3, ERVR, HERV-R, HERVR, envR}, INSM2 (INSM transcriptional repressor 2) [NCBI Gene 84684] {aka IA-6, IA6, mlt1}, FZD4 (frizzled class receptor 4) [NCBI Gene 8322] {aka CD344, EVR1, FEVR, FZD4S, Fz-4, Fz4}
- **Diseases:** ERVs (MESH:D003866), neurological diseases (MESH:D020271), infectious diseases (MESH:D003141), inflammatory and autoimmune disorders (MESH:D007249), cancer (MESH:D009369)
- **Chemicals:** DFAM (-)
- **Species:** Erysiphe sp. RV (species) [taxon 662690], Human endogenous retroviruses (clade) [taxon 206037], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** T2T

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12907019/full.md

---
Source: https://tomesphere.com/paper/PMC12907019