# Deep-learning prediction of gene expression from personal genomes

**Authors:** Shiron Drusinsky, Sean Whalen, Katherine S. Pollard

PMC · DOI: 10.1186/s13059-025-03926-7 · Genome Biology · 2026-01-06

## TL;DR

This paper introduces Variformer, a deep-learning model that improves gene expression prediction from personal genomes by fine-tuning on individuals' genome sequences paired with their tissue-specific expression.

## Contribution

Variformer is a deep sequence-based neural network fine-tuned on personal genome sequences paired with matched tissue-specific gene expression. This fine-tuning improves prediction of expression differences between individuals.

## Key findings

- Across held-out individuals, Variformer predicts expression with accuracy approaching the cis-heritability of most genes.
- The variants the model prioritizes span the allele frequency spectrum and are enriched for motif disruption and other functional annotations.
- Variformer does not generalize to unseen genes, indicating that it does not learn a regulatory grammar that transfers to new loci.
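The gap between the first and third findings comes down to what is held out at evaluation time. Below is a minimal sketch of the two split designs, assuming the data form a simple grid of (individual, gene) pairs. `holdout`, `cross_individual_split`, and `cross_gene_split` are hypothetical helpers rather than the paper's code. The paper's actual partitioning, for example holding out people and genes jointly, may differ.

```python
import random
from itertools import product


def holdout(items, test_frac=0.2, seed=0):
    """Deterministically shuffle a collection and split it into (train, test) lists."""
    rng = random.Random(seed)
    pool = sorted(items)
    rng.shuffle(pool)
    n_test = max(1, round(len(pool) * test_frac))
    return pool[n_test:], pool[:n_test]


def cross_individual_split(individuals, genes, **kw):
    """Unseen people, seen genes: every test locus also appears in training."""
    train_ind, test_ind = holdout(individuals, **kw)
    return list(product(train_ind, genes)), list(product(test_ind, genes))


def cross_gene_split(individuals, genes, **kw):
    """Unseen genes: no test locus is ever observed during training."""
    train_genes, test_genes = holdout(genes, **kw)
    return list(product(individuals, train_genes)), list(product(individuals, test_genes))
```

In the cross-individual design, a model can succeed by memorizing locus-specific effects. Only the cross-gene design requires it to apply regulatory rules to sequence it has never seen.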

## Abstract

Models that predict gene expression levels from DNA sequence struggle to predict differences between individuals when given their personal genome sequences. These models are generally trained on reference genome sequences, and thus have never observed examples of genetic variation at any locus during training, which may explain their lack of generalizability to personal genome sequences that do contain variation.
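A personal genome sequence differs from the reference only at variant sites. One way to build a model input is therefore to substitute an individual's alleles into a reference window and one-hot encode the result. The sketch below makes several assumptions: variants are phased single-nucleotide variants given as `(position, ref, alt)` tuples with 0-based coordinates, and each call handles a single haplotype. `apply_snvs` and `one_hot` are hypothetical names. Indels, which shift coordinates, are deliberately out of scope.

```python
import numpy as np

BASES = "ACGT"


def apply_snvs(reference, window_start, variants):
    """Return one personal haplotype by substituting SNV alleles into a reference window.

    variants: iterable of (position, ref_allele, alt_allele), 0-based genomic positions.
    Only single-nucleotide variants are supported (assumption of this sketch).
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        offset = pos - window_start
        if not 0 <= offset < len(seq):
            continue  # variant lies outside this window
        if seq[offset] != ref:
            raise ValueError(f"reference mismatch at {pos}: {seq[offset]} != {ref}")
        seq[offset] = alt
    return "".join(seq)


def one_hot(seq):
    """Encode DNA as an (L, 4) float array in A, C, G, T order; N stays all-zero."""
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            arr[i, j] = 1.0
    return arr
```

Calling `apply_snvs` once per phased haplotype gives the two personal sequences of a diploid individual. The reference-allele check catches coordinate or genome-build mismatches between the variant calls and the reference.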

We fine-tune a deep sequence-based neural network on personal genomes paired with matched tissue-specific gene expression values to develop Variformer. Across held-out people, Variformer predicts expression with accuracy that approaches the cis-heritability of most genes. It also prioritizes genetic variants across the allele frequency spectrum that are enriched for motif disruption and other functional annotations. We further show that Variformer fails to generalize to unseen genes.
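Accuracy across held-out people is usually judged gene by gene. Cis-heritability, the share of expression variance explained by nearby genetic variation, serves as the upper bound. The minimal sketch below assumes that accuracy is the squared Pearson correlation across individuals and that per-gene cis-heritability estimates are supplied from elsewhere. The function names are hypothetical, and the paper's exact metric may differ.

```python
import numpy as np


def per_gene_r2(y_true, y_pred):
    """Squared Pearson correlation per gene, computed across held-out individuals.

    y_true, y_pred: arrays of shape (n_individuals, n_genes).
    Genes with zero variance in either array get R^2 = 0.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    r = np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)
    return r ** 2


def fraction_of_h2cis(r2, h2_cis):
    """Ratio of predictive R^2 to cis-heritability; NaN where h2_cis is not positive."""
    return np.divide(r2, h2_cis, out=np.full_like(r2, np.nan, dtype=float), where=h2_cis > 0)
```

A ratio near 1 means the model explains about as much expression variance as local genetics can. Genes without a positive heritability estimate return NaN instead of an unstable ratio.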

Our work suggests that fine-tuning with personal genomes corrects previously reported shortcomings of gene expression prediction across unseen individuals. However, the resulting model does not learn a regulatory grammar that generalizes to unseen loci. Fine-tuned deep expression models thus share the performance and limitations of state-of-the-art linear models, highlighting a gap for the field.
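For comparison, linear expression predictors fit one model per gene on cis-variant genotype dosages. Widely used tools such as PrediXcan use elastic net for this. The sketch below substitutes ridge regression solved in closed form. That choice is a simplifying assumption, not the baseline used in the paper. `fit_ridge` and `predict_ridge` are hypothetical helpers.

```python
import numpy as np


def fit_ridge(dosages, expression, lam=1.0):
    """Per-gene ridge regression of expression on cis-variant dosages.

    dosages: (n_individuals, n_variants) alt-allele counts in {0, 1, 2}.
    expression: (n_individuals,) normalized expression for one gene.
    Returns (weights, x_mean, y_mean) so new genomes are centred the same way.
    """
    x_mean = dosages.mean(axis=0)
    y_mean = expression.mean()
    X = dosages - x_mean
    y = expression - y_mean
    # Closed-form ridge solution: (X^T X + lam * I) w = X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return w, x_mean, y_mean


def predict_ridge(model, dosages):
    """Predict expression for new individuals' dosages at the same variants."""
    w, x_mean, y_mean = model
    return (dosages - x_mean) @ w + y_mean
```

Each weight vector is tied to specific variants of a single gene. A model of this form therefore cannot, by construction, predict expression for a gene it was not trained on, which parallels the unseen-gene limitation described above.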

The online version contains supplementary material available at 10.1186/s13059-025-03926-7.

## Full-text entities

- **Genes:** HOPX (HOP homeobox) [NCBI Gene 84525] {aka CAMEO, HOD, HOP, LAGY, NECC1, OB1}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, BTNL9 (butyrophilin like 9) [NCBI Gene 153579] {aka BTN3, BTN8, VDLS1900}, BTNL3 (butyrophilin like 3) [NCBI Gene 10917] {aka BTN9.1, BTNLR}, LGR5 (leucine rich repeat containing G protein-coupled receptor 5) [NCBI Gene 8549] {aka FEX, GPR49, GPR67, GRP49, HG38}, PPIF (peptidylprolyl isomerase F) [NCBI Gene 10105] {aka CYP3, CyP-M, Cyp-D, CypD}, BTNL8 (butyrophilin like 8) [NCBI Gene 79908] {aka BTN9.2}, NFKB2 (nuclear factor kappa B subunit 2) [NCBI Gene 4791] {aka CVID10, H2TF1, LYT-10, LYT10, NF-kB2, p100}, RELA (RELA proto-oncogene, NF-kB subunit) [NCBI Gene 5970] {aka AIF3BL3, CMCU, NFKB3, p65}, ZFP57 (ZFP57 zinc finger protein) [NCBI Gene 346171] {aka C6orf40, TNDM1, ZNF698, bA145L22, bA145L22.2}
- **Diseases:** HSV (MESH:D008228), PCC (MESH:C536353), HOGP (MESH:C000719191)
- **Models:** Borzoi (deep-learning sequence-to-expression model)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs1413354854, rs3117299
- **Cell lines:** THP-1 — Homo sapiens (Human), Childhood acute monocytic leukemia, Cancer cell line (CVCL_0006)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869966/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12869966/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869966/full.md

---
Source: https://tomesphere.com/paper/PMC12869966