# Contrastive learning enables epitope overlap predictions for targeted antibody discovery

**Authors:** Clinton M. Holt, Alexis K. Janke, Parastoo Amlashi, Parker J. Jamieson, Toma M. Marinov, Ivelin S. Georgiev

PMC · DOI: 10.1016/j.patter.2025.101419 · Patterns · 2025-11-13

## TL;DR

The paper introduces machine learning methods that predict, from antibody sequences, whether two antibodies bind overlapping epitopes, enabling faster discovery of epitope-targeted therapeutic antibodies.

## Contribution

A contrastive learning framework for antibody language models that accurately predicts epitope overlap across diverse antibodies.

## Key findings

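- Contrastive fine-tuning achieves 97% total accuracy in predicting high levels of structural epitope overlap for SARS-CoV-2 receptor-binding-domain antibodies.
- AbLang-PDB achieves a 5-fold improvement in average precision over sequence-based methods and correlates strongly with epitope overlap (ρ = 0.81).
- In experimental validation against HIV-1 antibody 8ANC195, 70% of selected candidates demonstrated HIV-1 specificity and 50% competed for binding.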

## Abstract

Computational epitope prediction remains an unmet need for therapeutic antibody development. We present three complementary approaches for predicting epitope relationships from antibody sequences. First, by analyzing approximately 18 million antibody pairs targeting around 250 protein families, we establish that a heavy-chain complementarity-determining region 3 (CDRH3) sequence identity above 70%, among antibodies sharing both V genes, reliably predicts overlapping epitopes. Second, we develop a supervised contrastive fine-tuning framework for antibody large language models that enriches embeddings with epitope information. Applied to SARS-CoV-2 receptor-binding-domain antibodies, this approach achieves 97% total accuracy in predicting high levels of structural overlap. Third, we create AbLang-PDB, a generalized model achieving 5-fold improvement in average precision over sequence-based methods and correlating strongly with epitope overlap (ρ = 0.81). Experimental validation with HIV-1 antibody 8ANC195 shows that 70% of selected candidates demonstrate HIV-1 specificity and 50% compete for binding. These models provide powerful tools for epitope-targeted antibody discovery while demonstrating contrastive learning’s efficacy for encoding epitope information.
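
The first approach, the sequence-based rule, can be illustrated with a minimal sketch. It assumes that "both V genes" means the heavy- and light-chain V genes and that identity is the fraction of matching positions between equal-length CDRH3 sequences; the paper may define identity differently (for example via alignment). The `Antibody` record, its field names, and the example sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    """Minimal record; field names are illustrative, not taken from the paper."""
    name: str
    heavy_v_gene: str
    light_v_gene: str
    cdrh3: str


def cdrh3_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two CDRH3 sequences.

    Assumption: sequences of unequal length are treated as 0.0 identity,
    which is a simplification of whatever the authors actually used.
    """
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def predicts_overlap(ab1: Antibody, ab2: Antibody, threshold: float = 0.70) -> bool:
    """Apply the rule: shared V genes and CDRH3 identity above the threshold."""
    same_v_genes = (
        ab1.heavy_v_gene == ab2.heavy_v_gene
        and ab1.light_v_gene == ab2.light_v_gene
    )
    return same_v_genes and cdrh3_identity(ab1.cdrh3, ab2.cdrh3) > threshold


# Hypothetical antibodies, made up for illustration only.
ab_a = Antibody("ab_a", "IGHV3-53", "IGKV1-9", "ARDLDYYGMDV")
ab_b = Antibody("ab_b", "IGHV3-53", "IGKV1-9", "ARDLDYYGLDV")
ab_c = Antibody("ab_c", "IGHV3-53", "IGKV3-20", "ARDLDYYGLDV")

print(predicts_overlap(ab_a, ab_b))  # shared V genes, 10 of 11 positions match
print(predicts_overlap(ab_a, ab_c))  # light-chain V gene differs
```

A rule like this is cheap to apply but only fires for closely related antibodies, which is the limitation the embedding-based approaches are meant to address.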

- Contrastive fine-tuning encodes epitope relationships into antibody LLM embeddings
- AbLang-PDB identifies overlapping-epitope antibodies across 250 protein families
- Both structural and binding data successfully teach epitope specificity
- AbLang-PDB achieved a 50% hit rate, identifying 8ANC195 overlapping-epitope antibodies

Developing vaccines and antibody drugs requires identifying where antibodies bind to target proteins. Current computational methods face a fundamental trade-off: simple sequence comparisons are reliable when applicable but miss most promising candidates, while complex structural approaches have broader applicability but require significant computational resources and often produce inaccurate predictions. This bottleneck significantly slows therapeutic development, particularly for challenging targets such as rapidly mutating viruses. We developed a machine learning approach that teaches antibody language models to recognize when different antibodies will bind to overlapping antigen sites, even when the antibody sequences are significantly different. Our approach offers a practical solution for rapidly screening large antibody databases, potentially accelerating the discovery pipeline by identifying the most promising candidates before expensive laboratory validation.

Antibodies are important for human health, but their diversity makes computational prediction of their binding properties challenging. Through contrastive fine-tuning of antibody language models on millions of antibody pairs, the authors enable accurate prediction of epitope overlap even among sequence-diverse antibodies, providing powerful new tools for therapeutic antibody discovery.
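
To make the idea of contrastive fine-tuning on antibody pairs concrete, here is a minimal sketch of a generic pairwise contrastive objective on embedding vectors. It is not the authors' loss: the squared-cosine form, the `margin` value, and the function names are assumptions chosen for illustration. The intent is only to show how pairs labeled as sharing an epitope are pulled together in embedding space while other pairs are pushed apart.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_contrastive_loss(
    u: Sequence[float],
    v: Sequence[float],
    same_epitope: bool,
    margin: float = 0.5,  # hypothetical value, not from the paper
) -> float:
    """Generic pairwise contrastive loss on cosine similarity.

    Overlapping-epitope pairs are penalized for being dissimilar;
    non-overlapping pairs are penalized only when their similarity
    exceeds the margin.
    """
    sim = cosine_similarity(u, v)
    if same_epitope:
        return (1.0 - sim) ** 2
    return max(0.0, sim - margin) ** 2


# Toy two-dimensional "embeddings", made up for illustration.
emb_1 = [1.0, 0.0]
emb_2 = [1.0, 0.0]
emb_3 = [0.0, 1.0]

print(pair_contrastive_loss(emb_1, emb_2, same_epitope=True))   # identical vectors, positive pair
print(pair_contrastive_loss(emb_1, emb_3, same_epitope=False))  # orthogonal vectors, negative pair
print(pair_contrastive_loss(emb_1, emb_2, same_epitope=False))  # identical vectors, negative pair
```

In a real fine-tuning setup the embeddings would come from the antibody language model and the loss would be minimized by gradient descent over many labeled pairs; after training, cosine similarity between embeddings can serve as a score for ranking candidate antibodies against a reference.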

## Full-text entities

- **Genes:** ITIH4 (inter-alpha-trypsin inhibitor heavy chain 4) [NCBI Gene 3700] {aka GP120, H4P, IHRP, ITI-HC4, ITIHL1, PK-120}, HisTrap [NCBI Gene 100187907], Env [NCBI Gene 155971], CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, BCR (BCR activator of RhoGEF and GTPase) [NCBI Gene 613] {aka ALL, BCR1, CML, D22S11, D22S662, PHL}
- **Diseases:** cancers (MESH:D009369), infectious diseases (MESH:D003141), autoimmune diseases (MESH:D001327)
- **Chemicals:** nickel (MESH:D009532), TMB (MESH:C021758), amino acid (MESH:D000596), carbon (MESH:D002244), 3,3',5,5' tetramethylbenzidine dihydrochloride (-), NaCl (MESH:D012965), poloxamer 188 (MESH:D020442), sodium phosphate (MESH:C018279), Tween 20 (MESH:D011136), SDS (MESH:D012967), HCl (MESH:D006851), PES (MESH:C022840), glycine HCL (MESH:D005998), sulfuric acid (MESH:C033158), water (MESH:D014867), imidazole (MESH:C029899), CO2 (MESH:D002245), L-glutamine (MESH:D005973), methyl-alpha-D-mannopyranoside (MESH:C008466), agarose (MESH:D012685)
- **Species:** Human respirovirus 3 (no rank) [taxon 11216], Homo sapiens (human, species) [taxon 9606], Respiratory syncytial virus (no rank) [taxon 12814], hepatitis C virus [taxon 11103], Human immunodeficiency virus 1 (no rank) [taxon 11676], Gammacoronavirus (genus) [taxon 694013], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]
- **Mutations:** A501C, I201C, I559P, A433C, T605C, N332T, serine-glycine
- **Cell lines:** Expi293F — Homo sapiens (Human), Transformed cell line (CVCL_D615)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12921510/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12921510/full.md

## References

78 references — full list in the complete paper: https://tomesphere.com/paper/PMC12921510/full.md

---
Source: https://tomesphere.com/paper/PMC12921510