# CoV-UniBind: a unified antibody binding database for SARS-CoV-2

**Authors:** Aryan Bhasin, Francesco Saccon, Callum Canavan, Andrew Robson, Joao Euko, Alexandra C Walls, Yunguan Fu

PMC · DOI: 10.1093/bioadv/vbaf328 · Bioinformatics Advances · 2026-01-08

## TL;DR

CoV-UniBind is a unified database of over 75,000 SARS-CoV-2 antibody–antigen entries covering sequence, binding, and structural data, built to support machine learning for antibody design and vaccine development.

## Contribution

The contribution is a single standardized database that integrates over 75,000 SARS-CoV-2 antibody–antigen entries (sequence, binding, and structural data) from previously fragmented sources.

## Key findings

- CoV-UniBind integrates data from three public sources and peer-reviewed publications into a single standardized database.
- The database was used to benchmark multiple protein folding, inverse folding, and language models on tasks relevant to antibody design and vaccine development.
- Curated datasets, model scores, and antibody synonyms are free to download; folded structures are available upon request.
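
The benchmark in the second finding compares model scores with experimental measurements. This summary does not state which metric the authors used, so the sketch below assumes Spearman rank correlation, a common choice for this kind of comparison. The function names and the toy numbers are illustrative only and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy values for illustration only (not from CoV-UniBind).
    model_scores = [0.1, 0.4, 0.35, 0.8]
    measured_binding = [1.0, 2.0, 3.0, 4.0]
    print(spearman(model_scores, measured_binding))
```

A rank-based metric only asks whether a model orders antibody–antigen pairs the same way the assay does, which makes it usable when model scores and measurements are on different scales.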

## Abstract

Since the emergence of SARS-CoV-2, numerous studies have investigated antibody interactions with viral variants in vitro, and several datasets have been curated to compile available protein structures and experimental measurements. However, existing data remain fragmented, limiting their utility for the development and validation of machine learning models for antibody–antigen interaction prediction. Here, we present CoV-UniBind, a unified database comprising over 75 000 entries of SARS-CoV-2 antibody–antigen sequence, binding, and structural data, integrated and standardized from three public sources and multiple peer-reviewed publications. To demonstrate its utility, we benchmarked multiple protein folding, inverse folding, and language models across tasks relevant to antibody design and vaccine development. We expect CoV-UniBind to facilitate future computational efforts in antibody and vaccine development against SARS-CoV-2.

The curated datasets, model scores and antibody synonyms are free to download at https://huggingface.co/datasets/InstaDeepAI/cov-unibind. Folded structures are available upon request.
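
As a minimal sketch of fetching the public files, the snippet below uses `snapshot_download` from the `huggingface_hub` package with the repository ID taken from the URL above. The file layout inside the repository is not described in this summary, so the code only lists tabular files instead of assuming any file names. The helper names and the `cov-unibind` target directory are hypothetical.

```python
from pathlib import Path

# Repository ID taken from the availability statement.
REPO_ID = "InstaDeepAI/cov-unibind"


def download_cov_unibind(local_dir="cov-unibind"):
    """Download the dataset repository and return its local path.

    Requires `pip install huggingface_hub` and network access.
    """
    from huggingface_hub import snapshot_download  # imported lazily

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)


def list_tabular_files(root):
    """Return CSV, TSV, and Parquet files under root, sorted by path.

    The set of extensions is an assumption about how the data is stored.
    """
    extensions = {".csv", ".tsv", ".parquet"}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


if __name__ == "__main__":
    root = download_cov_unibind()
    for path in list_tabular_files(root):
        print(path.relative_to(root))
```

Listing the files first makes it easy to see which tables hold the curated entries, model scores, and antibody synonyms before loading any of them.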

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** S (surface glycoprotein) [NCBI Gene 43740568] {aka spike glycoprotein}, ACE2 (angiotensin converting enzyme 2) [NCBI Gene 59272] {aka ACEH}, MLC1 (modulator of VRAC current 1) [NCBI Gene 23209] {aka LVM, MLC, VL}
- **Diseases:** COVID-19 (MESH:D000086382), DMS (MESH:D004401)
- **Chemicals:** FAMPNN (-), Water (MESH:D014867)
- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676], Coronaviridae (family) [taxon 11118], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Gammacoronavirus (genus) [taxon 694013]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12800777/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12800777/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12800777/full.md

---
Source: https://tomesphere.com/paper/PMC12800777