Improving antibody language models with native pairing
Sarah M. Burbach, Bryan Briney

TL;DR
This paper demonstrates that training antibody language models with natively paired sequences significantly enhances their ability to learn immunologically relevant features across light and heavy chains, outperforming models trained on unpaired data.
Contribution
It introduces two Baseline Antibody Language Models (BALM), one trained on natively paired antibody sequences (BALM-paired) and one on unpaired sequences (BALM-unpaired), and shows that paired training improves performance and enables cross-chain feature learning.
Findings
Training with natively paired data improves model performance.
Models learn immunologically relevant features across chains.
Fine-tuning ESM-2 with paired data captures similar cross-chain features.
Abstract
Current antibody language models are limited by their use of unpaired antibody sequence data and the biases in publicly available antibody sequence datasets, which are skewed toward antibodies against a relatively small number of pathogens. A recently published dataset (Jaffe et al.) of approximately 1.6 × 10^6 natively paired human antibody sequences from healthy donors represents by far the largest dataset of its kind and offers a unique opportunity to evaluate how antibody language models can be improved by training with natively paired antibody sequence data. We trained two Baseline Antibody Language Models (BALM), using natively paired (BALM-paired) or unpaired (BALM-unpaired) sequences from the Jaffe dataset. We provide evidence that training with natively paired sequences substantially improves model performance and that this improvement results from the model learning…
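To make the paired versus unpaired distinction concrete, here is a minimal sketch of how the two kinds of training examples might be assembled from the same antibody. Everything in it is an assumption for illustration: the `<sep>` separator token, the `PairedAntibody` type, the function names, and the toy sequence fragments are hypothetical and are not taken from the paper or its code.

```python
from typing import List, NamedTuple

# Hypothetical separator token; the summary above does not say how
# the authors join heavy and light chains.
SEP = "<sep>"


class PairedAntibody(NamedTuple):
    """One natively paired antibody: heavy and light chain from the same cell."""
    heavy: str
    light: str


def to_paired_example(ab: PairedAntibody, sep: str = SEP) -> str:
    """Build one training example that keeps both chains together,
    so a model can attend across the heavy/light boundary."""
    return f"{ab.heavy}{sep}{ab.light}"


def to_unpaired_examples(ab: PairedAntibody) -> List[str]:
    """Build two independent training examples, one per chain,
    discarding the information about which chains were paired."""
    return [ab.heavy, ab.light]


# Toy sequence fragments for illustration only (not real full-length chains).
antibody = PairedAntibody(heavy="EVQLVESGG", light="DIQMTQSPS")

paired = to_paired_example(antibody)
unpaired = to_unpaired_examples(antibody)
```

In the paired form, a single input contains both chains, which is what would allow a model to learn cross-chain features; in the unpaired form, each chain is seen in isolation.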
Taxonomy
Topics: RNA and protein synthesis mechanisms · vaccines and immunoinformatics approaches · Monoclonal and Polyclonal Antibodies Research
