Large scale paired antibody language models
Henry Kenlay, Frédéric A. Dreyer, Aleksandr Kovaltsuk, Dom Miketa, Douglas Pires, Charlotte M. Deane

TL;DR
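This paper introduces IgBert and IgT5, antibody-specific language models trained on more than two billion unpaired and two million paired variable-region sequences from the Observed Antibody Space. The models outperform existing antibody and protein language models on antibody engineering tasks, supporting the design of antibody therapeutics.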
Contribution
Development of IgBert and IgT5, large-scale antibody language models that accept both paired and unpaired variable-region sequences, trained on extensive datasets and outperforming existing models on antibody engineering tasks.
Findings
Models outperform existing antibody and protein language models.
Trained on over two billion sequences from the Observed Antibody Space dataset.
Enable more effective antibody design for therapeutics.
Abstract
Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best-performing antibody-specific language models developed to date, which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using the more than two billion unpaired sequences and two million paired sequences of light and heavy chains present in the Observed Antibody Space dataset. We show that our models…
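The abstract notes that both models accept paired and unpaired variable-region sequences. As a rough illustration of what that can mean at the input level, the Python sketch below builds a single input string from a heavy chain, a light chain, or both. The input conventions are assumptions, not details taken from the paper: residues written as space-separated one-letter codes, and the two chains joined by a BERT-style "[SEP]" token. The helper names format_input and _spaced are hypothetical. Check the tokeniser distributed with the released models before relying on any of this.

```python
from typing import Optional

# Assumption (not from the paper): inputs use the 20 standard one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
# Hypothetical separator, borrowed from the BERT convention.
SEP_TOKEN = "[SEP]"


def _spaced(chain: str) -> str:
    """Validate one chain and return its residues separated by spaces."""
    chain = chain.strip().upper()
    unknown = set(chain) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return " ".join(chain)


def format_input(heavy: Optional[str] = None, light: Optional[str] = None) -> str:
    """Build one model input from a paired or unpaired variable region.

    Paired input: heavy chain, separator, light chain.
    Unpaired input: whichever single chain was supplied, on its own.
    """
    parts = [_spaced(chain) for chain in (heavy, light) if chain]
    if not parts:
        raise ValueError("provide at least one chain")
    return f" {SEP_TOKEN} ".join(parts)


if __name__ == "__main__":
    # Toy fragments for illustration only, not full-length variable domains.
    heavy = "EVQLVESGGG"
    light = "DIQMTQSPSS"
    print(format_input(heavy, light))  # E V Q L V E S G G G [SEP] D I Q M T Q S P S S
    print(format_input(heavy=heavy))   # E V Q L V E S G G G
```

Treating the unpaired case as the bare chain lets one helper cover both input types; whether the released models instead expect an explicit placeholder for a missing chain is not stated in this summary.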
Taxonomy
Topics: Monoclonal and Polyclonal Antibodies Research · Glycosylation and Glycoproteins Research · Chemical Synthesis and Analysis
