AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions
Hirofumi Tsuruta, Hiroyuki Yamazaki, Ryota Maeda, Ryotaro Tamura, Jennifer N. Wei, Zelda Mariet, Poomarin Phloyphisut, Hidetoshi Shimokawa, Joseph R. Ledsam, Lucy Colwell, Akihiro Imura

TL;DR
This paper introduces AVIDa-hIL6, a large-scale, high-quality dataset of antigen-VHH pairs from an immunized alpaca, designed to improve machine learning models for predicting antigen-antibody interactions, particularly when mutations are involved.
Contribution
The paper contributes AVIDa-hIL6, a dataset of more than half a million labeled antigen-VHH pairs, including mutants, intended to advance computational prediction of antigen-antibody interactions.
Findings
Existing models show potential but need improvement for mutant prediction.
The dataset enables development of more accurate machine learning models.
Benchmark results highlight current model limitations.
Abstract
Antibodies have become an important class of therapeutic agents to treat human diseases. To accelerate therapeutic antibody discovery, computational methods, especially machine learning, have attracted considerable interest for predicting specific interactions between antibody candidates and target antigens such as viruses and bacteria. However, the publicly available datasets in existing works have notable limitations, such as small sizes and the lack of non-binding samples and exact amino acid sequences. To overcome these limitations, we have developed AVIDa-hIL6, a large-scale dataset for predicting antigen-antibody interactions in the variable domain of heavy chain of heavy chain antibodies (VHHs), produced from an alpaca immunized with the human interleukin-6 (IL-6) protein, as antigens. By leveraging the simple structure of VHHs, which facilitates identification of full-length…
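The labeled-pair setup described in the abstract can be illustrated with a minimal sketch. It assumes each record holds a VHH sequence, an antigen identifier, and a binary binding label. The `Pair` class, the field names, the antigen identifiers, and the short sequences are all hypothetical placeholders chosen for illustration; they are not the dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One labeled antigen-VHH pair (hypothetical layout, not the real schema)."""
    vhh_seq: str     # VHH amino acid sequence (toy placeholder here)
    antigen_id: str  # hypothetical identifier for the antigen variant
    label: int       # 1 = binder, 0 = non-binder


def label_counts(pairs):
    """Count binders and non-binders to check class balance."""
    return Counter(p.label for p in pairs)


def split_by_antigen(pairs, held_out):
    """Hold out every pair whose antigen is in `held_out`.

    This mimics evaluating a model on antigen variants it never saw in training.
    """
    train = [p for p in pairs if p.antigen_id not in held_out]
    test = [p for p in pairs if p.antigen_id in held_out]
    return train, test


# Toy records; sequences and identifiers are invented for illustration.
pairs = [
    Pair("QVQLVESGGG", "IL6_WT", 1),
    Pair("EVQLQESGGG", "IL6_WT", 0),
    Pair("QVQLVESGGG", "IL6_mutA", 0),
    Pair("EVQLQESGGG", "IL6_mutA", 1),
]

train, test = split_by_antigen(pairs, held_out={"IL6_mutA"})
print(label_counts(pairs), len(train), len(test))
```

Splitting by antigen identifier rather than at random is one way to probe the mutant-prediction weakness noted in the findings, since the same VHH can carry different labels against different antigen variants.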
Taxonomy
Topics: Monoclonal and Polyclonal Antibodies Research · Glycosylation and Glycoproteins Research · Vaccines and Immunoinformatics Approaches
