# Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data

**Authors:** Katharina Waury, Stefan Lelieveld, Sanne Abeln, Henk-Jan van den Ham, Claude Loverdo

PMC · DOI: 10.1371/journal.pcbi.1013057 · 2025-05-30

## TL;DR

This study compares sequence-based clonotyping with two structure-based methods, SAAB+ and SPACE2, for grouping antibodies in a simulated repertoire that contains antibody pairs known to bind the same epitope region.

## Contribution

The paper compares structure-based antibody clustering methods (SAAB+ and SPACE2) against clonotyping on simulated but diverse repertoire data; such methods had previously been evaluated only on single-antigen sets of antibodies.

## Key findings

- Structure-based methods group more antibodies together than sequence-based clonotyping.
- SPACE2 requires same-length CDR regions, limiting its applicability.
- Structure-based clustering shows promise but still faces challenges like missing structure templates.

## Abstract

Repertoire sequencing allows us to investigate the antibody-mediated immune response. The clustering of sequences is a crucial step in the data analysis pipeline, aiding in the identification of functionally related antibodies. The conventional clustering approach of clonotyping relies on sequence information, particularly CDRH3 sequence identity and V/J gene usage, to group sequences into clonotypes. It has been suggested that the limitations of sequence-based approaches to identify sequence-dissimilar but functionally converged antibodies can be overcome by using structure information to group antibodies. Recent advances have made structure-based methods feasible on a repertoire level. However, so far, their performance has only been evaluated on single-antigen sets of antibodies. A comprehensive comparison of the benefits and limitations of structure-based tools on realistic and diverse repertoire data is missing. Here, we aim to explore the promise of structure-based clustering algorithms to replace or augment the standard sequence-based approach, specifically by identifying low-sequence identity groups. Two methods, SAAB+ and SPACE2, are evaluated against clonotyping. We curated a dataset of well-annotated pairs of antibodies that show high overlap in epitope residues and thus bind the same region within their respective antigen. This set of antibodies was introduced into a simulated repertoire to compare the performance of clustering approaches on a diverse antibody set. Our analysis reveals that structure-based methods do group more antibodies together compared to clonotyping. However, it also highlights the limitations associated with the need for same-length CDR regions by SPACE2. This work thoroughly compares the utility of different clustering methods and provides insights into what further steps are required to effectively use antibody structural information to group immune repertoire data.
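The clonotyping step described in the abstract (grouping by V/J gene usage and CDRH3 sequence identity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record fields (`id`, `v_gene`, `j_gene`, `cdrh3`), the 0.8 identity threshold, the equal-CDRH3-length requirement, and the single-linkage grouping are all assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    if len(a) == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype(records, min_identity=0.8):
    """Group antibody records into clonotypes (illustrative sketch).

    records: list of dicts with hypothetical keys
             'id', 'v_gene', 'j_gene', 'cdrh3'.
    min_identity: assumed CDRH3 identity threshold, not taken from the paper.
    Returns a sorted list of clusters, each a sorted list of ids.
    """
    # Step 1: partition by V gene, J gene and CDRH3 length (assumed convention).
    bins = defaultdict(list)
    for rec in records:
        bins[(rec["v_gene"], rec["j_gene"], len(rec["cdrh3"]))].append(rec)

    clusters = []
    for members in bins.values():
        # Step 2: single-linkage grouping inside each bin via union-find.
        parent = list(range(len(members)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                identity = hamming_identity(members[i]["cdrh3"], members[j]["cdrh3"])
                if identity >= min_identity:
                    parent[find(i)] = find(j)

        groups = defaultdict(list)
        for i, rec in enumerate(members):
            groups[find(i)].append(rec["id"])
        clusters.extend(sorted(ids) for ids in groups.values())

    return sorted(clusters)
```

Under these assumptions, two antibodies that differ in V gene, J gene, or CDRH3 length can never share a clonotype, however similar their binding behaviour. That is the limitation the structure-based methods are meant to address.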

Understanding our adaptive immune system response is crucial for developing vaccines and therapies. Grouping antibodies based on their function helps us understand the diversity of immune cells and allows us to find valuable therapeutic antibody candidates faster. In this study, we compare different methods for grouping antibodies, i.e., clustering approaches. The traditional method, known as clonotyping, groups antibodies based on sequence similarity and identical gene usage. However, this approach may miss functionally related antibodies that do not have a similar sequence and derive from differing genes. We explore the potential of structure-based clustering methods, which have become feasible on a large scale recently, to identify such antibodies. We compare two structure-based methods, SAAB+ and SPACE2, against clonotyping by creating a dataset of antibodies known to bind the same region of their target and integrating them into a simulated but diverse set of antibodies, called a repertoire. Our findings show that structure-based methods can form larger groups of related antibodies, but also face challenges such as missing structure templates and the requirement for same-length CDR regions. We highlight the promise and the current hurdles in using structural information to enhance antibody repertoire analysis.
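To make the same-length CDR requirement concrete, the sketch below bins antibodies by the lengths of their CDR regions and checks whether two antibodies could be grouped at all under such a constraint. It is only an illustration of the constraint as stated above, not SPACE2's actual implementation; the field names and the choice to consider all six CDRs are assumptions.

```python
from collections import defaultdict

# Hypothetical field names for the six CDR sequences of a paired antibody.
CDR_NAMES = ("cdrh1", "cdrh2", "cdrh3", "cdrl1", "cdrl2", "cdrl3")


def length_signature(antibody):
    """Tuple of CDR lengths, used as the key for length binning."""
    return tuple(len(antibody[name]) for name in CDR_NAMES)


def bin_by_cdr_lengths(antibodies):
    """Partition antibody ids by CDR length signature.

    Under a same-length requirement, clustering can only happen
    within a bin, never across bins.
    """
    bins = defaultdict(list)
    for ab in antibodies:
        bins[length_signature(ab)].append(ab["id"])
    return dict(bins)


def can_co_cluster(ab1, ab2):
    """True only if every CDR has the same length in both antibodies."""
    return length_signature(ab1) == length_signature(ab2)
```

With this kind of binning, a single-residue insertion or deletion in any CDR is enough to place two antibodies in different bins, even if they bind the same epitope region.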

## Full-text entities

- **Genes:** BCR (BCR activator of RhoGEF and GTPase) [NCBI Gene 613] {aka ALL, BCR1, CML, D22S11, D22S662, PHL}
- **Diseases:** SHM (MESH:D013001), infection (MESH:D007239)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Human immunodeficiency virus 1 (no rank) [taxon 11676], Human immunodeficiency virus (species) [taxon 12721], Homo sapiens (human, species) [taxon 9606]

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12148228/full.md

---
Source: https://tomesphere.com/paper/PMC12148228