# ParaDeep: sequence-based deep learning for residue-level paratope prediction using chain-aware BiLSTM-CNN models

**Authors:** Piyachat Udomwong, Thanathat Pamonsupornwichit, Kanchanok Kodchakorn, Chatchai Tayapiwatana

PMC · DOI: 10.3389/fbinf.2025.1684042 · 2025-11-05

## TL;DR

ParaDeep is a deep learning tool that predicts antibody paratope residues directly from amino acid sequences. It needs no 3D structures, and on heavy chains it approaches the accuracy of structure-based methods.

## Contribution

Introduces ParaDeep, a chain-aware BiLSTM-CNN model for residue-level paratope prediction that outperforms the sequence-based baseline Parapred.

## Key findings

- Heavy-chain models outperformed light-chain models in five-fold cross-validation (F1 = 0.856 vs. 0.774; MCC = 0.842 vs. 0.772).
- On the independent blind test set, ParaDeep improved MCC by 27% over the sequence-based baseline Parapred on heavy chains.
- Heavy chains carry stronger sequence-based predictive signal than light chains, which appear to benefit more from structural context.
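F1 and MCC are scored over residue-level binary labels (1 = paratope residue, 0 = non-paratope). The standard-library sketch below uses the textbook definitions and assumes predictions are already thresholded to 0/1. It does not reproduce ParaDeep's evaluation code, and it assumes nothing about how scores are aggregated across antibodies or folds (pooled residues vs. per-sequence averages).

```python
import math

def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary residue labels (1 = paratope)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); true negatives are ignored."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; uses all four confusion cells."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy 8-residue example (hypothetical labels, not data from the paper).
y_true = [0, 0, 1, 1, 1, 0, 0, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 0]
print(f1_score(y_true, y_pred), mcc(y_true, y_pred))
```

Here TP = 2, FP = 1, FN = 1 and TN = 4, which gives F1 = 4/6 ≈ 0.667 and MCC = 7/15 ≈ 0.467. MCC also credits correctly rejected non-paratope residues, and F1 does not, so the two metrics can diverge when paratope residues are a small fraction of the sequence.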

## Abstract

Accurate prediction of antibody paratopes is a critical challenge in structure-limited, high-throughput discovery workflows. We present ParaDeep, a lightweight and interpretable deep learning framework for residue-level paratope prediction directly from amino acid sequences. ParaDeep integrates bidirectional long short-term memory networks with one-dimensional convolutional layers to capture both long-range sequence context and local binding motifs. We systematically evaluated 30 model configurations varying in encoding schemes, convolutional kernel sizes, and antibody chain types. In five-fold cross-validation, heavy (H) chain models achieved the highest performance (F1 = 0.856 ± 0.014, MCC = 0.842 ± 0.015), outperforming light (L) chain models (F1 = 0.774 ± 0.023, MCC = 0.772 ± 0.022). On an independent blind test set, ParaDeep attained F1 = 0.723 and MCC = 0.685 for H chains, and F1 = 0.607 and MCC = 0.587 for L chains, representing a 27% MCC improvement over the sequence-based baseline Parapred. Chain-specific modeling revealed that heavy chains provide stronger sequence-based predictive signals, while light chains benefit more from structural context. ParaDeep approaches the performance of state-of-the-art structure-based methods on heavy chains while requiring only sequence input, enabling faster and broader applicability without the computational cost of 3D modeling. Its efficiency and scalability make it well-suited for early-stage antibody discovery, repertoire profiling, and therapeutic design, particularly in the absence of structural data. The implementation is freely available at https://github.com/PiyachatU/ParaDeep, with Python (PyTorch) code and a Google Colab interface for ease of use.
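The abstract describes one-dimensional convolutions that capture local binding motifs while still producing a prediction for every residue. The standard-library sketch below shows how that can work. It one-hot encodes a sequence, then applies a single filter with zero "same" padding so the output length equals the sequence length. Several details are assumptions and are not taken from ParaDeep: the 20-letter alphabet ordering, the all-zero row for unknown residues, and the toy tyrosine filter. The paper compares several encoding schemes that are not specified here. The sketch covers only the convolutional step and omits the BiLSTM layers and the learned weights. As in deep-learning libraries, the "convolution" is computed as cross-correlation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; ordering is an assumption

def one_hot(seq):
    """Encode a sequence as one 20-dim one-hot vector per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1.0  # unknown residues (e.g. 'X') stay all-zero
        rows.append(row)
    return rows

def conv1d_same(features, kernel):
    """Apply one 1D filter (cross-correlation) with zero 'same' padding.

    features: L x C list (residues x channels); kernel: k x C list, k odd.
    Returns L scores, so each output stays aligned with one residue.
    """
    if not features:
        return []
    k = len(kernel)
    pad = k // 2
    n_ch = len(features[0])
    length = len(features)
    out = []
    for i in range(length):
        total = 0.0
        for j in range(k):
            pos = i + j - pad  # window centred on residue i
            if 0 <= pos < length:  # positions outside the sequence act as zeros
                total += sum(kernel[j][c] * features[pos][c] for c in range(n_ch))
        out.append(total)
    return out

# Hypothetical kernel-size-3 filter that responds only to tyrosine (Y).
y_filter = [[1.0 if aa == "Y" else 0.0 for aa in AMINO_ACIDS] for _ in range(3)]
scores = conv1d_same(one_hot("AYYA"), y_filter)  # one score per residue
print(scores)
```

Each score counts the tyrosines inside a 3-residue window centred on that position, and the output keeps one value per residue. This is the property a residue-level classifier needs. Changing the kernel size changes the window of local context, which is the dimension the paper varies across its 30 configurations alongside encoding scheme and chain type.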

## Figures

The complete paper includes 14 figures with captions: https://tomesphere.com/paper/PMC12626946/full.md

---
Source: https://tomesphere.com/paper/PMC12626946