# Building a neural network model to define DNA sequence specificity in V(D)J recombination

**Authors:** Justin C Harris, Jennifer N Byrum, Cooper B McKinney, Victoria Fairchild, Dee H Wu, Andrew H Fagg, Karla K Rodgers

PMC · DOI: 10.1093/nar/gkaf551 · 2025-06-23

## TL;DR

This paper trains a neural network on publicly available recombination efficiencies for randomized 12-RSSs, then interprets it with SHAP to show how RSS sequence influences V(D)J recombination efficiency in developing lymphocytes.

## Contribution

The study introduces a novel neural network model and SHAP-based interpretation to uncover sequence-specific effects on V(D)J recombination.

## Key findings

- Nucleotides at pairwise positions in the RSS heptamer contribute synergistically to recombination efficiency.
- Interdependent effects between heptamer and nonamer regions influence recombination outcomes.
- A nonamer-informed model reveals how different nonamer substrates impact recombination efficiency.
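
To make the idea of a synergistic pairwise contribution concrete, the sketch below computes exact Shapley values and a Shapley-style pairwise interaction term for a 7-nt heptamer. Everything in it is an illustrative assumption: `toy_score` is a hypothetical scoring rule standing in for a trained model, the all-T baseline is arbitrary, and the brute-force enumeration is not the SHAP procedure the authors used.

```python
from itertools import combinations
from math import factorial


def toy_score(heptamer):
    """Hypothetical scoring rule (NOT the paper's model) with one synergy term."""
    score = 0.0
    if heptamer[0] == "C":
        score += 1.0
    if heptamer[1] == "A":
        score += 1.0
    if heptamer[0] == "C" and heptamer[1] == "A":
        score += 2.0  # extra credit only when both positions match
    if heptamer[2] == "C":
        score += 0.5
    return score


def value(subset, seq, baseline, model):
    """Score a sequence that takes seq at positions in subset, baseline elsewhere."""
    mixed = "".join(seq[p] if p in subset else baseline[p] for p in range(len(seq)))
    return model(mixed)


def shapley_values(seq, baseline, model):
    """Exact Shapley value per position by enumerating all coalitions."""
    n = len(seq)
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (
                    value(s | {i}, seq, baseline, model)
                    - value(s, seq, baseline, model)
                )
    return phi


def pair_interaction(i, j, seq, baseline, model):
    """Shapley-style interaction for positions i != j (half of the joint effect)."""
    n = len(seq)
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for k in range(n - 1):
        weight = factorial(k) * factorial(n - k - 2) / (2 * factorial(n - 1))
        for subset in combinations(others, k):
            s = set(subset)
            total += weight * (
                value(s | {i, j}, seq, baseline, model)
                - value(s | {i}, seq, baseline, model)
                - value(s | {j}, seq, baseline, model)
                + value(s, seq, baseline, model)
            )
    return total


phi = shapley_values("CACAGTG", "TTTTTTT", toy_score)
synergy_01 = pair_interaction(0, 1, "CACAGTG", "TTTTTTT", toy_score)
```

In this toy setup the synergy bonus is shared equally between positions 0 and 1, so each receives a larger attribution than its additive term alone, and the interaction term is non-zero only for that pair. Exact enumeration is feasible here because the heptamer has only seven positions.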

## Abstract

In developing lymphocytes, V(D)J recombination assembles functional antigen receptor (AgR) genes through rearrangement of the AgR loci to adjoin component gene segments. Each candidate gene segment for recombination is flanked by a recombination signal sequence (RSS), composed of heptamer and nonamer motifs separated by 12 or 23 base pairs. To initiate V(D)J recombination, the recombination activating proteins RAG1 and RAG2 create DNA double-stranded breaks between a 12/23-RSS pair and their adjoining gene segments. The basis for selection of individual RSSs during each V(D)J recombination event is not well understood due, in part, to the widespread distribution of the semi-conserved RSSs across the AgR loci. Using publicly available data for V(D)J recombination efficiencies on randomized 12-RSSs, we first built a neural network model that delineates how changes in sequence at certain positions in the RSS affect recombination efficiency. Second, to interpret the model’s decision-making process, we repurposed the game-theoretic SHapley Additive exPlanations (SHAP) approach, with the results illustrating how nucleotides at pairwise positions in the heptamer provide synergistic contributions to recombination efficiency. Third, we trained a nonamer-informed neural network model with varied nonamer RSS substrates, and subsequently identified interdependent effects between the heptamer and nonamer regions on recombination efficiency.
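
As a minimal sketch of how a 12-RSS could be presented to a neural network, the snippet below splits a 28-nt sequence into heptamer, 12 bp spacer, and nonamer, and one-hot encodes each nucleotide. The encoding scheme, the helper names `split_rss` and `one_hot`, and the placeholder spacer are assumptions for illustration; this summary does not state the input representation the authors used.

```python
HEPTAMER_LEN, SPACER_LEN, NONAMER_LEN = 7, 12, 9
ALPHABET = "ACGT"

CONSENSUS_HEPTAMER = "CACAGTG"
CONSENSUS_NONAMER = "ACAAAAACC"
PLACEHOLDER_SPACER = "CTACAGACTGGA"  # arbitrary 12 bp, for illustration only


def split_rss(rss):
    """Split a 12-RSS into (heptamer, spacer, nonamer)."""
    expected = HEPTAMER_LEN + SPACER_LEN + NONAMER_LEN
    if len(rss) != expected:
        raise ValueError(f"expected {expected} nt, got {len(rss)}")
    heptamer = rss[:HEPTAMER_LEN]
    spacer = rss[HEPTAMER_LEN:HEPTAMER_LEN + SPACER_LEN]
    nonamer = rss[HEPTAMER_LEN + SPACER_LEN:]
    return heptamer, spacer, nonamer


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    rows = []
    for base in seq.upper():
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1.0 if base == letter else 0.0 for letter in ALPHABET])
    return rows


rss = CONSENSUS_HEPTAMER + PLACEHOLDER_SPACER + CONSENSUS_NONAMER
heptamer, spacer, nonamer = split_rss(rss)
encoded = one_hot(rss)  # 28 rows, one per nucleotide, 4 columns each
```

A position-by-nucleotide matrix like this keeps each RSS position as a separate feature, which is what allows per-position attributions to be read off afterwards.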

## Linked entities

- **Genes:** RAG1 (recombination activating 1) [NCBI Gene 5896], RAG2 (recombination activating 2) [NCBI Gene 5897]

## Full-text entities

- **Genes:** RAG1 (recombination activating 1) [NCBI Gene 5896] {aka RAG-1, RNF74}, RAG2 (recombination activating 2) [NCBI Gene 5897] {aka RAG-2}

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12205992/full.md

---
Source: https://tomesphere.com/paper/PMC12205992