# GAMMA: gap-aware motif mining under incomplete labeling with applications to MHC motifs

**Authors:** Xinyi Tang, Ran Liu

PMC · DOI: 10.1093/bioinformatics/btag014 · Bioinformatics · 2026-01-14

## TL;DR

This paper introduces GAMMA, a Bayesian method for identifying noncontiguous binding motifs in peptides presented by MHC Class I molecules. Its results suggest that binding may involve eight residues rather than the commonly assumed nine.

## Contribution

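GAMMA is a probabilistic framework for gap-aware motif mining under incomplete labeling. It uses Bayesian inference with Markov chain Monte Carlo sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions, and it outperforms existing tools such as GLAM2 in simulations and on MHC Class I peptide datasets.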

## Key findings

- GAMMA localizes binding residues and recovers the underlying motifs more accurately than existing tools such as GLAM2, both in simulations and on MHC Class I peptide datasets.
- The true number of binding residues may be eight, not nine as commonly assumed.
- Longer peptides show increased flexibility in the central region, consistent with structural observations.

## Abstract

Sequence motif identification is crucial for understanding molecular recognition, particularly in immune responses involving peptide binding to major histocompatibility complex (MHC) Class I molecules for antigen presentation to T cells. Traditionally, MHC Class I binding motifs are assumed to be contiguous and span nine amino acids. However, structural evidence suggests that binding may involve nonadjacent residues, challenging the assumptions of existing methods.

In this study, we propose Gap-Aware Motif Mining Algorithm (GAMMA), a probabilistic framework designed to identify noncontiguous motifs under conditions of incomplete labeling. GAMMA employs Bayesian inference with Markov chain Monte Carlo sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions. Through extensive simulations and real-world applications to MHC Class I peptide datasets, GAMMA outperforms existing motif discovery tools such as GLAM2 in accurately localizing binding residues and identifying the underlying motifs. Notably, our results suggest that the true number of binding residues may be eight, fewer than the commonly assumed nine. In addition, for longer peptides, the model captures increased flexibility in the central region, consistent with structural observations that peptides may bulge in the middle.

The raw data and the source code are available on GitHub (https://github.com/RanLIUaca/GAMMAmotif).
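
To make the idea of treating spacing between binding positions as something to be inferred more concrete, here is a minimal sketch in Python. It is not the GAMMA model or its code: it assumes a known position weight matrix, a single contiguous gap, and a uniform background, and the names `gap_posterior`, `pwm`, and `background` are hypothetical. Given a peptide longer than the motif, it computes the posterior probability of each place the gap could sit.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AMINO)}


def gap_posterior(peptide, pwm, background=None):
    """Posterior over the location of one contiguous gap inside a motif.

    Illustrative assumptions (not taken from the paper):
      * pwm is a known (K, 20) matrix of strictly positive probabilities,
      * the peptide has length L >= K,
      * the first g motif positions align to the peptide's first g residues,
        the remaining K - g align to its last K - g residues,
      * the L - K residues in between are emitted by the background,
      * the prior over g in 0..K is uniform.
    """
    K = pwm.shape[0]
    L = len(peptide)
    if L < K:
        raise ValueError("peptide is shorter than the motif")
    if background is None:
        background = np.full(len(AMINO), 1.0 / len(AMINO))

    x = np.array([IDX[a] for a in peptide])
    n_gap = L - K
    log_lik = np.empty(K + 1)
    for g in range(K + 1):
        left = x[:g]                 # residues matched to motif rows 0..g-1
        mid = x[g:g + n_gap]         # residues skipped by the motif
        right = x[g + n_gap:]        # residues matched to motif rows g..K-1
        log_lik[g] = (
            np.log(pwm[np.arange(g), left]).sum()
            + np.log(background[mid]).sum()
            + np.log(pwm[np.arange(g, K), right]).sum()
        )

    # Normalize in log space for numerical stability.
    log_lik -= log_lik.max()
    post = np.exp(log_lik)
    return post / post.sum()


def toy_pwm(consensus, p_match=0.81):
    """Build a toy PWM that favors one residue per motif position."""
    p_other = (1.0 - p_match) / (len(AMINO) - 1)
    pwm = np.full((len(consensus), len(AMINO)), p_other)
    for k, a in enumerate(consensus):
        pwm[k, IDX[a]] = p_match
    return pwm


if __name__ == "__main__":
    pwm = toy_pwm("ACDEFGHI")                 # hypothetical 8-residue motif
    post = gap_posterior("ACDEWWFGHI", pwm)   # 10-mer with a 2-residue bulge
    print(np.round(post, 3))
```

In a sampler, a draw such as `np.random.default_rng().choice(len(post), p=post)` could serve as one update of the gap location while the motif parameters are held fixed. A full method would also need to update the motif parameters and handle unlabeled sequences, which this sketch does not attempt.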

## Full-text entities

- **Genes:** HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}
- **Chemicals:** 1HHG (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12866627/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12866627/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12866627/full.md

---
Source: https://tomesphere.com/paper/PMC12866627