# Enhancing Named Entity Recognition for immunology and immune-mediated disorders

**Authors:** Songyue Chen, Jinshan Che, Mingming Sun, Yuhong Wang

PMC · DOI: 10.3389/fimmu.2025.1613479 · Frontiers in Immunology · 2026-02-04

## TL;DR

This paper introduces a new framework for named entity recognition in immunology texts, improving accuracy through structured encoding and knowledge-guided decoding.

## Contribution

A domain-specific NER framework combining structured span encoding and ontology-aware decoding for immunology and immune-mediated disorders.

## Key findings

- The proposed model outperforms biomedical baselines like BioGPT and BioLinkBERT in F1-score on immunology datasets.
- Structured span encoding and constraint-based decoding enhance entity recognition in low-resource settings.
- The framework improves extraction of immune-related entities such as cytokines and genetic markers.

## Abstract

Named Entity Recognition (NER) in the biomedical domain, particularly within immunology and immune-mediated disorders, presents unique challenges due to the presence of complex, nested, and overlapping entities. Existing NER systems often struggle with the specialized terminologies and contextual ambiguity of immunological texts, which limits their effectiveness in downstream biomedical applications.

To address these challenges, we propose a domain-specific NER framework that integrates structured span encoding and knowledge-guided decoding. The framework is designed to enhance recognition accuracy under low-resource and weak supervision conditions by combining a hierarchical span encoder (SpanStructEncoder) with a constraint-based decoding strategy (Contextual Constraint Decoding, CCD). We evaluate our model on three immunology-specific datasets: the NCBI Disease Corpus (immune-related diseases), SNPPhenA (genetic variants and phenotype associations), and HLA-SPREAD (HLA-disease and drug-response relations). These datasets were selected because they represent key immunological concepts such as cytokines, immune cell types, and genetic markers that underlie immune responses and disease mechanisms.
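To make the span-based idea concrete, the following is a minimal sketch of span enumeration followed by a simple constraint-filtered decoding step. It is not the paper's SpanStructEncoder or CCD, whose details are not given in this summary view. The function names, the fixed `max_len`, the score threshold, and the single "no crossing spans" constraint are all assumptions made for illustration; in a real system the scores would come from a trained encoder and the constraints from an ontology.

```python
from typing import List, Tuple

# (start, end_exclusive, label, score); scores are assumed to come from some
# upstream span classifier, which is outside the scope of this sketch.
Span = Tuple[int, int, str, float]


def enumerate_spans(tokens: List[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Return every contiguous token span of at most max_len tokens."""
    return [
        (i, j)
        for i in range(len(tokens))
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1)
    ]


def crosses(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two spans overlap partially (neither nested nor disjoint)."""
    (s1, e1), (s2, e2) = a, b
    return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)


def constrained_decode(candidates: List[Span], threshold: float = 0.5) -> List[Span]:
    """Greedy decoding under one hypothetical structural constraint.

    Spans are visited from highest to lowest score. A span is kept if it
    clears the threshold and does not cross any span already kept, so
    nested entities survive while partially overlapping ones are dropped.
    """
    kept: List[Span] = []
    for cand in sorted(candidates, key=lambda c: -c[3]):
        if cand[3] < threshold:
            break  # remaining candidates score even lower
        if all(not crosses(cand[:2], k[:2]) for k in kept):
            kept.append(cand)
    # Outer spans first, then the spans nested inside them.
    return sorted(kept, key=lambda c: (c[0], -c[1]))


if __name__ == "__main__":
    tokens = ["HLA", "class", "I", "deficiency"]
    print(enumerate_spans(tokens, max_len=2))
    scored = [
        (0, 4, "Disease", 0.9),  # whole phrase
        (0, 3, "Gene", 0.8),     # nested inside the disease mention
        (2, 4, "Disease", 0.7),  # crosses the (0, 3) span
        (1, 2, "Gene", 0.3),     # below threshold
    ]
    print(constrained_decode(scored))
```

Allowing nesting while rejecting crossing spans is only one possible constraint; it is chosen here because the abstract highlights nested and overlapping entities as the main difficulty.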

Experimental results demonstrate that our model achieves consistent improvements in F1-score over strong biomedical baselines including BioGPT, BioLinkBERT, and SciFive. Our results confirm that incorporating structured span representations and ontology-aware decoding significantly improves entity extraction for immunology-related texts. The proposed framework provides a robust and interpretable solution for immunology-focused biomedical text mining, facilitating applications in literature curation, biomarker discovery, and clinical decision support.
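For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it at the entity level under exact-match scoring, where a prediction counts only if start, end, and label all agree with a gold entity. Exact matching is an assumption here: the evaluation protocol used in the paper is not stated in this summary view, and the function name and example entities are hypothetical.

```python
from typing import Iterable, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, label)


def entity_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative annotations only; not taken from any of the datasets.
    gold = [(0, 2, "Gene"), (5, 7, "Disease"), (9, 10, "Gene")]
    pred = [(0, 2, "Gene"), (5, 7, "Gene"), (9, 10, "Gene"), (11, 12, "Disease")]
    print(entity_f1(gold, pred))
```

In this toy case the prediction at `(5, 7)` has the right boundaries but the wrong label, so under exact matching it counts as both a false positive and a missed gold entity.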

## Full-text entities

- **Genes:** IL10 (interleukin 10) [NCBI Gene 3586] {aka CSIF, GVHDS, IL-10, IL10A, TGIF}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, IL2 (interleukin 2) [NCBI Gene 3558] {aka IL-2, TCGF, lymphokine}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}
- **Diseases:** inflammatory diseases (MESH:D007249), diseases (MESH:D004194), cancer (MESH:D009369), immune-mediated diseases (MESH:C567355), autoimmune conditions (MESH:D001327), lupus (MESH:D008180), psoriasis (MESH:D011565), rheumatoid arthritis (MESH:D001172), immune- (MESH:D007154), Crohn's disease (MESH:D003424), allergies (MESH:D004342), HLA (MESH:C538465)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12913371/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12913371/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12913371/full.md

---
Source: https://tomesphere.com/paper/PMC12913371