# Patient address parsing via KG-aware contrastive learning and constrained on-prem LLM inference

**Authors:** Jinzhe Li, Xin Pan, Yanchao Jia

PMC · DOI: 10.1038/s41598-026-39348-z · Scientific Reports · 2026-02-09

## TL;DR

This paper introduces AddrKG-LLM, a framework that improves address parsing by combining knowledge graphs with controlled large language model decoding.

## Contribution

The novelty lies in integrating KG-aware contrastive learning with constrained on-prem LLM inference for accurate and compliant address parsing.
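
For orientation, a contrastive objective of this kind is often written in an InfoNCE-style form. The block below is a generic sketch, not the authors' stated loss: the symbols $t_i$ (text embedding of address $i$), $g_j$ (graph embedding of KG node $j$), $P(i)$ and $N(i)$ (positive and negative node sets derived from administrative relations), $\mathrm{sim}$ (e.g. cosine similarity), and the temperature $\tau$ are all assumptions introduced for illustration.

```latex
\mathcal{L}_i
  = -\log
    \frac{\sum_{p \in P(i)} \exp\!\big(\mathrm{sim}(t_i, g_p)/\tau\big)}
         {\sum_{p \in P(i)} \exp\!\big(\mathrm{sim}(t_i, g_p)/\tau\big)
          + \sum_{n \in N(i)} \exp\!\big(\mathrm{sim}(t_i, g_n)/\tau\big)}
```

Minimizing such a loss pulls a text embedding toward the graph embeddings of its positives and pushes it away from its negatives, which is one way text and graph embeddings could be aligned.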

## Key findings

- AddrKG-LLM achieves higher micro- and macro-level accuracy than string-matching, sequence-labeling, and generic LLM baselines.
- The framework maintains high Recall@K while reducing hallucination and ensuring policy compliance.
- Multi-view graph aggregation and hierarchy-aware contrastive learning enhance embedding alignment and parsing performance.
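
Recall@K, as used in the findings above, is commonly computed as the fraction of queries whose gold node appears among the first K retrieved candidates. The sketch below follows that standard definition; the function name `recall_at_k` and the toy node IDs are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Hashable, Sequence


def recall_at_k(
    ranked: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of queries whose gold item is within the top-k candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


# Toy data (made up): three queries, each with a ranked candidate list.
ranked = [["n3", "n1", "n7"], ["n2", "n9", "n4"], ["n5", "n6", "n8"]]
gold = ["n1", "n4", "n0"]

for k in (1, 2, 3):
    print(k, recall_at_k(ranked, gold, k))
```

A small K keeps the downstream decoder's search space compact, while a high Recall@K means the gold node is rarely excluded before decoding begins.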

## Abstract

Address parsing seeks to map noisy, abbreviated free-text addresses into standardized hierarchical tuples for large-scale information systems. Existing approaches struggle with semantic and structural ambiguity, hallucination from unconstrained generation, and deployment constraints under privacy and governance requirements. We present AddrKG-LLM, a two-stage framework that combines knowledge-graph (KG)-aware retrieval with schema-restricted large language model (LLM) decoding. First, contrastive learning over multi-view administrative graphs yields node embeddings that retrieve and re-rank a compact Top-K candidate set, thereby bounding the search space while preserving high gold coverage (Recall@K). Second, a candidate-restricted decoder running on-premises produces JSON-compliant outputs, enforcing single-candidate field consistency and alignment with KG priors to improve controllability and policy compliance. Using de-identified real-world records, we evaluate structural consistency via micro-level accuracy ($A_{micro}$) and macro-level accuracy ($A_{macro}$), and assess system properties with Recall@K and latency. Across strong string-matching, sequence-labeling, and generic LLM baselines, AddrKG-LLM delivers consistent gains in $A_{micro}$ and $A_{macro}$ with a favorable Recall@K. The proposed method consists of three components: (i) multi-view graph aggregation, (ii) a hierarchy-aware self-supervised contrastive objective that derives positives/negatives from administrative relations to align textual and graph embeddings, and (iii) candidate-restricted decoding within the KG-derived Top-K set. Overall, coupling KG-aware retrieval with constrained on-prem LLM decoding yields an accurate, controllable, and deployable solution for noisy-address structuring across domains.
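
The two-stage flow described in the abstract (embedding-based Top-K retrieval, then decoding restricted to that candidate set) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy `KG_NODES` table, its made-up embeddings and place names, the field names, and the helpers `retrieve_top_k` and `constrained_decode`. The real system uses learned embeddings and an on-prem LLM; here the generator's output is stood in for by a plain node ID.

```python
import json
import math

# Hypothetical toy KG: each node is a full hierarchical tuple plus a made-up embedding.
KG_NODES = {
    "n1": {"province": "Zhejiang", "city": "Hangzhou", "district": "Xihu",
           "vec": [0.9, 0.1, 0.0]},
    "n2": {"province": "Zhejiang", "city": "Hangzhou", "district": "Binjiang",
           "vec": [0.7, 0.6, 0.1]},
    "n3": {"province": "Jiangsu", "city": "Nanjing", "district": "Xuanwu",
           "vec": [0.0, 0.2, 0.9]},
}
FIELDS = ("province", "city", "district")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, k):
    """Stage 1: rank KG nodes by similarity to the query and keep the top k IDs."""
    ranked = sorted(
        KG_NODES,
        key=lambda node_id: cosine(query_vec, KG_NODES[node_id]["vec"]),
        reverse=True,
    )
    return ranked[:k]


def constrained_decode(raw_choice, candidates):
    """Stage 2: accept the generator's choice only if it is in the candidate set.

    All output fields are copied from one candidate node, so fields from
    different candidates are never mixed. An out-of-set choice falls back
    to the top-ranked candidate (a design choice of this sketch).
    """
    node_id = raw_choice if raw_choice in candidates else candidates[0]
    node = KG_NODES[node_id]
    return json.dumps({field: node[field] for field in FIELDS}, ensure_ascii=False)


candidates = retrieve_top_k([0.8, 0.3, 0.0], k=2)
print(candidates)
print(constrained_decode("n3", candidates))  # "n3" is outside the Top-K set
```

Because the output is always built from a single in-set node, the result is valid JSON with mutually consistent fields, which is the property the abstract attributes to candidate-restricted decoding.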

## Full-text entities

- **Diseases:** hallucination (MESH:D006212)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12957398/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12957398/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/PMC12957398/full.md

---
Source: https://tomesphere.com/paper/PMC12957398