A System for Name and Address Parsing with Large Language Models
Adeeba Tarannum, Muzakkiruddin Ahmed Mohammed, Mert Can Cakmak, Shames Al Mandalawi, John Talburt

TL;DR
This paper presents a prompt-driven, validation-centered framework that uses large language models to reliably convert unstructured person and address text into a consistent 17-field schema, outperforming traditional methods, especially in noisy or multilingual scenarios.
Contribution
It introduces a novel, reproducible approach that combines structured prompting, normalization, constrained decoding, and strict validation without fine-tuning, enhancing robustness and interpretability.
Findings
High field-level accuracy on real-world data
Strong schema adherence and calibration
Robust performance under noisy and multilingual conditions
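To make the first two findings concrete, here is a minimal sketch of how field-level accuracy and schema adherence could be computed. This summary does not give the paper's metric definitions, so exact string match per field and exact key-set match per record are assumptions, and the function names are hypothetical.

```python
def field_accuracy(predictions, gold, fields):
    """Fraction of (record, field) pairs where the prediction equals gold.

    Assumption: exact string match, with a missing field treated as "".
    """
    total = 0
    correct = 0
    for pred, ref in zip(predictions, gold):
        for name in fields:
            total += 1
            if pred.get(name, "") == ref.get(name, ""):
                correct += 1
    return correct / total if total else 0.0


def schema_adherence(predictions, fields):
    """Fraction of records whose keys are exactly the schema's fields."""
    expected = set(fields)
    if not predictions:
        return 0.0
    return sum(1 for pred in predictions if set(pred) == expected) / len(predictions)
```

A record that omits a field lowers both scores: the missing field counts as a mismatch, and the record no longer matches the schema's key set.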
Abstract
Reliable transformation of unstructured person and address text into structured data remains a key challenge in large-scale information systems. Traditional rule-based and probabilistic approaches perform well on clean inputs but fail under noisy or multilingual conditions, while neural and large language models (LLMs) often lack deterministic control and reproducibility. This paper introduces a prompt-driven, validation-centered framework that converts free-text records into a consistent 17-field schema without fine-tuning. The method integrates input normalisation, structured prompting, constrained decoding, and strict rule-based validation under fixed experimental settings to ensure reproducibility. Evaluations on heterogeneous real-world address data show high field-level accuracy, strong schema adherence, and stable confidence calibration. The results demonstrate that combining…
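The normalisation and strict-validation stages described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the six field names stand in for the paper's 17-field schema (which this summary does not list), the US-style state and postal-code rules are hypothetical, and the LLM call itself is omitted, so `validate` simply receives the model's raw text output.

```python
import json
import re
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical schema: placeholder names, NOT the paper's 17 fields.
SCHEMA_FIELDS = ("first_name", "last_name", "street", "city", "state", "postal_code")

# Hypothetical per-field rules; fields without a rule only need to be strings.
FIELD_RULES = {
    "state": re.compile(r"[A-Z]{2}"),
    "postal_code": re.compile(r"\d{5}(-\d{4})?"),
}


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, then collapse whitespace runs."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def validate(raw_output: str) -> Tuple[Optional[dict], List[str]]:
    """Parse raw model output as JSON and check it against the fixed schema.

    Returns (record, []) on success, or (None, errors) on any violation.
    """
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not an object"]

    errors = []
    for name in SCHEMA_FIELDS:
        if name not in record:
            errors.append(f"{name}: missing")
    for name in record:
        if name not in SCHEMA_FIELDS:
            errors.append(f"{name}: unexpected field")

    for name in SCHEMA_FIELDS:
        if name not in record:
            continue
        value = record[name]
        if not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        elif value and name in FIELD_RULES and not FIELD_RULES[name].fullmatch(value):
            errors.append(f"{name}: value {value!r} fails its rule")

    return (record if not errors else None), errors
```

Rejecting a record outright on any violation, rather than repairing it silently, keeps the output deterministic for a given model response; a caller could re-prompt or route rejected records to manual review.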
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques
