LingVarBench: Benchmarking LLMs on Entity Recognitions and Linguistic Verbalization Patterns in Phone-Call Transcripts

Seyedali Mohammadi; Manas Paldhe; Amit Chhabra; Youngseo Son; Vishal Seshagiri

arXiv:2508.15801 · cs.CL · January 15, 2026

TL;DR

LingVarBench is a benchmark and synthetic data pipeline for entity extraction from noisy, real-world phone-call transcripts. It aims to make LLM-based extraction more robust to spoken variation without relying on costly annotated call data.

Contribution

The paper introduces LingVarBench, a benchmark and data generation pipeline intended to make LLM-based entity extraction more robust on noisy, real-world transcripts.

Findings

01

Prompts optimized on LingVarBench outperform zero-shot baselines.

02

The optimized prompts achieve F1 scores of approximately 94-95% on real transcripts.

03

The optimized prompts also substantially improve performance on subjective questionnaire items.
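For readers unfamiliar with the metric, the F1 figures in the findings can be read as the harmonic mean of precision and recall over extracted fields. This page does not say how matches are scored, so the sketch below assumes exact string match per field with micro-averaging; the function name `entity_f1` and the field names are hypothetical.

```python
def entity_f1(predictions, gold):
    """Micro-averaged F1 over extracted fields.

    Assumption: a prediction counts as correct only on exact string match.
    The paper's actual matching rule is not stated on this page.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predictions, gold):
        for field in set(pred) | set(ref):
            p, r = pred.get(field), ref.get(field)
            if p is not None and p == r:
                tp += 1          # extracted and correct
            else:
                if p is not None:
                    fp += 1      # extracted but wrong or spurious
                if r is not None:
                    fn += 1      # gold value missed or mismatched
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two fields is extracted correctly.
gold = [{"zip": "90210", "name": "Ada"}]
pred = [{"zip": "90210", "name": "Eda"}]
score = entity_f1(pred, gold)
```

Under this scoring, a wrong value costs both a false positive and a false negative, so a single misheard field lowers precision and recall together.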

Abstract

We study structured entity extraction from phone-call transcripts in customer-support and healthcare settings, where annotation is costly, and data access is limited by privacy and consent. Existing methods degrade under disfluencies, interruptions, and speaker overlap, yet large real-call corpora are rarely shareable. We introduce LingVarBench, a benchmark and semantic synthetic data generation pipeline that generates linguistically varied training data via (1) LLM-sampled entity values, (2) curated linguistic verbalization patterns covering diverse disfluencies and entity-specific readout styles, and (3) a value-transcript consistency filter. Using this dataset, DSPy's SIMBA automatically synthesizes and optimizes extraction prompts, reducing manual prompt engineering and targeting robustness to verbal variation. On real customer transcripts, prompts optimized solely on LingVarBench…
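The three generation stages named in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: a seeded random generator stands in for LLM-sampled entity values, the verbalization patterns are made-up examples for a ZIP code, and the consistency filter is a simple rule-based check rather than whatever the paper uses. All function and variable names are hypothetical.

```python
import random
import re

DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
# Reverse map used by the filter; "oh" is a common spoken form of zero.
WORD_TO_DIGIT = {word: digit for digit, word in DIGIT_WORDS.items()}
WORD_TO_DIGIT["oh"] = "0"


def sample_zip(rng):
    """Stage 1 stand-in: the paper samples values with an LLM; here we use random digits."""
    return "".join(rng.choice("0123456789") for _ in range(5))


# Stage 2: hypothetical verbalization patterns (not taken from the paper).
def plain(value):
    return "my zip is " + " ".join(DIGIT_WORDS[d] for d in value)


def with_filler(value):
    words = [DIGIT_WORDS[d] for d in value]
    return "it's " + " ".join(words[:2]) + " uh " + " ".join(words[2:])


def oh_for_zero(value):
    return " ".join("oh" if d == "0" else DIGIT_WORDS[d] for d in value)


def truncated(value):
    """A deliberately lossy pattern that drops the last digit, to exercise the filter."""
    return "zip code " + " ".join(DIGIT_WORDS[d] for d in value[:-1])


PATTERNS = [plain, with_filler, oh_for_zero, truncated]


def recover_digits(transcript):
    """Read the spoken digits back out of a transcript, ignoring other words."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    return "".join(WORD_TO_DIGIT[t] for t in tokens if t in WORD_TO_DIGIT)


def is_consistent(value, transcript):
    """Stage 3 stand-in: keep a pair only if the transcript still carries the value."""
    return recover_digits(transcript) == value


def generate(n, seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        value = sample_zip(rng)
        transcript = rng.choice(PATTERNS)(value)
        if is_consistent(value, transcript):
            examples.append({"value": value, "transcript": transcript})
    return examples


dataset = generate(50)
```

Every pair that survives the filter has a transcript from which the sampled value can be recovered, so outputs of the lossy `truncated` pattern never reach `dataset`. In the setup the abstract describes, a dataset of this kind is what the prompt optimizer would train against.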
