TL;DR
LingVarBench is a benchmark and synthetic data pipeline for improving entity extraction from noisy, real-world phone-call transcripts, enabling more robust and lower-cost LLM-based extraction.
Contribution
It introduces LingVarBench, a benchmark and synthetic data generation pipeline that improves LLM robustness on noisy, real-world transcripts for entity extraction tasks.
Findings
Prompts optimized on LingVarBench outperform zero-shot baselines.
Achieves F1 scores of approximately 94-95% on real transcripts.
Substantially improves performance on subjective questionnaire items.
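The reported F1 scores depend on how predicted values are matched to gold values, and this summary does not say how. The sketch below is an assumed illustration only: a corpus-level micro F1 over (field, value) pairs, where the field names and the strip/lowercase normalization are hypothetical choices rather than the paper's evaluation protocol.

```python
def normalize(fields: dict[str, str]) -> set[tuple[str, str]]:
    """Turn a field->value dict into a set of (field, value) pairs.
    Assumed normalization: strip whitespace, lowercase, drop empty values."""
    return {(k, v.strip().lower()) for k, v in fields.items() if v.strip()}


def micro_f1(examples: list[tuple[dict[str, str], dict[str, str]]]) -> float:
    """Micro-averaged F1: pool counts over all (gold, predicted) transcript pairs."""
    tp = n_pred = n_gold = 0
    for gold, pred in examples:
        g, p = normalize(gold), normalize(pred)
        tp += len(g & p)  # exact field-and-value matches
        n_pred += len(p)
        n_gold += len(g)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical fields: one transposed date digit across two transcripts.
examples = [
    ({"zip": "94107", "dob": "1990-03-14", "phone": "4155550123"},
     {"zip": "94107", "dob": "1990-03-41", "phone": "4155550123"}),
    ({"zip": "10001"}, {"zip": "10001"}),
]
print(round(micro_f1(examples), 2))  # 0.75
```

Pooling counts across transcripts (micro averaging) weights every field equally; averaging per-transcript scores instead (macro averaging) would give short calls more influence.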
Abstract
We study structured entity extraction from phone-call transcripts in customer-support and healthcare settings, where annotation is costly and data access is limited by privacy and consent. Existing methods degrade under disfluencies, interruptions, and speaker overlap, yet large real-call corpora are rarely shareable. We introduce LingVarBench, a benchmark and semantic synthetic data generation pipeline that produces linguistically varied training data via (1) LLM-sampled entity values, (2) curated linguistic verbalization patterns covering diverse disfluencies and entity-specific readout styles, and (3) a value-transcript consistency filter. Using this dataset, DSPy's SIMBA optimizer automatically synthesizes and optimizes extraction prompts, reducing manual prompt engineering and targeting robustness to verbal variation. On real customer transcripts, prompts optimized solely on LingVarBench…
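The three pipeline stages named in the abstract can be illustrated for a single entity type, a 10-digit phone number. This is a minimal, hypothetical sketch: the paper samples values with an LLM and uses curated pattern sets not described here, so `sample_value` is a random stand-in, the readout template and filler list are invented, and the consistency check (recover the spoken digits and compare them to the sampled value) is only one assumed way a value-transcript filter could work.

```python
import random
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {w: str(i) for i, w in enumerate(DIGIT_WORDS)}
WORD_TO_DIGIT["oh"] = "0"  # common spoken variant of zero

FILLERS = ["uh", "um", "let me see"]  # hypothetical disfluency inventory


def sample_value(rng: random.Random) -> str:
    """Stage (1) stand-in: the paper uses an LLM; here, a random 10-digit string."""
    return "".join(rng.choice("0123456789") for _ in range(10))


def verbalize(value: str, rng: random.Random) -> str:
    """Stage (2), invented pattern: digit-by-digit readout in 3-3-4 groups,
    'oh' or 'zero' for 0, and one filler inserted after the first group."""
    groups = [value[:3], value[3:6], value[6:]]
    spoken = []
    for group in groups:
        words = [rng.choice(["zero", "oh"]) if d == "0" else DIGIT_WORDS[int(d)]
                 for d in group]
        spoken.append(" ".join(words))
    filler = rng.choice(FILLERS)
    return f"yeah it's {spoken[0]}, {filler}, {spoken[1]} {spoken[2]}"


def recover_digits(transcript: str) -> str:
    """Map spoken digit words back to digits, ignoring every other token."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return "".join(WORD_TO_DIGIT[t] for t in tokens if t in WORD_TO_DIGIT)


def is_consistent(value: str, transcript: str) -> bool:
    """Stage (3), assumed check: keep the pair only if the transcript
    reproduces the sampled value exactly."""
    return recover_digits(transcript) == value


def generate(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Produce up to n (value, transcript) pairs that pass the filter."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        value = sample_value(rng)
        transcript = verbalize(value, rng)
        if is_consistent(value, transcript):
            pairs.append((value, transcript))
    return pairs


if __name__ == "__main__":
    for value, transcript in generate(3):
        print(value, "|", transcript)
    # A transcript that drops one "five" no longer matches its value.
    print(is_consistent("4155550123", "four one five, uh, five five oh one two three"))  # False
```

With this single template every generated pair passes. The filter matters once richer patterns (self-corrections, repeated or skipped digits) can produce transcripts that no longer encode the sampled value.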