Improving Speech Recognition of Named Entities in Classroom Speech with LLM Revision and Phonetic-Semantic Context

Viet Anh Trinh, Xinlu He, and Jacob Whitehill

arXiv:2506.10779 · cs.CL · April 21, 2026

TL;DR

This paper presents an LLM-based revision pipeline that corrects misrecognized named entities in classroom speech by using phonetic and semantic context, reducing named-entity WER by up to 30% relative.

Contribution

The authors introduce an LLM revision method for correcting named entities (NEs) in ASR output, together with a new dataset, NER-MIT-OpenCourseWare, for development and testing.

Findings

01  Up to 30% relative WER reduction for named entities.

02  Introduction of the NER-MIT-OpenCourseWare dataset with 45 hours of classroom data.

03  Effective use of phonetic and semantic context in the LLM revision pipeline.
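
To make the "relative WER reduction" in the first finding concrete, here is a minimal sketch of how WER and a relative reduction are typically computed. It is not the authors' evaluation code: it scores whole utterances rather than named entities only, and the function names, the example sentences, and the 0.20 and 0.14 values are hypothetical illustrations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, revised_wer: float) -> float:
    """Fraction of the baseline error removed by the revised system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - revised_wer) / baseline_wer


# Hypothetical example: one misrecognized name in a five-word reference.
reference = "the lecture covers dijkstra algorithm"
asr_output = "the lecture covers dextra algorithm"
baseline = word_error_rate(reference, asr_output)

# Hypothetical WER values chosen so the reduction works out to 30% relative.
reduction = relative_reduction(0.20, 0.14)
```

A relative reduction is measured against the baseline error, so a drop from a WER of 0.20 to 0.14 is a 30% relative reduction but only a 6-point absolute one.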

Abstract

Classroom speech and lectures often contain named entities (NEs) such as names of people and special terminology. While automatic speech recognition (ASR) systems have achieved remarkable performance on general speech, the word error rate (WER) of state-of-the-art ASR remains high for named entities. Since NEs are often the most critical keywords, misrecognizing them can affect all downstream applications, especially when the ASR functions as the front end of a complex system. In this paper, we introduce a large language model (LLM) revision pipeline to revise incorrect NEs in ASR predictions by leveraging not only the LLM's world knowledge and reasoning ability but also the available phonetic and semantic context. We also introduce the NER-MIT-OpenCourseWare dataset, containing 45 hours of data from MIT courses for development and testing. On this dataset, our proposed technique…

Code & Models

Datasets

lucille0he/ocw
dataset · 15 dl
