Improving Speech Recognition of Named Entities in Classroom Speech with LLM Revision and Phonetic-Semantic Context
Viet Anh Trinh, Xinlu He, and Jacob Whitehill

TL;DR
This paper presents an LLM-based revision pipeline that corrects misrecognized named entities in classroom speech transcripts by leveraging phonetic and semantic context, reducing named-entity WER by up to 30% relative.
Contribution
The authors introduce an LLM revision method for correcting named entities (NEs) in ASR output, together with a new dataset, NER-MIT-OpenCourseWare, and report up to 30% relative WER reduction on NEs.
Findings
Up to 30% relative WER reduction for named entities.
Introduction of the NER-MIT-OpenCourseWare dataset with 45 hours of classroom data.
Effective use of phonetic and semantic context in the LLM revision pipeline.
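
To make the "relative WER reduction" figure above concrete, here is a minimal sketch of how word error rate and its relative reduction are conventionally computed. The function names and example sentences are hypothetical illustrations, not taken from the paper, and the paper's named-entity WER may be scored over NE words only rather than over whole utterances as done here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, revised_wer: float) -> float:
    """Fraction of the baseline WER removed by the revision step."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - revised_wer) / baseline_wer


# Hypothetical example: the ASR output splits a technical term into two words.
reference = "the fourier transform is linear"
asr_output = "the four year transform is linear"
baseline = word_error_rate(reference, asr_output)  # one substitution + one insertion
print(baseline)

# Hypothetical numbers: a WER of 0.20 dropping to 0.14 is a 30% relative reduction.
print(relative_wer_reduction(0.20, 0.14))
```

Note that a relative reduction is measured against the baseline error rate, so a 30% relative reduction from a 20% WER corresponds to an absolute drop of 6 percentage points.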
Abstract
Classroom speech and lectures often contain named entities (NEs) such as names of people and special terminology. While automatic speech recognition (ASR) systems have achieved remarkable performance on general speech, the word error rate (WER) of state-of-the-art ASR remains high for named entities. Since NEs are often the most critical keywords, misrecognizing them can affect all downstream applications, especially when the ASR functions as the front end of a complex system. In this paper, we introduce a large language model (LLM) revision pipeline to revise incorrect NEs in ASR predictions by leveraging not only the LLM's world knowledge and reasoning ability but also the available phonetic and semantic context. We also introduce the NER-MIT-OpenCourseWare dataset, containing 45 hours of data from MIT courses for development and testing. On this dataset, our proposed technique…
