Enhancing ASR Performance through OCR Word Frequency Analysis: Theoretical Foundations
Kyudan Jung, Nam-Joon Kim, Hyun Gon Ryu, Hyuk-Jae Lee

TL;DR
This paper explores how analyzing OCR word frequencies, grounded in power law theory, can enhance automatic speech recognition accuracy for specialized terminology, especially in lecture settings.
Contribution
It introduces a theoretical foundation, based on the power law, for the word frequency difference method used to improve ASR performance on specialized terms.
Findings
The power law effectively models word frequency differences.
The approach improves ASR accuracy for specialized terminology.
Experimental results support the theoretical foundation.
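To illustrate what "modeling word frequencies with a power law" can mean in practice, here is a minimal sketch that fits a Zipf-style relation, frequency ≈ C / rank^s, by least squares in log-log space. The function name `fit_power_law` and the choice of a simple rank-frequency fit are assumptions for illustration; this summary does not state which fitting procedure the authors used.

```python
import math


def fit_power_law(counts):
    """Fit freq ~ C / rank**s to a word -> count mapping.

    Hypothetical helper: ordinary least squares on (log rank, log freq).
    Returns (s, C). Needs at least two words with positive counts.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope in log-log space is -s; the intercept is log(C).
    return -slope, math.exp(intercept)


# Synthetic counts that follow freq = 1000 / rank exactly.
synthetic = {f"w{rank}": 1000.0 / rank for rank in range(1, 51)}
exponent, scale = fit_power_law(synthetic)
```

On real OCR text the fit would be noisier, and a log-log regression is only a rough estimator of the exponent; it is shown here because it is the simplest way to check whether a rank-frequency curve looks power-law-like.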
Abstract
As interest in large language models grows, accuracy in automatic speech recognition (ASR) has become more important. This is especially true for lectures that include specialized terminology, where the success rate of traditional ASR models tends to be low, presenting a significant challenge. A method based on word frequency differences has been proposed to improve ASR performance for specialized terminology. We investigated this proposal through experiments and data analysis to determine whether it effectively addresses the issue. In addition, we introduced the power law as the theoretical foundation for the relative frequency methodology used in this approach.
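As a rough illustration of the word frequency difference idea, the sketch below compares relative word frequencies in OCR text (for example, text extracted from lecture slides) against a general reference text, and ranks words by how much more frequent they are in the OCR text. The function names, the tokenizer, and the use of a plain difference of relative frequencies are assumptions; the abstract does not give the exact definition used in the method.

```python
import re
from collections import Counter


def relative_freqs(text):
    """Map each lowercase word in `text` to its share of all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def frequency_difference(ocr_text, reference_text):
    """Rank OCR words by (OCR relative freq - reference relative freq).

    Hypothetical scoring: words near the top are candidates for
    specialized terms that a general-purpose ASR model may miss.
    """
    ocr = relative_freqs(ocr_text)
    ref = relative_freqs(reference_text)
    diffs = {word: p - ref.get(word, 0.0) for word, p in ocr.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


ranked = frequency_difference(
    "the eigenvalue of the matrix eigenvalue",
    "the cat sat on the mat the end",
)
```

In this toy input, "eigenvalue" ranks first because it is frequent in the OCR text and absent from the reference, while "the" ranks last because it is common in both.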
Taxonomy
Topics: Simulation and Modeling Applications · Speech Recognition and Synthesis · Internet of Things and Social Network Interactions
