Automatic Speech Recognition for Hindi

Anish Saha; A.G. Ramakrishnan

arXiv:2406.18135·cs.CL·June 27, 2024

TL;DR

This paper presents a web-based speech recognition system for Hindi that integrates real-time audio recording, voice activity detection, and a neural network for aligning speech signals with HMM states, enhancing ASR accuracy.

Contribution

It introduces a novel web application framework for Hindi ASR with real-time processing, collaborative correction, and a new backpropagation method for neural network alignment.

Findings

01 Effective real-time speech recognition for Hindi implemented.

02 Voice activity detection reduces unnecessary processing.

03 Novel backpropagation improves neural network alignment accuracy.
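
To illustrate how voice activity detection can cut unnecessary processing, here is a minimal sketch of a frame-energy detector. It is not the paper's method, which this summary does not describe. The frame length, the threshold, and all function names are hypothetical choices made for illustration, and samples are assumed to be floats in the range -1 to 1.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech_frames(samples, frame_len=160, threshold=0.01):
    """Flag frames whose energy exceeds a fixed threshold.

    Both defaults are assumptions: 160 samples is 10 ms at 16 kHz,
    and 0.01 is an arbitrary illustrative threshold.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


def speech_segments(flags):
    """Merge runs of flagged frames into (start, end) frame indices, end exclusive."""
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


if __name__ == "__main__":
    # Synthetic signal: silence, a 440 Hz tone standing in for speech, silence.
    rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(320)]
    signal = [0.0] * 320 + tone + [0.0] * 160
    flags = detect_speech_frames(signal)
    print(speech_segments(flags))
```

Only the frames inside the returned segments would need to be passed on to the recognizer; the silent frames can be skipped. A production detector would typically adapt the threshold to background noise instead of using a fixed value.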

Abstract

Automatic speech recognition (ASR) is a key area in computational linguistics, focusing on developing technologies that enable computers to convert spoken language into text. This field combines linguistics and machine learning. ASR models map speech audio to transcripts through supervised learning and must handle real, unrestricted text. Text-to-speech systems work directly with real text, while ASR systems rely on language models trained on large text corpora. High-quality transcribed data is essential for training predictive models. The research involved two main components: developing a web application and designing a web interface for speech recognition. The web application, built with JavaScript and Node.js, manages large volumes of audio files and their transcriptions and facilitates collaborative human correction of ASR transcripts. It operates in real time using a…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques