Automatic Speech Recognition for Hindi
Anish Saha, A.G. Ramakrishnan

TL;DR
This paper presents a web-based speech recognition system for Hindi that integrates real-time audio recording, voice activity detection, and a neural network for aligning speech signals with HMM states, enhancing ASR accuracy.
Contribution
It introduces a novel web application framework for Hindi ASR with real-time processing, collaborative correction, and a new backpropagation method for neural network alignment.
Findings
Real-time speech recognition for Hindi is implemented as a web-based system.
Voice activity detection reduces unnecessary processing of non-speech audio.
A new backpropagation method improves the accuracy of the neural network's alignment of speech with HMM states.
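To illustrate why voice activity detection saves processing, here is a minimal sketch of an energy-based detector. The summary does not say which detection method the paper uses, so the RMS-threshold approach, the function names, the 160-sample frame length (10 ms at an assumed 16 kHz sampling rate), and the 0.02 threshold are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_speech_frames(samples, frame_len=160, threshold=0.02):
    """Flag each frame as speech (True) or silence (False).

    frame_len and threshold are illustrative assumptions, not values
    taken from the paper.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


def speech_segments(flags):
    """Merge runs of speech frames into (start_frame, end_frame_exclusive) pairs."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


# Synthetic input: silence, a 440 Hz tone standing in for speech, silence.
SAMPLE_RATE = 16000  # assumed sampling rate
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(480)]
signal = silence + tone + silence

flags = detect_speech_frames(signal)
segments = speech_segments(flags)
print(segments)  # [(2, 5)]
```

Only the frames inside the returned segments would need to be passed to the recognizer; the silent frames can be skipped, which is the saving the finding refers to. A production detector would typically add smoothing or hangover frames so that brief pauses inside a word are not cut.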
Abstract
Automatic speech recognition (ASR) is a key area in computational linguistics, focusing on developing technologies that enable computers to convert spoken language into text. This field combines linguistics and machine learning. ASR models, which map speech audio to transcripts through supervised learning, must handle real, unrestricted text. Text-to-speech systems work directly with real text, while ASR systems rely on language models trained on large text corpora. High-quality transcribed data is essential for training predictive models. The research involved two main components: developing a web application and designing a web interface for speech recognition. The web application, created with JavaScript and Node.js, manages large volumes of audio files and their transcriptions, facilitating collaborative human correction of ASR transcripts. It operates in real time using a…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
