Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition
Prashant Borde, Amarsinh Varpe, Ramesh Manza, Pravin Yannawar

TL;DR
This paper explores audio-visual speech recognition by combining Zernike moments for visual features and MFCC for audio, achieving improved recognition of isolated words, especially under noisy conditions.
Contribution
It introduces a novel combination of Zernike moments and MFCC features with PCA for recognizing isolated words in audio-visual speech recognition systems.
Findings
Audio-only recognition accuracy is 100%.
Visual-only recognition accuracy is 63.88%.
Feature reduction via PCA enhances recognition performance.
Abstract
Automatic Speech Recognition (ASR) by machine is an attractive research topic in the signal processing domain and has attracted many researchers to contribute in this area. In recent years, there have been many advances in automatic speech reading systems with the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition system is to improve recognition accuracy. In this paper we computed visual features using Zernike moments and audio features using Mel Frequency Cepstral Coefficients (MFCC) on the vVISWa (Visual Vocabulary of Independent Standard Words) dataset, which contains a collection of isolated city names spoken by 10 speakers. The visual features were normalized and the dimension of the feature set was reduced by Principal Component Analysis (PCA) in order to recognize the isolated word utterances in PCA space. The…
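The normalization and PCA reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature matrix is random stand-in data, its shape (40 utterances by 25 features), the choice of z-score normalization, the number of retained components, and the helper names zscore, pca_fit, and pca_project are all assumptions made for illustration.

```python
import numpy as np


def zscore(features):
    """Normalize each feature column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (features - mean) / std


def pca_fit(features, n_components):
    """Fit PCA via SVD; return the mean, principal axes, and explained variance ratios."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], variance_ratio[:n_components]


def pca_project(features, mean, components):
    """Project feature vectors onto the retained principal axes."""
    return (features - mean) @ components.T


# Hypothetical data: 40 utterances, each described by 25 visual features
# (for example, Zernike moment magnitudes). Random values stand in for real features.
rng = np.random.default_rng(0)
visual_features = rng.normal(size=(40, 25))

normalized = zscore(visual_features)
mean, components, variance_ratio = pca_fit(normalized, n_components=5)
reduced = pca_project(normalized, mean, components)  # shape (40, 5)
```

A new utterance would be normalized with the training statistics and passed through pca_project with the same mean and components before being compared with the training samples in the reduced space.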
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Data Compression Techniques
Methods: Principal Component Analysis
