Multi-Language Identification Using Convolutional Recurrent Neural Network
Vrishabh Ajay Lakhani, Rohan Mahadev

TL;DR
This paper explores a novel convolutional recurrent neural network (CRNN) architecture with LSTM and GRU units for language identification, comparing log-scale audio spectrum features against polyphonic sound sequences from raw audio for classifying speech as English or Spanish.
Contribution
It introduces a new neural network approach combining convolutional and recurrent layers with LSTM and GRU units for improved language identification performance.
Findings
Polyphonic sound sequences outperform traditional MFCC features.
LSTM and GRU gating mechanisms enhance classification accuracy.
The proposed model achieves better results than unidirectional DNNs.
Abstract
Language Identification, an important aspect of Automatic Speaker Recognition, has seen many changes and new approaches to improve performance over the last decade. We compare two kinds of features for training a neural network to classify speech as either English or Spanish: the audio spectrum on a log scale, and polyphonic sound sequences from raw audio samples. To achieve this, we use a novel Convolutional Recurrent Neural Network with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) layers for forward propagation. Our hypothesis is that polyphonic sound sequences as features, with both LSTM and GRU as the gating mechanisms, outperform traditional MFCC features with a unidirectional Deep Neural Network.
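To make the "gating mechanism" in the abstract concrete, the following is a minimal sketch of a single GRU time step in plain Python. It is not the authors' implementation: the abstract gives no layer sizes, weights, or feature dimensions, so the names `gru_step`, `init_params`, and `run_gru`, the parameter layout, and the random initialization are all assumptions made for illustration. It uses one common GRU convention, h = (1 - z) * h_prev + z * h_cand; some formulations swap the roles of z and 1 - z.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the update and reset gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical parameter layout: gate name -> (W, U, b)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: (mat(hidden_size, input_size),
               mat(hidden_size, hidden_size),
               [0.0] * hidden_size)
        for gate in ("z", "r", "h")
    }


def gru_step(x, h_prev, params):
    """One GRU step for a single input frame x and previous state h_prev."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wh, Uh, bh = params["h"]

    # Update gate: how much of the candidate state to let through.
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(Wz, x), matvec(Uz, h_prev), bz)]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(Wr, x), matvec(Ur, h_prev), br)]

    # Candidate state, squashed into (-1, 1) by tanh.
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a + b + c)
              for a, b, c in zip(matvec(Wh, x), matvec(Uh, reset_h), bh)]

    # Blend the previous state and the candidate using the update gate.
    return [(1.0 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(frames, params, hidden_size):
    """Run the GRU over a sequence of feature frames, starting from zeros."""
    h = [0.0] * hidden_size
    for x in frames:
        h = gru_step(x, h, params)
    return h


if __name__ == "__main__":
    # Three made-up 3-dimensional frames stand in for real audio features.
    frames = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5], [0.0, 0.0, 0.0]]
    print(run_gru(frames, init_params(3, 4, seed=1), hidden_size=4))
```

In a CRNN of the kind the abstract describes, the frames passed to the recurrent layer would typically be feature maps produced by convolutional layers, and the final hidden state would feed a classifier over the two languages; those stages are omitted here.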
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Authorship Attribution and Profiling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
