Prodorshok I: A Bengali Isolated Speech Dataset for Voice-Based Assistive Technologies - A comparative analysis of the effects of data augmentation on HMM-GMM and DNN classifiers

Mohi Reza; Warida Rashid; Moin Mostakim

arXiv:1712.03579·cs.SD·December 12, 2017

TL;DR

This paper introduces Prodorshok I, a Bengali isolated speech dataset, and analyzes how simple data augmentation techniques improve the accuracy of HMM-GMM and DNN-based speech recognition systems.

Contribution

The paper provides the first detailed analysis of data augmentation effects on Bengali speech recognition using HMM-GMM and DNN classifiers.

Findings

01

Data augmentation with small pitch shifts improves recognition accuracy.

02

Both HMM-GMM and DNN classifiers benefit from data augmentation.

03

The Prodorshok I dataset supports the development of Bengali voice-based assistive technologies.
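As an illustration of the pitch-shift augmentation named in the findings, here is a minimal NumPy sketch. It is not the authors' procedure: this summary does not say which tool or shift size they used, so the resampling approach, the function names, and the one-semitone shifts are all assumptions. Note that shifting pitch by plain resampling also changes the clip duration by the inverse factor.

```python
import numpy as np

def pitch_shift_resample(signal, semitones):
    """Shift pitch by resampling (assumed method; duration changes too)."""
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the shift
    n_out = int(round(len(signal) / factor))
    # Read the original waveform at positions spaced `factor` samples apart.
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)

def augment(signal, shifts=(-1.0, 1.0)):
    """Return the original clip plus one pitch-shifted copy per shift."""
    return [signal] + [pitch_shift_resample(signal, s) for s in shifts]

# Hypothetical input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
augmented = augment(tone)  # [original, one semitone down, one semitone up]
```

Each training utterance then yields three examples instead of one. A duration-preserving shift would need a phase vocoder or similar method, which is outside this sketch.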

Abstract

Prodorshok I is a Bengali isolated word dataset tailored to help create speaker-independent, voice-command driven automated speech recognition (ASR) based assistive technologies to help improve human-computer interaction (HCI). This paper presents the results of an objective analysis that was undertaken using a subset of words from Prodorshok I to assess its reliability in ASR systems that utilize Hidden Markov Models (HMM) with Gaussian emissions and Deep Neural Networks (DNN). The results show that simple data augmentation involving a small pitch shift can make surprisingly tangible improvements to accuracy levels in speech recognition.
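To make the HMM side of the abstract concrete, the following sketch shows one common way to classify an isolated word with HMMs that have Gaussian emissions: score the utterance under one model per word using the forward algorithm and pick the best-scoring word. It assumes a single diagonal-covariance Gaussian per state rather than a full mixture, and the toy parameters, word labels, and function names are hypothetical, not taken from the paper.

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Per-frame, per-state log density. x: (T, D); mean, var: (S, D)."""
    diff = x[:, None, :] - mean[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff ** 2 / var[None], axis=2)

def log_likelihood(x, log_start, log_trans, mean, var):
    """Forward algorithm in the log domain; returns log p(x | model)."""
    log_b = log_gaussian(x, mean, var)
    alpha = log_start + log_b[0]
    for t in range(1, len(x)):
        # Sum over previous states, then add the emission score for frame t.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def classify(x, models):
    """Pick the word whose HMM assigns the highest likelihood to x."""
    return max(models, key=lambda word: log_likelihood(x, *models[word]))

# Toy two-state models over 1-D features (all values are made up).
log_start = np.log(np.array([0.9, 0.1]))
log_trans = np.log(np.array([[0.7, 0.3], [0.1, 0.9]]))
var = np.ones((2, 1))
models = {
    "word_a": (log_start, log_trans, np.array([[0.0], [1.0]]), var),
    "word_b": (log_start, log_trans, np.array([[5.0], [6.0]]), var),
}
obs = np.array([[0.1], [0.2], [0.9], [1.1]])
predicted = classify(obs, models)
```

In a real system the features would be frame-level acoustic vectors (for example MFCCs) and the model parameters would be trained on the recorded utterances, including any augmented copies.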
