A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

Anand Kumar Rai; Siddharth D Jaiswal; Animesh Mukherjee

arXiv:2307.10587 · cs.CL · July 21, 2023 · 1 citation

TL;DR

This study analyzes the performance disparities of state-of-the-art ASR systems across diverse Indian demographics using a large dataset of NPTEL MOOC videos, highlighting the need for more inclusive speech recognition models.

Contribution

The paper introduces a large, diverse speech dataset from NPTEL MOOCs and evaluates ASR disparities across demographic and disciplinary traits, revealing significant biases.

Findings

01

Disparities exist based on gender, native region, age, and speech rate.

02

No disparity was found based on caste.

03

A significant disparity was observed across lecture disciplines.

Abstract

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications, including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems, which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of $\sim 9.8$K technical lectures in the English language, along with their transcripts, delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits…
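
For readers unfamiliar with the metric in the title, the following is a minimal sketch of how a word error rate (WER) can be computed and averaged per speaker group. It is not the authors' pipeline: the function names `word_error_rate` and `mean_wer_by_group`, the lowercase-and-split normalization, and the `(group, reference, hypothesis)` record layout are all assumptions made for illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real evaluations usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(records):
    """Average WER per group for (group, reference, hypothesis) tuples.

    The grouping key (e.g. gender or discipline) is hypothetical here.
    """
    per_group = defaultdict(list)
    for group, reference, hypothesis in records:
        per_group[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(v) / len(v) for group, v in per_group.items()}
```

Comparing the per-group averages returned by `mean_wer_by_group` is the simplest way to see a disparity; establishing that a gap is statistically significant requires a proper test, which this sketch does not attempt.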

Code & Models

Repositories

raianand1991/tie (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Multimodal Machine Learning Applications · Speech and dialogue systems