Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Mukuntha Narayanan Sundararaman; Ayush Kumar; Jithendra Vepa

arXiv:2102.00804 · eess.AS · June 17, 2021

TL;DR

PhonemeBERT is a novel joint language model that integrates phoneme sequences with ASR transcripts to improve robustness against errors in noisy and out-of-domain speech recognition data.

Contribution

This work introduces PhonemeBERT, a BERT-style model that learns phonetic-aware representations for ASR transcripts, enhancing downstream task performance especially in low-resource and noisy scenarios.

Findings

1. Outperforms state-of-the-art baselines on benchmark datasets
2. Improves robustness to ASR errors in noisy conditions
3. Remains effective in low-resource settings where only ASR transcripts, and no phoneme sequences, are available for the downstream task

Abstract

Recent years have witnessed significant improvement in ASR systems to recognize spoken utterances. However, it is still a challenging task for noisy and out-of-domain data, where substitution and deletion errors are prevalent in the transcribed text. These errors significantly degrade the performance of downstream tasks. In this work, we propose a BERT-style language model, referred to as PhonemeBERT, that learns a joint language model with phoneme sequence and ASR transcript to learn phonetic-aware representations that are robust to ASR errors. We show that PhonemeBERT can be used on downstream tasks using phoneme sequences as additional features, and also in low-resource setup where we only have ASR-transcripts for the downstream tasks with no phoneme information available. We evaluate our approach extensively by generating noisy data for three benchmark datasets - Stanford Sentiment…
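To make the idea of a joint language model over two streams concrete, here is a minimal sketch of how an ASR transcript and its phoneme sequence could be packed into one BERT-style input and masked for a masked-language-modelling objective. Everything specific in it is an assumption for illustration rather than the authors' implementation: the `[CLS] words [SEP] phonemes [SEP]` layout, the segment ids, the 15% masking rate (the usual BERT default), the ARPAbet-style phoneme symbols, and the helper names `build_joint_input` and `mask_tokens`.

```python
import random

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def build_joint_input(words, phonemes):
    """Pack word tokens and phoneme tokens into one sequence.

    Assumed layout (hypothetical): [CLS] words [SEP] phonemes [SEP].
    Segment id 0 marks the transcript side, 1 marks the phoneme side.
    """
    tokens = [CLS] + list(words) + [SEP] + list(phonemes) + [SEP]
    segments = [0] * (len(words) + 2) + [1] * (len(phonemes) + 1)
    return tokens, segments


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of non-special tokens with [MASK].

    Returns the masked sequence and a parallel list of labels that holds
    the original token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok not in (CLS, SEP) and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Illustrative utterance; the phoneme symbols are ARPAbet-style placeholders.
words = ["i", "want", "to", "pay", "my", "bill"]
phonemes = ["AY", "W", "AA", "N", "T", "T", "UW",
            "P", "EY", "M", "AY", "B", "IH", "L"]

tokens, segments = build_joint_input(words, phonemes)
masked, labels = mask_tokens(tokens)
```

Because masking is applied across the whole packed sequence, a masked word can in principle be predicted from the phonemes on the other side of the separator and vice versa, which is one plausible way a model could pick up phonetic-aware representations.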

Code & Models

Repositories

Observeai-Research/Phoneme-BERT (official)
