mask-Net: Learning Context Aware Invariant Features using Adversarial Forgetting (Student Abstract)

Hemant Yadav; Atul Anshuman Singh; Rachit Mittal; Sunayana Sitaram; Yi Yu; Rajiv Ratn Shah

arXiv:2011.12979 · cs.SD · October 19, 2021 · 1 citation

TL;DR

This paper introduces mask-Net, an adversarial forgetting approach that learns accent-invariant features for speech-to-text systems, lowering word error rate on both in-distribution and out-of-distribution test sets.

Contribution

The paper proposes a new adversarial forgetting method to induce invariance in features, enhancing robustness in speech recognition models.

Findings

01 Achieved 2.2% absolute WER improvement on out-of-distribution data.

02 Achieved 1.3% absolute WER improvement on in-distribution data.

03 Demonstrated better generalization compared to traditional models.

Abstract

Training a robust system, e.g., Speech to Text (STT), requires large datasets. Variability in the data, such as unwanted nuisance factors and biases, is the reason large datasets are needed to learn general representations. In this work, we propose a novel approach to induce invariance using adversarial forgetting (AF). Our initial experiments on learning features invariant to factors such as accent on the STT task achieve better generalization in terms of word error rate (WER) compared to traditional models. We observe absolute improvements of 2.2% and 1.3% on out-of-distribution and in-distribution test sets, respectively.
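The abstract reports its gains as absolute WER improvements. As a reading aid, here is a minimal Python sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words) and how an absolute improvement differs from a relative one. This is not the authors' evaluation code. The function names and the example WER values (0.200 and 0.178) are hypothetical, chosen only to illustrate a 2.2-point absolute gap; the paper's baseline WERs are not given on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Improvement expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

    # Hypothetical WERs, for illustration only.
    baseline, improved = 0.200, 0.178
    print(absolute_improvement(baseline, improved))
    print(relative_improvement(baseline, improved))
```

With these illustrative values, the same change would be a 2.2-point absolute improvement but an 11% relative improvement, which is why the abstract's "absolute" qualifier matters when comparing against other reported results.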

Code & Models

Repositories

raotnameh/Robust_ASR (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Multimodal Machine Learning Applications