Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance
Sourya Dipta Das, Yash Vadi, Abhishek Unnam, Kuldeep Yadav

TL;DR
This paper introduces an unsupervised Mahalanobis distance-based method leveraging latent embeddings from wav2vec 2.0 to effectively detect out-of-distribution dialect samples, enhancing dialect classification robustness.
Contribution
The paper presents an unsupervised approach that applies Mahalanobis distance to multi-layer embeddings for OOD detection in dialect classification, and reports that it outperforms existing methods.
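For reference, the standard definition of the Mahalanobis distance is given below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $x$ is an embedding vector, and $\mu$ and $\Sigma$ are the mean and covariance estimated from in-distribution training embeddings.

```latex
D_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}
```

A large $D_M(x)$ means the embedding lies far from the training distribution relative to its spread, which is what makes the distance usable as an OOD signal.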
Findings
Outperforms state-of-the-art OOD detection methods significantly.
Utilizes latent embeddings from all intermediate layers of wav2vec 2.0.
Targets real-world dialect classification scenarios, where a deployed model encounters dialects unseen during training.
Abstract
Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. These OOD samples can lead to unexpected outputs, as their dialects are unseen during model training. Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification. To address this, we propose a simple yet effective unsupervised Mahalanobis distance feature-based method to detect out-of-distribution samples. We utilize the latent embeddings from all intermediate layers of a wav2vec 2.0 transformer-based dialect classifier model for multi-task learning. Our…
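The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic arrays stand in for per-layer wav2vec 2.0 embeddings, a single Gaussian is fitted per layer, and the per-layer distances are summed into one score. The function names, the ridge term, the sum aggregation, and the 95th-percentile threshold are all assumptions made for this example.

```python
import numpy as np

def fit_gaussian(train_emb):
    """Estimate the mean and inverse covariance of in-distribution embeddings."""
    mean = train_emb.mean(axis=0)
    centered = train_emb - mean
    cov = centered.T @ centered / (len(train_emb) - 1)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge for stability (assumption)
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x to the fitted Gaussian."""
    diff = x - mean
    return np.einsum("nd,de,ne->n", diff, cov_inv, diff)

def layer_features(train_layers, test_layers):
    """Return one distance per layer, giving an (n_samples, n_layers) matrix."""
    feats = []
    for train_emb, test_emb in zip(train_layers, test_layers):
        mean, cov_inv = fit_gaussian(train_emb)
        feats.append(mahalanobis_sq(test_emb, mean, cov_inv))
    return np.stack(feats, axis=1)

def ood_score(features):
    """Aggregate per-layer distances by summing (an assumption, not the paper's rule)."""
    return features.sum(axis=1)

# Synthetic stand-ins for per-layer embeddings (hypothetical data, not real speech).
rng = np.random.default_rng(0)
n_layers, dim = 3, 8
train = [rng.normal(size=(500, dim)) for _ in range(n_layers)]
id_test = [rng.normal(size=(50, dim)) for _ in range(n_layers)]
ood_test = [rng.normal(loc=4.0, size=(50, dim)) for _ in range(n_layers)]

train_feats = layer_features(train, train)
id_feats = layer_features(train, id_test)
ood_feats = layer_features(train, ood_test)

# Flag a sample as OOD when its score exceeds the 95th percentile of training scores.
threshold = np.percentile(ood_score(train_feats), 95)
id_flagged = (ood_score(id_feats) > threshold).mean()
ood_flagged = (ood_score(ood_feats) > threshold).mean()
print(f"flagged in-distribution: {id_flagged:.2f}, flagged OOD: {ood_flagged:.2f}")
```

No labelled OOD samples are needed at any point: the Gaussians and the threshold are derived from in-distribution data alone, which is the sense in which such a detector is unsupervised.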
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Authorship Attribution and Profiling
