SpeechTaxi: On Multilingual Semantic Speech Classification
Lennart Keller, Goran Glavaš

TL;DR
This paper compares end-to-end (E2E) multilingual speech classifiers with cascading, transcription-based methods for semantic speech classification. It introduces the SpeechTaxi dataset and proposes a Romanized-text approach for better cross-lingual transfer.
Contribution
The paper introduces SpeechTaxi, a new multilingual dataset for semantic speech classification, and provides a comprehensive comparison of E2E and cascading methods, including a novel Romanized-text approach for cross-lingual robustness.
Findings
E2E classifiers outperform cascading in monolingual settings.
E2E models have limited cross-lingual transfer capabilities.
Romanized text transcription offers a robust cross-lingual solution.
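The Romanized-text idea can be illustrated with a minimal sketch, assuming a toy transliteration table. Transcripts in different scripts are mapped to lowercase Latin characters, so a single text classifier sees a shared surface form across languages. The `CYRILLIC_TO_LATIN` table and the `romanize` function are hypothetical illustrations, not the paper's romanizer; a real system would rely on a transliteration tool that covers many scripts.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (Cyrillic -> Latin).
# Only for illustration; it is not the romanizer used in the paper.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
}


def romanize(text: str) -> str:
    """Map text to a lowercase, Latin-only surface form (toy sketch)."""
    out = []
    for ch in text.lower():
        # Transliterate known Cyrillic letters; leave other characters unchanged.
        ch = CYRILLIC_TO_LATIN.get(ch, ch)
        # Strip diacritics from Latin letters (e.g. "š" -> "s") via NFKD decomposition.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(romanize("Москва"), romanize("Moskva"))  # moskva moskva
print(romanize("Glavaš"))                      # glavas
```

The point of the design is that the Cyrillic and Latin spellings collapse to the same string, so a text classifier trained on one script can, in principle, reuse what it learned on the other.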
Abstract
Recent advancements in multilingual speech encoding and transcription raise the question of the most effective approach to semantic speech classification. Concretely, can (1) end-to-end (E2E) classifiers obtained by fine-tuning state-of-the-art multilingual speech encoders (MSEs) match or surpass the performance of (2) cascading (CA), where speech is first transcribed into text and classification is delegated to a text-based classifier? To answer this, we first construct SpeechTaxi, an 80-hour multilingual dataset for semantic speech classification of Bible verses, covering 28 diverse languages. We then leverage SpeechTaxi to conduct a wide range of experiments comparing E2E and CA in monolingual semantic speech classification as well as in cross-lingual transfer. We find that E2E based on MSEs outperforms CA in monolingual setups, i.e., when trained on in-language data. However,…
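The two paradigms compared in the abstract can be sketched as function composition. This is a minimal structural sketch, not the paper's code: the component names (`toy_asr`, `toy_text_classifier`, `toy_encoder`, `toy_head`) and the labels `"creation"`/`"other"` are hypothetical placeholders standing in for an ASR system, a fine-tuned multilingual speech encoder, and trained classifiers.

```python
from typing import Callable, List, Sequence

Audio = Sequence[float]  # placeholder type for raw waveform samples


def cascade_classify(audio: Audio,
                     transcribe: Callable[[Audio], str],
                     classify_text: Callable[[str], str]) -> str:
    # CA: speech is first transcribed, then a text-only classifier decides the label.
    return classify_text(transcribe(audio))


def e2e_classify(audio: Audio,
                 encode: Callable[[Audio], List[float]],
                 head: Callable[[List[float]], str]) -> str:
    # E2E: the speech encoder's representation goes straight to a classification
    # head; no intermediate transcript is produced.
    return head(encode(audio))


# --- Hypothetical stand-ins, for illustration only ---
def toy_asr(audio: Audio) -> str:
    # Pretend ASR output; a real system would decode the waveform.
    return "in the beginning god created the heaven and the earth"


def toy_text_classifier(text: str) -> str:
    # Keyword rule standing in for a trained text classifier.
    return "creation" if "created" in text else "other"


def toy_encoder(audio: Audio) -> List[float]:
    # Mean-pool the samples into a one-dimensional "embedding".
    return [sum(audio) / len(audio)]


def toy_head(embedding: List[float]) -> str:
    # Threshold rule standing in for a trained classification head.
    return "creation" if embedding[0] > 0.0 else "other"


waveform = [0.1, 0.3, -0.05, 0.2]
print(cascade_classify(waveform, toy_asr, toy_text_classifier))  # creation
print(e2e_classify(waveform, toy_encoder, toy_head))             # creation
```

The structural difference is where the intermediate representation lives: in CA, any transcription error is inherited by the text classifier, whereas in E2E the encoder is fine-tuned together with the head and no transcript is ever produced.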
Taxonomy
Topics: Natural Language Processing Techniques
