SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

Pelin Dogan-Schönberger; Julian Mäder; Thomas Hofmann

arXiv:2103.11401 · cs.CL · March 23, 2021


TL;DR

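This paper introduces SwissDial, the first annotated parallel corpus of spoken Swiss German, covering 8 major dialects plus a Standard German reference. The corpus is intended to enable data-driven NLP research and applications for this dialect continuum, and its quality is validated through neural speech synthesis experiments.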

Contribution

The paper provides an annotated parallel corpus of spoken Swiss German across 8 major dialects, together with a detailed description of the data collection procedure, to support data-driven NLP methods for these under-resourced dialects.

Findings

01 Corpus quality validated through experiments with neural speech synthesis models

02 First annotated parallel corpus of spoken Swiss German across 8 major dialects

03 Provides a basic dataset for future data-driven NLP research on Swiss German dialects

Abstract

Swiss German is a dialect continuum whose natively acquired dialects significantly differ from the formal variety of the language. These dialects are mostly used for verbal communication and do not have standard orthography. This has led to a lack of annotated datasets, rendering the use of many NLP methods infeasible. In this paper, we introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference. Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German. We present our data collection procedure in detail and validate the quality of our corpus by conducting experiments with the recent neural models for speech synthesis.
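
To make the "parallel" structure described in the abstract concrete, here is a minimal Python sketch of how one corpus entry could be represented: a Standard German reference sentence aligned with a transcript and a recording in each dialect. This is not the released SwissDial format. The class `ParallelEntry`, the helper `aligned_pairs`, the field names, and the placeholder labels `dialect_1` to `dialect_8` are all assumptions made for illustration; only the count of 8 dialects and the Standard German reference come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical placeholder labels: the abstract states there are 8 major
# dialects but does not name them, so no real dialect codes are assumed here.
DIALECTS: List[str] = [f"dialect_{i}" for i in range(1, 9)]


@dataclass
class ParallelEntry:
    """One sentence, aligned across Standard German and the dialects (assumed layout)."""
    sentence_id: int
    standard_german: str
    # Maps a dialect label to its transcript / to the path of its audio recording.
    dialect_text: Dict[str, str] = field(default_factory=dict)
    dialect_audio: Dict[str, str] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """True if every dialect has both a transcript and an audio path."""
        return all(
            d in self.dialect_text and d in self.dialect_audio for d in DIALECTS
        )


def aligned_pairs(
    entries: List[ParallelEntry], src: str, tgt: str
) -> List[Tuple[str, str]]:
    """Collect (src, tgt) transcript pairs for entries that contain both dialects."""
    return [
        (e.dialect_text[src], e.dialect_text[tgt])
        for e in entries
        if src in e.dialect_text and tgt in e.dialect_text
    ]
```

Because each sentence is stored once with all of its renderings, sentence-aligned pairs between any two dialects (or between a dialect and Standard German) can be read off directly, which is what makes a parallel corpus useful for tasks such as dialect-to-dialect translation or text-to-speech training.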


Code & Models

Models

🤗 kuma-rtin/whisper_swissdial-spc (model · 6 downloads)
