FT Speech: Danish Parliament Speech Corpus

Andreas Kirkedal; Marija Stepanović; Barbara Plank

arXiv:2005.12368·cs.CL·October 29, 2020

TL;DR

FT Speech is a Danish Parliament speech corpus with over 1,800 hours of transcribed speech from 434 speakers. It is larger in duration, vocabulary, and amount of spontaneous speech than existing public Danish corpora, and baseline ASR systems trained on it reach a 14.01 WER.

Contribution

The paper introduces FT Speech, a large-scale, spontaneous speech corpus for Danish, and evaluates its effectiveness for automatic speech recognition training.

Findings

01

Achieved 14.01 WER on the new corpus

02

FT Speech transfers well to in-domain language data

03

Provides a valuable resource for Danish ASR research

Abstract

This paper introduces FT Speech, a new speech corpus created from the recorded meetings of the Danish Parliament, otherwise known as the Folketing (FT). The corpus contains over 1,800 hours of transcribed speech by a total of 434 speakers. It is significantly larger in duration, vocabulary, and amount of spontaneous speech than the existing public speech corpora for Danish, which are largely limited to read-aloud and dictation data. We outline design considerations, including the preprocessing methods and the alignment procedure. To evaluate the quality of the corpus, we train automatic speech recognition systems on the new resource and compare them to the systems trained on the Danish part of Språkbanken, the largest public ASR corpus for Danish to date. Our baseline results show that we achieve a 14.01 WER on the new corpus. A combination of FT Speech with in-domain language data…
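For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name `word_error_rate`, the whitespace tokenization, and the choice to return a percentage are all assumptions made here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Illustrative helper (not from the paper). Tokenization is a plain
    whitespace split; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of five reference words.
    print(word_error_rate("det er en god dag", "det er god dag"))
```

Under this definition, and assuming the reported figure is a percentage as is conventional, a 14.01 WER corresponds to roughly 14 word errors per 100 reference words.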

Code & Models

Datasets

alexandrainst/ftspeech
dataset · 950 dl