My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech

Sameer S. Pradhan; Ronald A. Cole; Wayne H. Ward

arXiv:2309.13347 · cs.CL · September 26, 2023 · 2 cites

Open Access

TL;DR

The paper introduces the MyST corpus, a large, publicly available collection of children's conversational speech from educational sessions, aimed at advancing speech recognition and conversational AI for educational purposes.

Contribution

The creation and release of one of the largest children's conversational speech corpora, with extensive transcriptions and broad accessibility for research and commercial use.

Findings

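01. Approximately 400 hours of speech collected, spanning some 230K utterances across about 10.5K virtual tutor sessions

02. 100K of the utterances transcribed so far

03. Ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded it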

Abstract

This article describes the MyST corpus developed as part of the My Science Tutor project -- one of the largest collections of children's conversational speech, comprising approximately 400 hours and spanning some 230K utterances across about 10.5K virtual tutor sessions by around 1.3K third, fourth and fifth grade students. 100K of the utterances have been transcribed thus far. The corpus is freely available (https://myst.cemantix.org) for non-commercial use under a Creative Commons license. It is also available for commercial use (https://boulderlearning.com/resources/myst-corpus/). To date, ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded the corpus. It is our hope that the corpus can be used to improve automatic speech recognition algorithms, build and evaluate conversational AI agents…
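
The headline figures in the abstract imply a few rough per-session and per-utterance averages. The sketch below is only back-of-the-envelope arithmetic on those rounded numbers; the constant names and the `derived_stats` helper are illustrative and are not part of the corpus or its tooling, and the results should be read as approximations rather than statistics reported by the authors.

```python
# Rounded figures as quoted in the abstract (all approximate).
HOURS = 400            # total speech, in hours
UTTERANCES = 230_000   # total utterances
SESSIONS = 10_500      # virtual tutor sessions
STUDENTS = 1_300       # third, fourth and fifth grade students
TRANSCRIBED = 100_000  # utterances transcribed so far


def derived_stats():
    """Return rough averages implied by the rounded abstract figures."""
    return {
        "transcribed_fraction": TRANSCRIBED / UTTERANCES,
        "utterances_per_session": UTTERANCES / SESSIONS,
        "sessions_per_student": SESSIONS / STUDENTS,
        "speech_minutes_per_session": HOURS * 60 / SESSIONS,
        "seconds_per_utterance": HOURS * 3600 / UTTERANCES,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, somewhat under half of the utterances (about 43%) are transcribed, and a session averages about 22 utterances. Because every input is rounded, the outputs are order-of-magnitude guides only.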

Taxonomy

Topics: Speech and dialogue systems