A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks

Arif Khan; Ingmar Steiner; Yusuke Sugano; Andreas Bulling; Ross Macdonald

arXiv:1712.04798 · cs.HC · May 14, 2018 · 1 citation

TL;DR

This paper introduces a multimodal corpus capturing expert gaze and behavior during phonetic segmentation tasks to improve automatic speech segmentation accuracy by modeling human manual segmentation processes.

Contribution

It provides a new multimodal dataset of expert behavior during phonetic segmentation, enabling better modeling of manual segmentation for automatic systems.

Findings

01 Corpus captures visual and auditory cues used by experts
02 Data highlights key features of manual segmentation process
03 Potential to improve automatic segmentation accuracy

Abstract

Phonetic segmentation is the process of splitting speech into distinct phonetic units. Human experts routinely perform this task manually, analyzing auditory and visual cues in analysis software, which is extremely time-consuming. Methods exist for automatic segmentation, but they are not always accurate enough. To improve automatic segmentation, we need to model it as closely as possible on manual segmentation. This corpus is an effort to capture human segmentation behavior by recording experts performing a segmentation task. We believe that this data will enable us to highlight the important aspects of manual segmentation, which can then be used in automatic segmentation to improve its accuracy.
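To make the task concrete, here is a minimal sketch of how a phonetic segmentation can be represented and how an automatic segmentation might be scored against a manual one. It is an illustration only, not the authors' method or the corpus format: the names `Segment` and `boundary_agreement`, the phone labels, the time stamps, and the 20 ms tolerance are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str    # phone label (hypothetical, for illustration only)
    start: float  # start time in seconds
    end: float    # end time in seconds


def boundaries(segments):
    """Return the internal boundary times between consecutive segments."""
    return [seg.end for seg in segments[:-1]]


def boundary_agreement(manual, automatic, tolerance=0.020):
    """Fraction of manual boundaries that the automatic boundary at the same
    position matches within `tolerance` seconds.

    Assumes both segmentations cover the same phone sequence; the default
    tolerance of 20 ms is an assumption of this sketch.
    """
    ref, hyp = boundaries(manual), boundaries(automatic)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must have the same number of segments")
    if not ref:
        return 1.0  # a single segment has no internal boundaries to compare
    hits = sum(abs(r - h) <= tolerance for r, h in zip(ref, hyp))
    return hits / len(ref)


# Made-up data: one expert segmentation and one automatic segmentation.
manual = [
    Segment("k", 0.00, 0.08),
    Segment("a", 0.08, 0.21),
    Segment("t", 0.21, 0.30),
]
automatic = [
    Segment("k", 0.00, 0.09),
    Segment("a", 0.09, 0.26),
    Segment("t", 0.26, 0.30),
]

score = boundary_agreement(manual, automatic)
print(f"boundary agreement: {score:.2f}")
```

In this made-up example the first boundary differs by about 10 ms and counts as a match, while the second differs by 50 ms and does not. A position-wise comparison like this only works when both segmentations contain the same number of segments; aligning differing phone sequences would need a separate matching step.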

Code & Models

Repositories

m2ci-msp/eyetracking-data
none · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Speech and dialogue systems