Improved POS tagging for spontaneous, clinical speech using data augmentation

Seth Kulick; Neville Ryant; David J. Irwin; Naomi Nevler; Sunghye Cho

arXiv:2307.05796 · cs.CL · July 13, 2023 · 1 citation

Open Access

TL;DR

This paper presents a data augmentation approach to improve part-of-speech tagging accuracy on spontaneous clinical speech without relying on in-domain treebanks, by adapting out-of-domain newswire data.

Contribution

It introduces a novel data augmentation method to enhance POS tagging in clinical speech, bypassing the need for in-domain training data.

Findings

01

Augmented training data improves POS tagging accuracy on clinical speech.

02

The method outperforms baseline models trained without augmentation.

03

Effective for speech from patients with neurodegenerative conditions.

Abstract

This paper addresses the problem of improving POS tagging of transcripts of speech from clinical populations. In contrast to prior work on parsing and POS tagging of transcribed speech, we do not make use of an in-domain treebank for training. Instead, we train on an out-of-domain treebank of newswire, using data augmentation techniques to make these structures resemble natural, spontaneous speech. We trained a parser with and without the augmented data and tested its performance using manually validated POS tags in clinical speech produced by patients with various types of neurodegenerative conditions.
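
To make the idea of augmenting newswire so it resembles spontaneous speech concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not list the specific augmentation techniques, so the three transformations below (dropping punctuation, inserting filled pauses, repeating words) are assumptions chosen only for illustration. The names `augment`, `FILLERS`, `PUNCT_TAGS`, `p_filler`, and `p_repeat` are hypothetical. The tags are Penn Treebank tags, with `UH` marking an interjection.

```python
import random

# Hypothetical filler inventory; the paper's actual choices are not given in the abstract.
FILLERS = ["uh", "um"]

# Penn Treebank tags for punctuation, which speech transcripts typically lack.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}


def augment(tagged, rng, p_filler=0.1, p_repeat=0.05):
    """Return a speech-like copy of a POS-tagged newswire sentence.

    tagged   -- list of (token, tag) pairs
    rng      -- a random.Random instance, so runs are reproducible
    p_filler -- probability of inserting a filled pause before a word
    p_repeat -- probability of repeating a word (a simple disfluency)
    """
    out = []
    for token, tag in tagged:
        if tag in PUNCT_TAGS:
            continue  # drop punctuation tokens
        if rng.random() < p_filler:
            out.append((rng.choice(FILLERS), "UH"))  # filled pause tagged as interjection
        out.append((token, tag))
        if rng.random() < p_repeat:
            out.append((token, tag))  # repeated word keeps its original tag
    return out


sentence = [("The", "DT"), ("company", "NN"), ("reported", "VBD"),
            ("profits", "NNS"), (".", ".")]
print(augment(sentence, random.Random(0)))
```

Because the original tags travel with their tokens, the augmented sentences can be mixed into the training set without any manual re-annotation, which is the property that lets an out-of-domain treebank stand in for an in-domain one.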

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling