# Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection

**Authors:** Vicky Zayats, Mari Ostendorf

arXiv: 1904.04388 · 2019-04-10

## TL;DR

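This paper presents a method for disfluency detection in spontaneous speech that uses text-based distributional prediction of acoustic cues to derive vector z-score features (innovations), then fuses these prosodic features with word transcripts, showing gains over a high-accuracy text-only model.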

## Contribution

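It introduces a new way to extract acoustic-prosodic cues: text-based distributional prediction of acoustic cues yields vector z-score features (innovations). It also explores early and late fusion techniques for integrating text and prosody in disfluency detection.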

## Key findings

- Prosodic innovation features, derived from text-based prediction of acoustic cues, improve disfluency detection when integrated with text.
- Both early and late fusion techniques for integrating text and prosody are explored.
- The combined approach shows gains over a high-accuracy text-only model.
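
The difference between early and late fusion can be illustrated with a minimal sketch. This summary does not describe the paper's actual model architectures, so simple linear scorers stand in as placeholders; all function names, weights, and the `mix` parameter are hypothetical. Early fusion here means concatenating text and prosody features before a single scorer, and late fusion means scoring each modality separately and combining the outputs.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Placeholder scorer (an assumption, not the paper's model)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion_score(text_feats: Sequence[float],
                       prosody_feats: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Early fusion: concatenate both modalities, then apply one scorer."""
    joint = list(text_feats) + list(prosody_feats)
    return sigmoid(linear_score(joint, weights, bias))


def late_fusion_score(text_feats: Sequence[float],
                      prosody_feats: Sequence[float],
                      text_weights: Sequence[float],
                      prosody_weights: Sequence[float],
                      mix: float = 0.5) -> float:
    """Late fusion: score each modality separately, then mix the outputs."""
    p_text = sigmoid(linear_score(text_feats, text_weights))
    p_prosody = sigmoid(linear_score(prosody_feats, prosody_weights))
    return mix * p_text + (1.0 - mix) * p_prosody


# Hypothetical per-word features and weights, for illustration only.
text_feats = [0.2, -1.0]
prosody_feats = [4.0, 0.5]
p_early = early_fusion_score(text_feats, prosody_feats, [0.3, 0.1, 0.4, -0.2])
p_late = late_fusion_score(text_feats, prosody_feats, [0.3, 0.1], [0.4, -0.2])
```

In this sketch, early fusion lets one scorer weigh text and prosody jointly, while late fusion keeps the modalities independent until the final combination step.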

## Abstract

Disfluencies in spontaneous speech are known to be associated with prosodic disruptions. However, most algorithms for disfluency detection use only word transcripts. Integrating prosodic cues has proved difficult because of the many sources of variability affecting the acoustic correlates. This paper introduces a new approach to extracting acoustic-prosodic cues using text-based distributional prediction of acoustic cues to derive vector z-score features (innovations). We explore both early and late fusion techniques for integrating text and prosody, showing gains over a high-accuracy text-only model.
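
The idea of a vector z-score "innovation" can be sketched as follows, assuming that a text-based predictor supplies a per-dimension mean and standard deviation for each acoustic cue. The function name, the choice of cue dimensions, and the numeric values are all hypothetical; the abstract does not specify which cues are used or how the predictive distribution is parameterized.

```python
from typing import List, Sequence


def innovation_zscores(observed: Sequence[float],
                       predicted_mean: Sequence[float],
                       predicted_std: Sequence[float],
                       eps: float = 1e-8) -> List[float]:
    """Per-dimension z-score of observed cues against a text-based prediction.

    A large absolute value means the observed cue is unexpected given the text.
    """
    if not (len(observed) == len(predicted_mean) == len(predicted_std)):
        raise ValueError("all inputs must have the same length")
    # eps guards against division by a zero predicted standard deviation.
    return [(o - m) / max(s, eps)
            for o, m, s in zip(observed, predicted_mean, predicted_std)]


# Hypothetical cues for one word: [pause (s), word duration (s), normalized f0]
observed = [0.45, 0.30, 1.2]
predicted_mean = [0.05, 0.28, 1.0]   # assumed output of a text-based predictor
predicted_std = [0.10, 0.04, 0.4]    # assumed predictive spread per cue

z = innovation_zscores(observed, predicted_mean, predicted_std)
```

With these made-up numbers, the pause dimension lands about four predicted standard deviations above its expected value, which is the kind of unexpected prosodic event the innovation features are meant to expose.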

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04388/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.04388/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1904.04388/full.md

---
Source: https://tomesphere.com/paper/1904.04388