# Prosodic Event Recognition using Convolutional Neural Networks with Context Information

**Authors:** Sabrina Stehwien, Ngoc Thang Vu

arXiv: 1706.00741 · 2017-06-05

## TL;DR

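This paper applies convolutional neural networks (CNNs) to frame-based acoustic features, using the surrounding context of each word together with position features, to detect and classify pitch accents and phrase boundary tones. The authors report strong results in both speaker-dependent and speaker-independent settings.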

## Contribution

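The paper adds position features, which indicate the current word within its surrounding context, to the CNN input for prosodic event detection, and discusses how the approach generalizes from speaker-dependent to speaker-independent modelling.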

## Key findings

- Adding position features that indicate the current word benefits a CNN that is given the word together with its context
- The method yields strong results in the speaker-independent setup as well as the speaker-dependent one
- The approach is simple and efficient

## Abstract

This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also speaker-independent cases.
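The idea of a position feature can be illustrated with a small sketch. The abstract does not specify the encoding, so the version below assumes one plausible form: a binary indicator column appended to the frame-based feature matrix, set to 1 for frames belonging to the current word and 0 for context frames. The function name `add_position_feature`, the frame-index interface, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
import numpy as np


def add_position_feature(frames, word_start, word_end):
    """Append a binary column marking the frames of the current word.

    frames: (num_frames, num_features) matrix covering the current word
        and its surrounding context.
    word_start, word_end: frame indices [start, end) of the current word
        (hypothetical interface, assumed for illustration).
    """
    frames = np.asarray(frames, dtype=float)
    if not 0 <= word_start <= word_end <= frames.shape[0]:
        raise ValueError("word span must lie inside the frame matrix")
    # Assumed encoding: 1.0 on current-word frames, 0.0 on context frames.
    indicator = np.zeros((frames.shape[0], 1))
    indicator[word_start:word_end] = 1.0
    return np.hstack([frames, indicator])


# Toy input: 10 frames with 6 acoustic features each; the current word
# occupies frames 3 to 6, the remaining frames are context.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 6))
augmented = add_position_feature(frames, 3, 7)
print(augmented.shape)
```

The augmented matrix keeps the original acoustic features untouched and adds one extra input dimension, so a convolutional layer sliding over the frame axis can tell the word in question apart from its neighbours.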

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.00741/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1706.00741/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1706.00741/full.md

---
Source: https://tomesphere.com/paper/1706.00741