# Data mining Mandarin tone contour shapes

**Authors:** Shuo Zhang

arXiv: 1907.01668 · 2019-07-04

## TL;DR

This paper applies data mining and NLP techniques to a large corpus of Mandarin newscast speech to study how tones of the same category vary in contour shape. It clusters the observed contour shapes and correlates the resulting shape types with automatically extracted linguistic features.

## Contribution

It adapts a graph-based approach to characterize clusters (fuzzy types) of tone contour shapes within each tone n-gram category, and relates these realized shape types to a bag of automatically extracted linguistic features.
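To make the idea of graph-based clustering of contours concrete, here is a minimal sketch. It is not the paper's algorithm, which this summary does not specify. It assumes each contour is a short list of normalized pitch samples, links two contours when their Euclidean distance is at most a threshold, and treats connected components of that graph as clusters. The names `threshold_graph`, `connected_components`, and `max_dist`, and the toy data, are all hypothetical.

```python
from collections import deque
from math import dist


def threshold_graph(contours, max_dist):
    """Build an adjacency map linking contours within max_dist of each other.

    Assumption: Euclidean distance on equal-length pitch vectors stands in
    for whatever shape similarity the paper actually uses.
    """
    adj = {i: set() for i in range(len(contours))}
    for i in range(len(contours)):
        for j in range(i + 1, len(contours)):
            if dist(contours[i], contours[j]) <= max_dist:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Return clusters as sorted lists of node indices, found by BFS."""
    seen = set()
    clusters = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Toy contours (hypothetical): three normalized pitch samples each.
contours = [
    [0.0, 0.5, 1.0],  # rising
    [0.1, 0.5, 0.9],  # rising
    [1.0, 0.5, 0.0],  # falling
    [0.9, 0.5, 0.1],  # falling
    [0.5, 0.5, 0.5],  # level
]
clusters = connected_components(threshold_graph(contours, max_dist=0.3))
print(clusters)
```

Connected components give hard clusters, whereas the paper speaks of fuzzy types, so this sketch only illustrates the graph construction step and the general idea of grouping by shape similarity.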

## Key findings

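- Within each tone n-gram category, the observed contour shapes group into clusters (fuzzy types).
- The realized contour shape types correlate with a bag of automatically extracted linguistic features.
- The results are discussed in the context of phonological and information theory.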

## Abstract

In spontaneous speech, Mandarin tones that belong to the same tone category may exhibit many different contour shapes. We explore the use of data mining and NLP techniques for understanding the variability of tones in a large corpus of Mandarin newscast speech. First, we adapt a graph-based approach to characterize the clusters (fuzzy types) of tone contour shapes observed in each tone n-gram category. Second, we show correlations between these realized contour shape types and a bag of automatically extracted linguistic features. We discuss the implications of the current study within the context of phonological and information theory.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01668/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01668/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1907.01668/full.md

---
Source: https://tomesphere.com/paper/1907.01668