Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean
Hyunjung Joo, GyeongTaek Lee

TL;DR
This paper introduces Dual-Glob, a deep contrastive learning framework that improves classification of pitch accents in Seoul Korean by capturing holistic $F_0$ contour shapes, supported by a new large-scale annotated dataset.
Contribution
It presents the first large-scale benchmark dataset and a novel deep contrastive learning method for robust pitch accent classification in Seoul Korean.
Findings
Dual-Glob achieves 77.75% accuracy and 51.54% F1-score, outperforming baseline models.
The approach effectively captures structural features of $F_0$ contours.
The dataset contains 10,093 manually annotated Accentual Phrases.
Abstract
The intonational structure of Seoul Korean has been defined with discrete tonal categories within the Autosegmental-Metrical model of intonational phonology. However, mapping continuous $F_0$ contours to these invariant categories is challenging because their realizations vary in real-world speech. This paper proposes Dual-Glob, a deep supervised contrastive learning framework for robustly classifying fine-grained pitch accent patterns in Seoul Korean. Unlike conventional local predictive models, the approach captures holistic contour shapes by enforcing structural consistency between clean and augmented views in a shared latent space. To this end, the authors introduce the first large-scale benchmark dataset, consisting of 10,093 manually annotated Accentual Phrases in Seoul Korean. Experimental results show that Dual-Glob significantly outperforms strong baseline models with…
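To make the idea of pulling clean and augmented views of the same class together in a shared latent space more concrete, here is a minimal sketch of a generic supervised contrastive loss. The abstract does not specify the loss Dual-Glob uses, so this is an assumption rather than the authors' method: the function name `supcon_loss`, the `temperature` value, and the input shapes are all hypothetical, and the embeddings are assumed to come from some encoder applied to $F_0$ contours.

```python
import numpy as np

def supcon_loss(z_clean, z_aug, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    z_clean, z_aug: (N, D) embeddings of clean and augmented views (assumed inputs).
    labels: (N,) integer class labels, e.g. pitch accent categories.
    """
    # Stack both views so every sample has at least one positive: its other view.
    z = np.concatenate([z_clean, z_aug], axis=0).astype(float)   # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)             # L2-normalize rows
    y = np.concatenate([labels, labels], axis=0)                 # (2N,)

    sim = z @ z.T / temperature                                  # scaled cosine similarity
    self_mask = np.eye(sim.shape[0], dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)                      # exclude self-pairs

    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    log_prob = sim - sim_max - log_denom

    # Positives: same label, different sample (this includes the other view).
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    log_prob = np.where(pos_mask, log_prob, 0.0)                 # drop non-positives

    # Average log-probability of positives per anchor, then average over anchors.
    return float((-log_prob.sum(axis=1) / pos_count).mean())
```

The loss is low when embeddings with the same label sit close together and embeddings with different labels sit far apart, which is one way to encourage a representation that reflects overall contour shape rather than individual frames.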
