Incremental Learning for Fully Unsupervised Word Segmentation Using Penalized Likelihood and Model Selection

Ruey-Cheng Chen

arXiv:1607.05822 · cs.CL · September 26, 2016


TL;DR

This paper introduces an incremental learning method for fully unsupervised word segmentation that combines probabilistic modeling (penalized likelihood) with model selection, achieving top-tier results in both phonemic and orthographic segmentation.
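As a rough illustration of what "penalized likelihood" means in this setting, a generic objective for choosing a segmentation S = (w_1, ..., w_k) of an unsegmented string can be written as follows. This is a sketch under stated assumptions: a unigram word model P, a penalty weight λ, and a length penalty f are all illustrative symbols, not the paper's notation or its exact objective. The second condition is what makes f super-additive: a long word is penalized at least as much as its parts combined.

```latex
\hat{S} \;=\; \arg\max_{S = (w_1, \dots, w_k)} \;
  \sum_{i=1}^{k} \log P(w_i) \;-\; \lambda \sum_{i=1}^{k} f\bigl(|w_i|\bigr),
\qquad
f(a + b) \;\ge\; f(a) + f(b) \quad \text{for all } a, b \ge 1 .
```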

Contribution

It presents a fully unsupervised approach with a small number of parameters, combining super-additive penalties on long word formation with new model selection criteria based on higher-order generative assumptions.
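To make "super-additive penalty" concrete, the sketch below checks whether a candidate length penalty f satisfies f(a + b) >= f(a) + f(b) over a range of lengths. The three penalty functions are hypothetical examples chosen for illustration; the summary does not specify which penalty the paper uses.

```python
from itertools import product
from typing import Callable


def is_super_additive(f: Callable[[int], float], max_len: int = 20) -> bool:
    """Return True if f(a + b) >= f(a) + f(b) for all a, b >= 1 with a + b <= max_len."""
    for a, b in product(range(1, max_len), repeat=2):
        if a + b > max_len:
            continue
        if f(a + b) < f(a) + f(b):
            return False
    return True


# Hypothetical penalties on word length n (illustrative, not from the paper).
def linear(n: int) -> float:
    return 1.0 * n          # additive: equality holds, so only weakly super-additive


def quadratic(n: int) -> float:
    return 0.5 * n * n      # strictly super-additive: a long word costs more than its parts


def sqrt_pen(n: int) -> float:
    return n ** 0.5         # sub-additive: would favor merging into long words


if __name__ == "__main__":
    for name, f in [("linear", linear), ("quadratic", quadratic), ("sqrt", sqrt_pen)]:
        print(name, is_super_additive(f))
    # Whole 6-letter word vs. two 3-letter halves under the quadratic penalty.
    print(quadratic(6), quadratic(3) + quadratic(3))
```

Under the quadratic penalty a six-letter word costs 18.0 while its two halves cost 9.0 together, which is the kind of pressure against overly long words that the penalty is meant to exert.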

Findings

01

Achieves top-tier performance in phonemic and orthographic word segmentation

02

Addresses the cognitive burden of long word formation with super-additive penalties

03

Automatically learns parameters from data

Abstract

We present a novel incremental learning approach for unsupervised word segmentation that combines features from probabilistic modeling and model selection. This includes super-additive penalties for addressing the cognitive burden imposed by long word formation, and new model selection criteria based on higher-order generative assumptions. Our approach is fully unsupervised; it relies on a small number of parameters, which permits flexible modeling, and on a mechanism that automatically learns these parameters from the data. Through experimentation, we show that this intricate design has led to top-tier performance in both phonemic and orthographic word segmentation.
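The abstract does not spell out the search procedure, but the effect of a super-additive penalty on a likelihood-based segmenter can be shown with a standard Viterbi-style dynamic program. The sketch below assumes a hand-set unigram lexicon (`TOY_LEXICON`), a quadratic penalty, and a weight `lam`; all of these are hypothetical, and the paper's incremental learning, higher-order models, and model selection criteria are not reproduced here.

```python
import math
from typing import Callable, Dict, List


def quadratic_penalty(n: int) -> float:
    """Illustrative super-additive length penalty f(n) = n^2 (an assumption)."""
    return float(n * n)


def segment(text: str, logprob: Dict[str, float],
            penalty: Callable[[int], float] = quadratic_penalty,
            lam: float = 1.0, max_word_len: int = 10) -> List[str]:
    """Return the segmentation maximizing sum(log P(w) - lam * penalty(len(w))).

    Viterbi-style DP over split points; only words in `logprob` are allowed.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word not in logprob or best[start] == -math.inf:
                continue
            score = best[start] + logprob[word] - lam * penalty(len(word))
            if score > best[end]:
                best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("the lexicon cannot cover the input")
    words, end = [], n
    while end > 0:  # follow back-pointers from the end of the string
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical toy lexicon of unigram log-probabilities (not from the paper).
TOY_LEXICON = {w: math.log(p) for w, p in {
    "the": 0.1, "dog": 0.1, "thedog": 0.05,
    "t": 0.01, "h": 0.01, "e": 0.01, "d": 0.01, "o": 0.01, "g": 0.01,
}.items()}

if __name__ == "__main__":
    print(segment("thedog", TOY_LEXICON, lam=0.0))  # unpenalized likelihood
    print(segment("thedog", TOY_LEXICON, lam=0.1))  # with the length penalty
```

With `lam=0.0` the single higher-probability token wins (`['thedog']`); with `lam=0.1` the six-letter word pays a penalty of 3.6 versus 1.8 for its two halves, so `['the', 'dog']` wins instead.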


Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling