Cross-lingual Word Segmentation and Morpheme Segmentation as Sequence Labelling

Yan Shao

arXiv:1709.03756 · cs.CL · September 13, 2017 · 5 cites

TL;DR

This paper introduces a universal character-level sequence labelling approach using bidirectional RNNs with CRFs for cross-lingual word and morpheme segmentation, achieving high accuracy across multiple languages without language-specific tuning.

Contribution

Proposes a universal, language-agnostic neural sequence labelling system for word and morpheme segmentation, evaluated on diverse languages, where it outperforms the other systems in the shared tasks.

Findings

1. Achieves high accuracy on all evaluated languages
2. Outperforms other systems in the shared tasks
3. Demonstrates effectiveness without language-specific adjustments

Abstract

This paper presents our segmentation system developed for the MLP 2017 shared tasks on cross-lingual word segmentation and morpheme segmentation. We model both word and morpheme segmentation as character-level sequence labelling tasks. The prevalent bidirectional recurrent neural network with conditional random fields as the output interface is adapted as the baseline system, which is further improved via ensemble decoding. Our universal system is applied to and extensively evaluated on all the official data sets without any language-specific adjustment. The official evaluation results indicate that the proposed model achieves outstanding accuracies for both word and morpheme segmentation across languages of various types, compared with the other participating systems.
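To illustrate what "segmentation as character-level sequence labelling" means in practice, here is a minimal sketch that converts a segmented sequence into one boundary tag per character and back. The B/I/E/S tag set (begin, inside, end, single-character unit) and the helper names `words_to_tags` and `tags_to_words` are assumptions made for illustration; the abstract does not specify the tag set or any implementation details. The neural tagger itself (bidirectional RNN with a CRF output layer) is not shown; it would predict the tag sequence that `tags_to_words` consumes.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sequence to one boundary tag per character.

    Assumed tag set (hypothetical, for illustration only):
    B = first character of a unit, I = inside, E = last, S = single-character unit.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty segments so tags stay aligned with characters
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild segments from the raw characters and a tag per character."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a unit closes on E or S
            words.append(current)
            current = ""
    if current:  # keep trailing characters if the tags end mid-unit
        words.append(current)
    return words


# Morpheme-style example: the same encoding works for words or morphemes.
morphemes = ["un", "break", "able"]
tags = words_to_tags(morphemes)
restored = tags_to_words("".join(morphemes), tags)
```

Because both word and morpheme boundaries reduce to the same per-character tags, a single tagger can in principle be trained on either task without changing the encoding, which is the sense in which the approach is language-agnostic.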

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis