Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding

Xu Han; Hongsu Wang; Sanqian Zhang; Qunchao Fu; Jun S. Liu

arXiv:1810.03479 · cs.CL · February 20, 2020 · 5 cites

Open Access

TL;DR

This paper adds a radical embedding feature to an LSTM-CRF model for sentence segmentation of classical Chinese texts and reports better accuracy than earlier methods across diverse literary styles, especially on Tang Epitaph texts.

Contribution

It proposes a novel radical embedding feature for LSTM models, enhancing sentence segmentation in pre-modern Chinese texts with diverse styles.

Findings

01 Improved accuracy over previous methods in classical Chinese sentence segmentation

02 Radical embedding enhances model performance especially on Tang Epitaph texts

03 Model achieves state-of-the-art results on multiple classical Chinese datasets

Abstract

In this paper, we develop a lower-than-character-level feature embedding called radical embedding and apply it to an LSTM model for sentence segmentation of pre-modern Chinese texts. The dataset includes over 150 classical Chinese books from three different dynasties and covers different literary styles. The LSTM-CRF model is a state-of-the-art method for sequence labeling problems. Our new model adds a radical embedding component, which leads to improved performance. Experimental results on the aforementioned books demonstrate better accuracy than earlier methods on sentence segmentation, especially on Tang Epitaph texts.
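
The abstract frames sentence segmentation as sequence labeling over characters, with a radical feature added alongside each character. The Python sketch below illustrates one way this input could be prepared; it is not the authors' implementation. The binary label scheme (1 if a break follows the character), the punctuation set `BREAKS`, the tiny `RADICAL_OF` table, the random embedding tables, and the choice to concatenate character and radical vectors are all assumptions made for illustration.

```python
import random

# Assumed set of punctuation marks treated as sentence breaks.
BREAKS = set("。，；？！")

# Tiny illustrative character-to-radical table (an assumption; a real system
# would need a full dictionary covering every character in the corpus).
RADICAL_OF = {"學": "子", "時": "日", "習": "羽", "說": "言"}
UNKNOWN_RADICAL = "<unk>"


def to_labeled_sequence(punctuated):
    """Strip break punctuation and label each remaining character:
    1 if a break followed it in the source text, else 0."""
    chars, labels = [], []
    for ch in punctuated:
        if ch in BREAKS:
            if labels:  # ignore punctuation before the first character
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def make_table(symbols, dim, seed):
    """Build a random lookup table; stands in for trained embeddings."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for s in sorted(symbols)}


def embed(chars, char_table, radical_table):
    """Concatenate each character's vector with its radical's vector."""
    vectors = []
    for ch in chars:
        radical = RADICAL_OF.get(ch, UNKNOWN_RADICAL)
        vectors.append(char_table[ch] + radical_table[radical])
    return vectors


# Usage: a punctuated training sentence becomes (characters, labels),
# and each character becomes a combined feature vector.
chars, labels = to_labeled_sequence("學而時習之，不亦說乎。")
char_table = make_table(set(chars), dim=4, seed=0)
radical_table = make_table(set(RADICAL_OF.values()) | {UNKNOWN_RADICAL},
                           dim=2, seed=1)
vectors = embed(chars, char_table, radical_table)
print(list(zip(chars, labels)))
print(len(vectors), len(vectors[0]))
```

In a full model, the combined vectors would be fed to an LSTM whose outputs are scored by a CRF layer to predict the label sequence. Characters that share a radical share part of their representation, which is one plausible reason a radical feature could help with rare characters.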

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques · Topic Modeling

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory