When Classical Chinese Meets Machine Learning: Explaining the Relative Performances of Word and Sentence Segmentation Tasks

Chao-Lin Liu; Chang-Ting Chu; Wei-Ting Chang; and Ti-Yong Zheng

arXiv:2007.11171·cs.CL·July 23, 2020

TL;DR

This paper explores the effectiveness of deep learning for classical Chinese text segmentation, analyzing how different training corpora influence performance and providing explanations for observed variations.

Contribution

It demonstrates the viability of deep learning for classical Chinese segmentation and offers insights into how training data selection affects results.

Findings

01. Deep learning achieves satisfactory segmentation results.

02. Training corpus relevance influences segmentation performance.

03. Different corpus combinations yield varying results.

Abstract

We consider three major text sources about the Tang Dynasty of China in experiments that aim to segment text written in classical Chinese: a collection of Tang Tomb Biographies, the New Tang Book, and the Old Tang Book. We show that the deep learning approach can achieve satisfactory segmentation results. More interestingly, some of the relative superiority that we observed among different experimental designs may be explainable: the relative relevance among the training corpora offers hints that help explain the differences in segmentation results obtained when we trained the classifiers on different combinations of corpora.
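The abstract describes segmentation as a task for trained classifiers but does not give the labeling scheme or the evaluation metric. As a minimal sketch under that caveat, the Python below shows one common way to cast segmentation as per-character classification: each character is labeled 1 if a segment boundary follows it and 0 otherwise, and predictions are scored by boundary precision, recall, and F1. The function names (`to_labels`, `from_labels`, `boundary_prf`), the binary labels, and the toy sentence (the opening of the Analects, not taken from the paper's corpora) are all assumptions made for illustration, not the authors' setup.

```python
def to_labels(segments):
    """Flatten segments into (text, labels); label 1 marks a character
    that is followed by a segment boundary. Hypothetical scheme."""
    chars, labels = [], []
    for seg in segments:
        for i, ch in enumerate(seg):
            chars.append(ch)
            labels.append(1 if i == len(seg) - 1 else 0)
    return "".join(chars), labels


def from_labels(text, labels):
    """Rebuild segments from unsegmented text and per-character labels."""
    segments, current = [], []
    for ch, lab in zip(text, labels):
        current.append(ch)
        if lab == 1:
            segments.append("".join(current))
            current = []
    if current:  # trailing characters with no closing boundary
        segments.append("".join(current))
    return segments


def boundary_prf(gold, pred):
    """Precision, recall, and F1 over predicted boundary labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example only: two clauses from the Analects.
    text, gold = to_labels(["學而時習之", "不亦說乎"])
    pred = [0, 0, 1, 0, 1, 0, 0, 0, 1]  # one spurious boundary after 時
    print(from_labels(text, pred))
    print(boundary_prf(gold, pred))
```

With labels in this form, classifiers trained on different corpus combinations can be compared on the same held-out text by computing the same boundary scores for each.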
