Joint Chinese Word Segmentation and Span-based Constituency Parsing

Zhicheng Wang; Tianyu Shi; Cong Liu

arXiv:2211.01638 · cs.CL · December 1, 2022

Open Access

TL;DR

This paper introduces a method for joint Chinese word segmentation and span-based constituency parsing. Integrating segmentation directly into parsing avoids the errors that a separate, upstream segmentation step passes on to the parser, which improves parsing accuracy.

Contribution

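It proposes adding extra labels to individual Chinese characters on the parse trees, so that one span-based parser performs word segmentation and constituency parsing jointly; the approach outperforms recent joint segmentation and parsing models.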
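
The page does not spell out the label inventory, so the following is only a minimal sketch of the general idea under stated assumptions: each word-level leaf `(POS, word)` is expanded into one leaf per character carrying a hypothetical extra label `#c`, and the segmentation is read back from the tree as the nodes whose children are all character leaves. The tuple encoding, the `#c` label, and the helper names `to_char_tree` and `words_from_char_tree` are illustrative and not taken from the paper.

```python
from typing import List, Tuple, Union

# A tree is (label, children). A word-level leaf is (pos_tag, word_string);
# a character leaf is (CHAR_LABEL, single_character).
Tree = Tuple[str, Union[str, list]]

CHAR_LABEL = "#c"  # hypothetical extra label attached to every character


def to_char_tree(tree: Tree) -> Tree:
    """Expand each (POS, word) leaf into a POS node over character leaves."""
    label, kids = tree
    if isinstance(kids, str):
        return (label, [(CHAR_LABEL, ch) for ch in kids])
    return (label, [to_char_tree(kid) for kid in kids])


def words_from_char_tree(tree: Tree) -> List[str]:
    """Recover the segmentation: a word is a node whose children are all character leaves."""
    label, kids = tree
    if isinstance(kids, str):  # a bare leaf
        return [kids]
    if all(kid[0] == CHAR_LABEL and isinstance(kid[1], str) for kid in kids):
        return ["".join(ch for _, ch in kids)]
    words: List[str] = []
    for kid in kids:
        words.extend(words_from_char_tree(kid))
    return words


word_tree = ("IP", [("NP", [("NR", "中国")]), ("VP", [("VV", "发展")])])
char_tree = to_char_tree(word_tree)
print(char_tree)
print(words_from_char_tree(char_tree))  # ['中国', '发展']
```

Because the word boundaries now live inside the tree, a parser that predicts the character-level tree also predicts the segmentation, and no separate segmenter is needed.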

Findings

01

Outperforms recent joint segmentation and parsing models on CTB 5.1

02

Reduces errors propagated from a separate word segmentation step into parsing

03

Demonstrates the effectiveness of adding character-level labels in joint modeling
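
Because a joint model may segment a sentence differently from the gold standard, constituents are usually compared by character offsets rather than word indices. The sketch below shows one common way to do this, computing labeled constituent F1 and word-segmentation F1 over character spans; it is an illustration under that assumption, not the paper's evaluation script, and the tuple tree encoding and the functions `spans` and `f1` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def spans(tree, start: int = 0) -> Tuple[List[Tuple[str, int, int]], List[Tuple[int, int]], int]:
    """Return (phrase spans, word spans, end offset), all in character offsets.

    Trees are (label, children) tuples with (POS, word) leaves. Phrase spans are
    (label, i, j); word spans are (i, j). POS leaves contribute only word spans.
    """
    label, kids = tree
    if isinstance(kids, str):  # (POS, word) leaf: covers len(word) characters
        end = start + len(kids)
        return [], [(start, end)], end
    phrases: List[Tuple[str, int, int]] = []
    words: List[Tuple[int, int]] = []
    pos = start
    for kid in kids:
        kid_phrases, kid_words, pos = spans(kid, pos)
        phrases.extend(kid_phrases)
        words.extend(kid_words)
    phrases.append((label, start, pos))
    return phrases, words, pos


def f1(gold, pred) -> float:
    """F1 between two multisets of spans."""
    g, p = Counter(gold), Counter(pred)
    match = sum((g & p).values())  # multiset intersection
    if match == 0:
        return 0.0
    precision = match / sum(p.values())
    recall = match / sum(g.values())
    return 2 * precision * recall / (precision + recall)


gold = ("IP", [("NP", [("NR", "中国")]), ("VP", [("VV", "发展")])])
pred = ("IP", [("NP", [("NR", "中")]), ("VP", [("VV", "国发展")])])  # one segmentation error
gold_phrases, gold_words, _ = spans(gold)
pred_phrases, pred_words, _ = spans(pred)
print(f1(gold_phrases, pred_phrases))  # about 0.333: only IP(0, 4) matches
print(f1(gold_words, pred_words))  # 0.0: no word span matches
```

In the toy example a single segmentation error (中 | 国发展 instead of 中国 | 发展) also corrupts the NP and VP spans above it, illustrating how a segmentation error propagates into the constituents built on top of it.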

Abstract

Span-based decoding is an important direction in constituency parsing. For Chinese, however, words are not delimited by spaces, so a separate model must first perform word segmentation; this introduces uncertainty and generally leads to errors in the constituency tree computed afterward. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing that adds extra labels to individual Chinese characters on the parse trees. In experiments, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing on CTB 5.1.
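
Span-based parsers typically decode with a chart (CKY) algorithm: every span `[i, j)` receives a score for each label, and the decoder finds the binary tree with the highest total score. In a joint setting the fenceposts are characters rather than words, so word spans and phrase spans are chosen by the same search. The sketch below assumes the span scores are already given (here by hand, standing in for a neural scorer) and that each span's label is scored independently of how it is split; the function name `cky_decode` and the label names are hypothetical, and this is the generic decoder rather than the paper's specific model.

```python
from typing import Dict, Optional, Tuple

# scores[(i, j)] maps each candidate label to its score for the character span [i, j).
Scores = Dict[Tuple[int, int], Dict[str, float]]


def cky_decode(n: int, scores: Scores):
    """Return (best total score, best binary tree) over characters 0..n.

    Trees are (label, i, j) for single characters and
    (label, i, j, left_subtree, right_subtree) for longer spans.
    Every span [i, j) with 0 <= i < j <= n must have an entry in `scores`.
    """
    best: Dict[Tuple[int, int], float] = {}
    back: Dict[Tuple[int, int], Tuple[str, Optional[int]]] = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Labels are scored independently of the split, so pick the best one directly.
            label, label_score = max(scores[(i, j)].items(), key=lambda kv: kv[1])
            if length == 1:
                best[(i, j)] = label_score
                back[(i, j)] = (label, None)
            else:
                # The best split point k maximises the summed scores of the two sub-spans.
                k = max(range(i + 1, j), key=lambda m: best[(i, m)] + best[(m, j)])
                best[(i, j)] = label_score + best[(i, k)] + best[(k, j)]
                back[(i, j)] = (label, k)

    def build(i: int, j: int):
        label, k = back[(i, j)]
        if k is None:
            return (label, i, j)
        return (label, i, j, build(i, k), build(k, j))

    return best[(0, n)], build(0, n)


# Toy example: two characters whose scores favour grouping them into one word span.
toy_scores = {
    (0, 1): {"#c": 1.0},
    (1, 2): {"#c": 1.0},
    (0, 2): {"NR": 2.0, "∅": 0.0},  # "∅" stands for "no constituent"
}
score, tree = cky_decode(2, toy_scores)
print(score, tree)  # prints 4.0 ('NR', 0, 2, ('#c', 0, 1), ('#c', 1, 2))
```

Choosing the best label for a span separately from its best split point is valid here because the tree score is a plain sum of independent span scores. Treebank trees with more than two children per node would have to be binarized to fit this binary-branching search.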

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies