Joint Chinese Word Segmentation and Span-based Constituency Parsing
Zhicheng Wang, Tianyu Shi, Cong Liu

TL;DR
This paper introduces a joint Chinese word segmentation and span-based constituency parsing method that improves parsing accuracy by integrating segmentation directly into the parsing process, reducing errors caused by separate segmentation steps.
Contribution
It proposes adding extra labels to individual Chinese characters in the parse tree, so that a single span-based parser performs word segmentation and constituency parsing jointly instead of relying on a separate segmentation model.
Findings
Outperforms recent joint segmentation and parsing models on CTB 5.1
Reduces errors caused by separate segmentation and parsing steps
Demonstrates effectiveness of label augmentation in joint modeling
Abstract
Span-based decoding is an important direction in constituency parsing. For Chinese sentences, however, their linguistic characteristics make it necessary to use a separate model to perform word segmentation first, which introduces uncertainty and generally leads to errors in the constituency tree computed afterward. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing that adds extra labels to individual Chinese characters in the parse trees. In experiments, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing on CTB 5.1.
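The idea of labelling individual characters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple tree format, the `#C` character label, and all function names are hypothetical choices made for illustration, and the paper's actual label scheme may differ. The sketch rewrites a word-level tree into a character-level tree, lists the labelled character spans a span-based parser would have to predict, and shows that the word segmentation can be read back off the tree.

```python
# Hypothetical label marking a single character; not taken from the paper.
CHAR = "#C"


def to_char_tree(tree):
    """Rewrite a word-level tree (label, child, ...) so every leaf is one character.

    A word-level leaf has the form (pos_tag, word). It becomes
    (pos_tag, (CHAR, c1), (CHAR, c2), ...), so the word boundary is encoded
    as an ordinary labelled span over characters.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, *[(CHAR, ch) for ch in children[0]])
    return (label, *[to_char_tree(child) for child in children])


def is_char_leaf(node):
    """True for a single-character leaf of the form (CHAR, character)."""
    return len(node) == 2 and node[0] == CHAR and isinstance(node[1], str)


def labeled_spans(tree, start=0):
    """Return (end, spans); spans are (start, end, label) over character offsets."""
    if is_char_leaf(tree):
        return start + 1, []
    label, *children = tree
    end, spans = start, []
    for child in children:
        end, child_spans = labeled_spans(child, end)
        spans.extend(child_spans)
    return end, [(start, end, label)] + spans


def words_from_char_tree(tree):
    """Recover (word, pos_tag) pairs: a word is a node whose children are all characters."""
    label, *children = tree
    if all(is_char_leaf(child) for child in children):
        return [("".join(child[1] for child in children), label)]
    words = []
    for child in children:
        words.extend(words_from_char_tree(child))
    return words


# Toy example: "中国 发展 经济" with an illustrative bracketing.
word_tree = ("IP",
             ("NP", ("NR", "中国")),
             ("VP", ("VV", "发展"), ("NP", ("NN", "经济"))))

char_tree = to_char_tree(word_tree)
print(labeled_spans(char_tree)[1])
print(words_from_char_tree(char_tree))
# → [('中国', 'NR'), ('发展', 'VV'), ('经济', 'NN')]
```

Under this encoding the segmentation is simply the set of spans directly above the characters, so a parser that predicts the character-level tree needs no separate segmenter in front of it.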
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
