Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure
Yang Hou, Zhenghua Li

TL;DR
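This paper introduces a character-level Chinese dependency parsing method that models the latent internal structure of words. Each word-level dependency tree is interpreted as a forest of character-level trees, and a constrained Eisner algorithm keeps those trees structurally consistent. Experiments on Chinese treebanks show gains over both pipeline and previous joint models.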
Contribution
The paper proposes a way to move from word-level to character-level Chinese dependency parsing: the internal structure of each word is treated as latent, and a constrained Eisner algorithm is used to keep the character-level trees compatible with one another.
Findings
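The method outperforms both the pipeline framework and previous joint models on Chinese treebanks.
A coarse-to-fine parsing strategy helps the model predict more linguistically plausible intra-word structures.
The constrained Eisner algorithm guarantees a single root for each word's internal structure and establishes inter-word dependencies between these roots.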
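The two structural constraints named above (one root per word, inter-word arcs only between word roots) can be illustrated with a small checker. This is a minimal sketch and not the authors' constrained Eisner algorithm: it only validates a character-level tree that has already been decoded. The function names, the `heads` array encoding (`-1` marks the sentence root), and the half-open `spans` for word boundaries are all assumptions made for illustration. The sketch assumes `heads` already forms a tree and does not check acyclicity or projectivity.

```python
from typing import List, Optional, Tuple


def word_roots(heads: List[int], spans: List[Tuple[int, int]]) -> Optional[List[int]]:
    """Return the root character index of each word, or None if some word
    does not have exactly one character whose head lies outside the word.

    heads[i] is the head of character i (0-based); -1 is the sentence root.
    spans holds half-open (start, end) character ranges, one per word.
    Both encodings are hypothetical, chosen for this illustration.
    """
    roots = []
    for start, end in spans:
        # Characters headed by the sentence root or by a character of another word.
        external = [i for i in range(start, end)
                    if not (start <= heads[i] < end)]
        if len(external) != 1:
            return None
        roots.append(external[0])
    return roots


def is_compatible(heads: List[int], spans: List[Tuple[int, int]]) -> bool:
    """Check the single-root and root-to-root constraints for every word."""
    roots = word_roots(heads, spans)
    if roots is None:
        return False
    root_set = set(roots)
    # Every arc leaving a word must land on the root of another word.
    return all(heads[r] == -1 or heads[r] in root_set for r in roots)


# Three words covering characters [0,2), [2,3), [3,5).
spans = [(0, 2), (2, 3), (3, 5)]
print(is_compatible([1, 2, -1, 4, 2], spans))  # True
print(is_compatible([2, 2, -1, 4, 2], spans))  # False: first word has two roots
print(is_compatible([1, 3, -1, 4, 2], spans))  # False: arc lands on a non-root character
```

In the second call both characters of the first word attach outside the word, and in the third call the first word's root attaches to character 3, which is not the root of its own word.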
Abstract
Revealing the syntactic structure of sentences in Chinese poses significant challenges for word-level parsers due to the absence of clear word boundaries. To facilitate a transition from word-level to character-level Chinese dependency parsing, this paper proposes modeling latent internal structures within words. In this way, each word-level dependency tree is interpreted as a forest of character-level trees. A constrained Eisner algorithm is implemented to ensure the compatibility of character-level trees, guaranteeing a single root for intra-word structures and establishing inter-word dependencies between these roots. Experiments on Chinese treebanks demonstrate the superiority of our method over both the pipeline framework and previous joint models. A detailed analysis reveals that a coarse-to-fine parsing strategy empowers the model to predict more linguistically plausible…
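To make the "forest of character-level trees" reading concrete, the sketch below collapses a character-level tree back into the word-level tree it implies. It is an illustration under stated assumptions, not code from the paper: the function name `collapse_to_word_heads`, the `heads` encoding (`-1` for the sentence root), and the half-open word `spans` are hypothetical, and the input is assumed to already satisfy the single-root constraint described in the abstract.

```python
from typing import List, Tuple


def collapse_to_word_heads(heads: List[int], spans: List[Tuple[int, int]]) -> List[int]:
    """Map a character-level tree to word-level heads (-1 = sentence root).

    heads[i] is the head of character i; spans are half-open (start, end)
    character ranges per word. Both encodings are assumptions for this sketch.
    """
    # Look up which word each character belongs to.
    char_to_word = {}
    for w, (start, end) in enumerate(spans):
        for i in range(start, end):
            char_to_word[i] = w

    word_heads = []
    for start, end in spans:
        # Heads that point outside the word; a well-formed word has exactly one.
        external = [heads[i] for i in range(start, end)
                    if not (start <= heads[i] < end)]
        if len(external) != 1:
            raise ValueError("word does not have a single root character")
        h = external[0]
        word_heads.append(-1 if h == -1 else char_to_word[h])
    return word_heads


spans = [(0, 2), (2, 3), (3, 5)]
# Two character-level trees with different intra-word structure...
print(collapse_to_word_heads([1, 2, -1, 4, 2], spans))  # [1, -1, 1]
print(collapse_to_word_heads([2, 0, -1, 2, 3], spans))  # [1, -1, 1]
```

Both inputs collapse to the same word-level tree, which is the sense in which one word-level tree corresponds to a set of character-level trees whose intra-word parts are not annotated and must be treated as latent.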
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
