Automatic Construction of Chinese Verb Collostruction Database

Xuri Tang; Daohuan Liu

arXiv:2601.04197 · cs.CL · January 9, 2026

TL;DR

This paper introduces an unsupervised method for building a Chinese verb collostruction database. The database provides explicit, interpretable rules that complement large language models, and error correction based on these rules outperforms LLMs.

Contribution

It formally defines a verb collostruction as a projective, rooted, ordered, directed acyclic graph and uses a series of clustering algorithms to generate the collostructions of a given verb from sentences retrieved from a large-scale corpus.
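The graph definition can be made concrete with a small validator. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical encoding in which nodes are integers 0..n-1 numbered by their linear (surface) order, which supplies the "ordered" property, and edges are (head, dependent) pairs. It then checks the other three properties named in the abstract: acyclicity, a single root from which every node is reachable, and projectivity. Projectivity is taken here in the usual dependency-tree sense: every node lying between a head and its dependent must be reachable from that head. The page does not say how the paper handles nodes with several heads, so that part is an assumption.

```python
from collections import deque
from typing import List, Set, Tuple

# Hypothetical encoding (an assumption, not the paper's): nodes are 0..n-1
# in linear (surface) order, edges are (head, dependent) pairs.
Edge = Tuple[int, int]


def reachable(n: int, edges: List[Edge], start: int) -> Set[int]:
    """All nodes reachable from `start`, including `start` itself."""
    children = [[] for _ in range(n)]
    for h, d in edges:
        children[h].append(d)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in children[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_acyclic(n: int, edges: List[Edge]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node gets dequeued."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for h, d in edges:
        children[h].append(d)
        indegree[d] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return visited == n


def is_projective(n: int, edges: List[Edge]) -> bool:
    """Assumed tree-style projectivity: nodes strictly between a head and
    its dependent must all be reachable from the head."""
    for h, d in edges:
        below = reachable(n, edges, h)
        if any(k not in below for k in range(min(h, d) + 1, max(h, d))):
            return False
    return True


def is_collostruction_graph(n: int, edges: List[Edge]) -> bool:
    # Reject empty graphs, out-of-range nodes and self-loops.
    if n == 0 or any(not (0 <= h < n and 0 <= d < n) or h == d for h, d in edges):
        return False
    if not is_acyclic(n, edges):
        return False
    roots = [i for i in range(n) if all(d != i for _, d in edges)]
    if len(roots) != 1:  # rooted: exactly one node without a head
        return False
    # Every node must be reachable from the single root.
    return len(reachable(n, edges, roots[0])) == n and is_projective(n, edges)
```

For example, a three-node pattern with the verb at position 1 governing slots at positions 0 and 2 passes. Adding an edge from position 0 to a fourth node at position 3 crosses over the verb and fails the projectivity check.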

Findings

01

Generated collostructions show functional independence and graded typicality.

02

In verb grammatical error correction, maximum matching with collostructions outperforms LLMs.

03

The approach offers explicit, interpretable linguistic rules.

Abstract

This paper proposes a fully unsupervised approach to constructing a verb collostruction database for Chinese. The database aims to complement LLMs by providing explicit and interpretable rules for application scenarios where explanation and interpretability are indispensable. The paper formally defines a verb collostruction as a projective, rooted, ordered, directed acyclic graph and employs a series of clustering algorithms to generate the collostructions of a given verb from a list of sentences retrieved from a large-scale corpus. Statistical analysis demonstrates that the generated collostructions possess the design features of functional independence and graded typicality. Evaluation on verb grammatical error correction shows that the error correction algorithm based on maximum matching with collostructions achieves better performance than LLMs.
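The abstract does not spell out the maximum-matching procedure. The following is therefore only one possible reading, sketched under explicit assumptions and not the authors' algorithm:

- Each collostruction is flattened into its linear sequence of elements. An element is either a literal word or a category slot written as "<TAG>".
- The input sentence is a list of (word, tag) pairs.
- The match score is the length of an order-preserving, LCS-style alignment between the two sequences.
- The collostruction with the highest score is selected, and its unaligned elements are reported as candidate error sites.

The slot syntax, tag names, pattern names and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation (not from the paper): a token is (word, tag);
# a pattern element is a literal word such as "吃" or a slot such as "<N>".
Token = Tuple[str, str]


def element_matches(elem: str, token: Token) -> bool:
    word, tag = token
    if len(elem) > 2 and elem.startswith("<") and elem.endswith(">"):
        return elem[1:-1] == tag  # category slot: compare with the tag
    return elem == word  # literal element: compare with the word


def match_score(pattern: List[str], sentence: List[Token]) -> Tuple[int, List[str]]:
    """Order-preserving (LCS-style) alignment of pattern and sentence.

    Returns the number of aligned pattern elements and the unaligned ones.
    """
    m, n = len(pattern), len(sentence)
    # dp[i][j] = size of the best alignment of pattern[i:] with sentence[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if element_matches(pattern[i], sentence[j]):
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace one optimal alignment to collect unaligned pattern elements.
    unmatched: List[str] = []
    i = j = 0
    while i < m and j < n:
        if element_matches(pattern[i], sentence[j]):
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            unmatched.append(pattern[i])
            i += 1
        else:
            j += 1
    unmatched.extend(pattern[i:])
    return dp[0][0], unmatched


def best_collostruction(
    collostructions: Dict[str, List[str]], sentence: List[Token]
) -> Optional[Tuple[str, int, List[str]]]:
    """Pick the pattern with the most aligned elements (first one wins ties)."""
    best: Optional[Tuple[str, int, List[str]]] = None
    for name, pattern in collostructions.items():
        score, missing = match_score(pattern, sentence)
        if best is None or score > best[1]:
            best = (name, score, missing)
    return best


if __name__ == "__main__":
    patterns = {  # hypothetical patterns for the verb 吃
        "S-V-O": ["<PN>", "吃", "<N>"],
        "BA-V-RES": ["<PN>", "把", "<N>", "吃", "完"],
    }
    sentence = [("我", "PN"), ("把", "BA"), ("饭", "N"), ("吃", "V")]
    print(best_collostruction(patterns, sentence))  # ('BA-V-RES', 4, ['完'])
```

With the toy input 我把饭吃, the sketch selects the hypothetical pattern that aligns four tokens and reports 完 as the element the sentence is missing. Flattening the graph discards its edges. An implementation closer to the paper would presumably align against the graph structure itself.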

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining