Automatic Construction of Chinese Verb Collostruction Database
Xuri Tang, Daohuan Liu

TL;DR
This paper introduces an unsupervised method for building a Chinese verb collostruction database. The database supplies explicit, interpretable rules, and an error correction algorithm built on it outperforms large language models on verb grammatical error correction.
Contribution
It formally defines a verb collostruction as a projective, rooted, ordered, directed acyclic graph and employs a series of clustering algorithms to generate collostructions for a given verb from sentences retrieved from a large-scale corpus.
Findings
Generated collostructions show functional independence and graded typicality.
Verb grammatical error correction based on maximum matching with collostructions outperforms LLMs.
The approach offers explicit, interpretable linguistic rules.
Abstract
This paper proposes a fully unsupervised approach to constructing a verb collostruction database for Chinese, aimed at complementing LLMs by providing explicit and interpretable rules for application scenarios where explanation and interpretability are indispensable. The paper formally defines a verb collostruction as a projective, rooted, ordered, and directed acyclic graph and employs a series of clustering algorithms to generate collostructions for a given verb from a list of sentences retrieved from a large-scale corpus. Statistical analysis demonstrates that the generated collostructions possess the design features of functional independence and graded typicality. Evaluation on verb grammatical error correction shows that an error correction algorithm based on maximum matching with collostructions achieves better performance than LLMs.
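To make the graph definition in the abstract concrete, the following is a minimal sketch of how the four structural properties could be checked. It is not the authors' implementation. It assumes nodes are token positions in sentence order (the "ordered" property), edges are (head, dependent) pairs, and "projective" means that no two edges cross when drawn above the sentence. The function names and the example sentence are hypothetical.

```python
from collections import defaultdict


def find_root(n, edges):
    """Return the single root of an acyclic graph over nodes 0..n-1, else None.

    Assumption: edges are (head, dependent) pairs over token positions.
    """
    children = defaultdict(list)
    indegree = [0] * n
    for head, dep in edges:
        children[head].append(dep)
        indegree[dep] += 1

    # Rooted: exactly one node has no incoming edge.
    roots = [i for i in range(n) if indegree[i] == 0]
    if len(roots) != 1:
        return None

    # Acyclic: Kahn's algorithm must be able to visit every node.
    remaining = indegree[:]
    stack = [roots[0]]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                stack.append(child)
    return roots[0] if visited == n else None


def is_projective(edges):
    """True if no two edges cross in the linear order of the tokens."""
    spans = [tuple(sorted(edge)) for edge in edges]
    for l1, r1 in spans:
        for l2, r2 in spans:
            if l1 < l2 < r1 < r2:
                return False
    return True


def has_collostruction_shape(n, edges):
    """Check the rooted, acyclic, and projective properties together."""
    return find_root(n, edges) is not None and is_projective(edges)


# Hypothetical example (not from the paper): 他 吃 了 苹果, rooted at the verb 吃.
tokens = ["他", "吃", "了", "苹果"]
edges = [(1, 0), (1, 2), (1, 3)]
print(tokens[find_root(len(tokens), edges)])
print(has_collostruction_shape(len(tokens), edges))
```

In this sketch the verb is the only node without an incoming edge, so it serves as the root. A graph with crossing edges, such as `[(1, 0), (1, 3), (0, 2)]`, passes the rooted and acyclic checks but fails the projectivity check.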
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining
