Learning to Mine Chinese Coordinate Terms Using the Web
Xiaojiang Huang, Xiaojun Wan, Jianguo Xiao

TL;DR
This paper presents a semi-supervised approach for extracting Chinese coordinate terms from web data, effectively handling polysemy and grouping terms into concepts, with high-quality, wide-coverage results.
Contribution
It introduces a novel semi-supervised method combining linguistic patterns and learned patterns for Chinese coordinate term extraction from web data.
Findings
High-quality extraction results
Wide coverage of coordinate terms
Effective handling of polysemy
Abstract
A coordinate relation is the relation between instances of a concept, or between the direct hyponyms of a concept. In this paper, we focus on the task of extracting terms that are coordinate with a user-given seed term in Chinese, and on grouping the extracted terms by concept when the seed term has several meanings. We propose a semi-supervised method that integrates manually defined linguistic patterns and automatically learned semi-structural patterns to extract Chinese coordinate terms from web search results. In addition, terms are grouped into different concepts based on their co-occurring terms and contexts. We further calculate saliency scores for the extracted terms and rank them accordingly. Experimental results demonstrate that the proposed method generates results with high quality and wide coverage.
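To make the pipeline in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hand-written pattern (terms joined by the Chinese enumeration comma "、"), treats any unbroken run of Chinese characters as one term instead of using a word segmenter, groups terms by merging enumerations that share a term, and uses raw frequency as a stand-in for the saliency score. The names `LIST_PATTERN`, `coordinate_lists`, and `group_terms`, as well as the sample snippets, are hypothetical.

```python
import re
from collections import Counter

# Assumption: a run of CJK characters is treated as one term. Real text would
# need word segmentation; the snippets below are delimited by punctuation.
CJK_RUN = r"[\u4e00-\u9fff]+"
# Assumption: one illustrative linguistic pattern, "A、B、C" enumerations.
LIST_PATTERN = re.compile(rf"{CJK_RUN}(?:、{CJK_RUN})+")


def coordinate_lists(seed, snippets):
    """For each enumeration containing the seed, return the other terms."""
    lists = []
    for snippet in snippets:
        for match in LIST_PATTERN.finditer(snippet):
            terms = match.group(0).split("、")
            if seed in terms:
                lists.append([t for t in terms if t != seed])
    return lists


def group_terms(lists):
    """Merge enumerations that share a term (a crude proxy for concepts)."""
    groups = []
    for terms in lists:
        merged = set(terms)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group  # shared term: fold the group in
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return groups


# Hypothetical search-result snippets for the polysemous seed "苹果" (apple / Apple).
snippets = [
    "水果:苹果、香蕉、橙子。",
    "公司:苹果、微软、谷歌。",
    "香蕉、苹果、葡萄。",
]

lists = coordinate_lists("苹果", snippets)
groups = group_terms(lists)
# Frequency across enumerations stands in for a saliency score.
counts = Counter(term for terms in lists for term in terms)

for group in groups:
    print(sorted(group, key=lambda t: -counts[t]))
```

With these snippets the seed's fruit sense and company sense end up in separate groups, because no enumeration mixes terms from both. Web data is noisier than this, which is presumably why the paper relies on co-occurring terms and contexts for grouping instead of exact overlap alone.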
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling
