Analogical Reasoning on Chinese Morphological and Semantic Relations
Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du

TL;DR
This paper introduces CA8, a new Chinese analogical reasoning benchmark that evaluates how well word embeddings capture morphological and semantic relations in Chinese.
Contribution
The paper constructs a comprehensive Chinese analogical reasoning dataset and systematically analyzes the factors that affect reasoning performance, establishing a new benchmark.
Findings
CA8 contains 17,813 questions covering morphological and semantic relations.
The dataset is validated as a reliable benchmark for Chinese word embedding evaluation.
The influences of vector representations, context features, and corpora on analogical reasoning are systematically explored.
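The findings list vector representations and context features among the factors studied. As a hedged illustration of what varying them can mean, the sketch below builds sparse PPMI (positive pointwise mutual information) vectors from co-occurrence counts, using either whole neighbouring words or their individual characters as contexts. PPMI, the window size, the toy sentences, and the helper names `word_context_pairs`, `char_context_pairs`, and `ppmi_vectors` are assumptions for illustration; this section does not say which representations or context features the paper compares.

```python
import math
from collections import Counter


def word_context_pairs(sentences, window=2):
    """Yield (word, context_word) pairs within a symmetric window.

    Hypothetical helper: `sentences` are lists of pre-segmented Chinese words.
    """
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield word, tokens[j]


def char_context_pairs(sentences, window=2):
    """Yield (word, character) pairs: each neighbouring word is split into characters.

    Hypothetical alternative context feature, shown for comparison only.
    """
    for word, context in word_context_pairs(sentences, window):
        for ch in context:
            yield word, ch


def ppmi_vectors(pairs):
    """Build sparse PPMI vectors: PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))).

    Probabilities are raw-count estimates; zero-valued entries are not stored.
    """
    pair_counts = Counter(pairs)
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        ctx_counts[c] += n
    total = sum(pair_counts.values())
    vectors = {}
    for (w, c), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[w] * ctx_counts[c]))
        if pmi > 0:
            vectors.setdefault(w, {})[c] = pmi
    return vectors


# Toy corpus, invented for illustration only.
sentences = [["老师", "喜欢", "语言"], ["学生", "喜欢", "语言"]]
word_vecs = ppmi_vectors(word_context_pairs(sentences))
char_vecs = ppmi_vectors(char_context_pairs(sentences))
print(word_vecs["老师"], char_vecs["老师"])
```

Swapping the pair generator while keeping `ppmi_vectors` fixed changes only the context feature, which keeps such a comparison controlled.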
Abstract
Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task for Chinese. Drawing on Chinese lexical knowledge, we outline 68 implicit morphological relations and 28 explicit semantic relations. A large, balanced dataset, CA8, containing 17,813 questions, is then built for this task. Furthermore, we systematically explore the influences of vector representations, context features, and corpora on analogical reasoning. Experiments show that CA8 is a reliable benchmark for evaluating Chinese word embeddings.
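The abstract describes analogy questions of the form a : b :: c : d used to evaluate word embeddings. One widely used way to answer such a question is 3CosAdd: predict the vocabulary word whose vector is most cosine-similar to b - a + c. The sketch below assumes that scoring rule, treats questions containing out-of-vocabulary words as wrong, and reports accuracy per relation category. The vectors, example words, and helper names (`normalize`, `solve_3cosadd`, `accuracy_by_category`) are hypothetical; this is not the paper's evaluation code, and this section does not state how the paper scores questions or handles missing words.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length so that dot products equal cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def solve_3cosadd(emb, a, b, c):
    """Answer a : b :: c : ? with the word maximising cos(d, b - a + c).

    `emb` maps words to unit vectors. The target b - a + c is left unnormalised:
    its length is the same for every candidate, so ranking by dot product gives
    the same argmax as ranking by cosine. The question words are excluded.
    """
    target = [xb - xa + xc for xa, xb, xc in zip(emb[a], emb[b], emb[c])]
    best, best_score = None, -math.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best, best_score = word, score
    return best


def accuracy_by_category(emb, questions):
    """Accuracy per relation category for (category, a, b, c, d) questions.

    Assumption: a question with any out-of-vocabulary word counts as wrong.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for category, a, b, c, d in questions:
        totals[category] += 1
        if any(w not in emb for w in (a, b, c, d)):
            continue
        if solve_3cosadd(emb, a, b, c) == d:
            hits[category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Toy 6-dimensional vectors, invented for illustration only.
raw = {
    "老师": [1, 0, 0, 0, 0, 0],
    "老师们": [1, 0, 1, 0, 0, 0],
    "学生": [0, 1, 0, 0, 0, 0],
    "学生们": [0, 1, 1, 0, 0, 0],
    "北京": [0, 0, 0, 1, 1, 0],
    "中国": [0, 0, 0, 0, 1, 0],
    "东京": [0, 0, 0, 1, 0, 1],
    "日本": [0, 0, 0, 0, 0, 1],
}
emb = {w: normalize(v) for w, v in raw.items()}
questions = [
    ("morphological", "老师", "老师们", "学生", "学生们"),  # plural suffix 们
    ("semantic", "北京", "中国", "东京", "日本"),  # capital : country
    ("semantic", "北京", "中国", "巴黎", "法国"),  # OOV, counted wrong
]
print(accuracy_by_category(emb, questions))
# → {'morphological': 1.0, 'semantic': 0.5}
```

Because the category label is just a string, morphological and semantic subsets are scored separately by the same loop; a different scoring rule would only replace `solve_3cosadd`.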
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
