CA-EHN: Commonsense Analogy from E-HowNet

Peng-Hsuan Li; Tsan-Yu Yang; Wei-Yun Ma

arXiv:1908.07218 · cs.CL · June 1, 2020

TL;DR

This paper introduces CA-EHN, a large-scale Chinese word analogy dataset built from E-HowNet, for evaluating how well word embeddings capture commonsense knowledge, something traditional handcrafted analogy datasets cover poorly.

Contribution

The paper constructs the first commonsense word analogy dataset from E-HowNet, enabling evaluation of how well word representations support commonsense reasoning.

Findings

01. CA-EHN contains 90,505 analogies covering 5,656 words and 763 relations.

02. The dataset is designed to evaluate how well word embeddings capture commonsense knowledge.

03. Experiments show that CA-EHN is a strong indicator of how well word representations embed commonsense knowledge.

Abstract

Embedding commonsense knowledge is crucial for end-to-end models to generalize inference beyond training corpora. However, existing word analogy datasets have tended to be handcrafted, involving permutations of hundreds of words with only dozens of pre-defined relations, mostly morphological relations and named entities. In this work, we model commonsense knowledge down to word-level analogical reasoning by leveraging E-HowNet, an ontology that annotates 88K Chinese words with their structured sense definitions and English translations. We present CA-EHN, the first commonsense word analogy dataset containing 90,505 analogies covering 5,656 words and 763 relations. Experiments show that CA-EHN stands out as a great indicator of how well word representations embed commonsense knowledge. The dataset is publicly available at https://github.com/ckiplab/CA-EHN.
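As an illustration of what evaluating embeddings on word analogies involves, the following is a minimal sketch of the widely used vector-offset scoring (often called 3CosAdd): for an analogy a : b :: c : d, the predicted d is the vocabulary word whose vector is closest in cosine similarity to b - a + c. Whether the paper uses exactly this scoring is an assumption. The function names, the toy vocabulary, and the hand-made 3-dimensional vectors are hypothetical and are not taken from the CA-EHN repository.

```python
import numpy as np


def solve_analogy(emb, a, b, c):
    """Return the word d that maximizes cos(d, b - a + c), excluding a, b, c.

    emb is a dict mapping each word to a 1-D numpy vector (hypothetical format).
    """
    words = list(emb)
    index = {w: i for i, w in enumerate(words)}
    matrix = np.stack([np.asarray(emb[w], dtype=float) for w in words])
    # Normalize rows so that dot products rank candidates by cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    target = matrix[index[b]] - matrix[index[a]] + matrix[index[c]]
    scores = matrix @ target
    # The three query words are conventionally excluded from the candidates.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]


def analogy_accuracy(emb, analogies):
    """Fraction of (a, b, c, d) analogies answered correctly.

    Analogies containing out-of-vocabulary words are skipped.
    """
    covered = [q for q in analogies if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


# Toy, hand-made vectors for illustration only (not real embeddings).
toy = {
    "man": np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king": np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "apple": np.array([0.0, 0.0, -1.0]),
}

print(solve_analogy(toy, "man", "woman", "king"))
print(analogy_accuracy(toy, [("man", "woman", "king", "queen"),
                             ("king", "queen", "man", "woman")]))
```

Under this kind of scoring, a dataset's accuracy reflects how consistently a relation corresponds to a shared offset direction in the embedding space, which is why a larger and more varied set of relations gives a broader picture than a few dozen handcrafted ones.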

Code & Models

Repositories

ckiplab/CA-EHN (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management