CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

Dan Shi; Chaobin You; Jiantao Huang; Taihao Li; Deyi Xiong

arXiv:2312.12853 · cs.CL · December 21, 2023 · 2 cites

TL;DR

CORECODE is a Chinese dialogue dataset with manually annotated commonsense knowledge, designed to evaluate and improve large language models' commonsense reasoning and commonsense conflict detection in everyday conversations.

Contribution

The paper introduces CORECODE, a dataset of 76,787 manually annotated commonsense knowledge items over 19,700 dyadic dialogues, with benchmark tasks for evaluating commonsense reasoning in Chinese LLMs. It standardizes annotations in the form "domain: slot = value", covering 9 domains and 37 slots.

Findings

01

Existing Chinese LLMs perform poorly on CORECODE tasks.

02

ChatGPT achieves only 0.275 accuracy on domain identification in the zero-shot setting.

03

The dataset facilitates future research in commonsense reasoning for LLMs.

Abstract

As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the…
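
The "domain: slot = value" form described in the abstract can be illustrated with a small parser. This is a minimal sketch, not the authors' tooling: the `Annotation` type, the `parse_annotation` function, and the example domain and slot names (`event`, `cause`) are assumptions made for illustration, and the actual serialization in the released dataset may differ.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One commonsense annotation (hypothetical container, not from the paper)."""
    domain: str
    slot: str
    value: str


def parse_annotation(text: str) -> Annotation:
    """Split a 'domain: slot = value' string into its three parts.

    Assumes the first ':' separates the domain and the first '=' after it
    separates the slot from the value; later ':' or '=' stay in the value.
    """
    domain, colon, rest = text.partition(":")
    if not colon:
        raise ValueError(f"missing ':' in annotation: {text!r}")
    slot, equals, value = rest.partition("=")
    if not equals:
        raise ValueError(f"missing '=' in annotation: {text!r}")
    domain, slot, value = domain.strip(), slot.strip(), value.strip()
    if not (domain and slot and value):
        raise ValueError(f"empty field in annotation: {text!r}")
    return Annotation(domain, slot, value)


# Illustrative input only; the domain and slot names here are made up.
example = parse_annotation("event: cause = it was raining")
```

Keeping the three fields separate makes it straightforward to group annotations by domain or slot, which is the granularity at which tasks such as domain identification are defined.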

Code & Models

Repositories

danshi777/corecode
Official

Videos

CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems