Causal Evaluation of Language Models

Sirui Chen; Bo Peng; Meiqi Chen; Ruiqi Wang; Mengying Xu; Xingyu Zeng; Rui Zhao; Shengjie Zhao; Yu Qiao; Chaochao Lu

arXiv:2405.00622 · cs.CL · May 2, 2024 · 2 cites


TL;DR

This paper introduces CaLM, a comprehensive benchmark for evaluating the causal reasoning capabilities of language models, comprising a large dataset, an evaluation framework, and an analysis platform intended to guide future research.

Contribution

It presents what the authors describe, to the best of their knowledge, as the first systematic framework and dataset for assessing causal reasoning in language models, along with extensive evaluations and a community platform.

Findings

1. 28 language models evaluated on 92 causal targets
2. 50 empirical findings across 9 dimensions
3. The CaLM platform supports ongoing research and updates

Abstract

Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive benchmark for evaluating the causal reasoning capabilities of language models. First, we propose the CaLM framework, which establishes a foundational taxonomy consisting of four modules: causal target (i.e., what to evaluate), adaptation (i.e., how to obtain the results), metric (i.e., how to measure the results), and error (i.e., how to analyze the bad results). This taxonomy defines a broad evaluation design space while systematically selecting criteria and priorities. Second, we compose the…
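The four-module taxonomy can be pictured as a small data structure. What follows is a hypothetical illustration, not CaLM's code: the names `EvalConfig`, `Taxonomy`, and `design_space`, and every module value, are placeholders invented here. The sketch assumes that causal target, adaptation, and metric act as axes whose Cartesian product yields candidate evaluation settings, while error types form a separate label set used only when analyzing incorrect outputs.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class EvalConfig:
    """One candidate evaluation setting (hypothetical structure)."""
    causal_target: str  # what to evaluate
    adaptation: str     # how to obtain the results, e.g. a prompting strategy
    metric: str         # how to measure the results


@dataclass
class Taxonomy:
    """Hypothetical container for the four taxonomy modules."""
    causal_targets: list[str]
    adaptations: list[str]
    metrics: list[str]
    # Error types are not an axis of the grid; they label bad results afterwards.
    error_types: list[str] = field(default_factory=list)

    def design_space(self) -> list[EvalConfig]:
        """Enumerate every target x adaptation x metric combination."""
        return [
            EvalConfig(t, a, m)
            for t, a, m in product(self.causal_targets, self.adaptations, self.metrics)
        ]


# Placeholder values for illustration only; these are NOT CaLM's actual lists.
taxonomy = Taxonomy(
    causal_targets=["target_A", "target_B"],
    adaptations=["basic_prompt", "chain_of_thought"],
    metrics=["accuracy"],
    error_types=["error_X", "error_Y"],
)

space = taxonomy.design_space()
print(len(space))  # 2 targets * 2 adaptations * 1 metric = 4
print(space[0])
```

A real benchmark would likely not run every cell of such a grid; the abstract's mention of "selecting criteria and priorities" suggests choosing a subset, which in this sketch would amount to filtering the list returned by `design_space`.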

