CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models

Xiaqiang Tang; Jian Li; Keyu Hu; Du Nan; Xiaolong Li; Xi Zhang; Weigao Sun; Sihong Xie

arXiv:2505.20767 · cs.CL · June 26, 2025

TL;DR

CogniBench introduces a legal-inspired framework and dataset for evaluating the faithfulness of cognitive statements generated by large language models, addressing a gap in existing benchmarks that overlook inference-based hallucinations.

Contribution

The paper presents a novel framework inspired by legal evidence assessment and a large-scale dataset for detecting cognitive hallucinations in LLMs, along with an automatic annotation pipeline.

Findings

01

Introduced the CogniBench dataset for assessing the faithfulness of cognitive statements and reported statistics drawn from it.

02

Created an automatic annotation pipeline for scalable dataset generation.

03

Built the large-scale CogniBench-L dataset, which facilitates training detectors for both factual and cognitive hallucinations.

Abstract

Faithfulness hallucinations are claims generated by a Large Language Model (LLM) not supported by contexts provided to the LLM. Lacking assessment standards, existing benchmarks focus on "factual statements" that rephrase source materials while overlooking "cognitive statements" that involve making inferences from the given context. Consequently, evaluating and detecting the hallucination of cognitive statements remains challenging. Inspired by how evidence is assessed in the legal domain, we design a rigorous framework to assess different levels of faithfulness of cognitive statements and introduce the CogniBench dataset where we reveal insightful statistics. To keep pace with rapidly evolving LLMs, we further develop an automatic annotation pipeline that scales easily across different models. This results in a large-scale CogniBench-L dataset, which facilitates training accurate…

Code & Models

Repositories

futureeeeee/cognibench
Official

Models

future7/CogniDet
model · 11 downloads · 1 like

Datasets

future7/CogniBench
dataset · 20 downloads

Videos

CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models

Taxonomy

Topics: Mental Health via Writing · Topic Modeling · Adversarial Robustness in Machine Learning
