CHBench: A Chinese Dataset for Evaluating Health in Large Language Models

Chenlu Guo, Nuo Xu, Yi Chang, and Yuan Wu

arXiv:2409.15766 · cs.CL · February 24, 2025

TL;DR

This paper introduces CHBench, a comprehensive Chinese health benchmark for evaluating the safety and accuracy of large language models on physical and mental health inquiries. Evaluations of four popular Chinese LLMs reveal significant gaps in their ability to deliver safe and accurate health information.

Contribution

The paper presents CHBench, the first comprehensive safety-oriented Chinese health benchmark, with 6,493 mental health entries and 2,999 physical health entries for evaluating how well LLMs understand and safely address health issues.

Findings

01. The four popular Chinese LLMs evaluated show significant gaps in delivering safe and accurate health information.

02. CHBench covers a wide range of mental and physical health topics.

03. The benchmark highlights an urgent need for improved health-related LLMs.

Abstract

With the rapid development of large language models (LLMs), assessing their performance on health-related inquiries has become increasingly essential. The use of these models in real-world contexts, where misinformation can lead to serious consequences for individuals seeking medical advice and support, necessitates a rigorous focus on safety and trustworthiness. In this work, we introduce CHBench, the first comprehensive safety-oriented Chinese health-related benchmark designed to evaluate LLMs' capabilities in understanding and addressing physical and mental health issues from a safety perspective across diverse scenarios. CHBench comprises 6,493 entries on mental health and 2,999 entries on physical health, spanning a wide range of topics. Our extensive evaluations of four popular Chinese LLMs highlight significant gaps in their capacity to deliver safe and accurate health information,…
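
As a quick check on the composition stated in the abstract, the following minimal sketch tallies the two entry counts and their shares of the whole. The counts come from the abstract. The names `CHBENCH_COUNTS` and `composition` are hypothetical helpers introduced here for illustration and are not part of the CHBench repository or dataset.

```python
# Entry counts as stated in the abstract.
# The dictionary keys are illustrative labels, not field names from the dataset.
CHBENCH_COUNTS = {"mental_health": 6493, "physical_health": 2999}


def composition(counts):
    """Return (total, shares), where shares maps each category to its fraction of the total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


total, shares = composition(CHBENCH_COUNTS)
print(f"total entries: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

By this count the benchmark holds 9,492 entries in total, with mental health making up roughly two thirds of them.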

Code & Models

Repositories

tracyguo2001/chbench
Official

Datasets

TracyGuo/CHBench

Taxonomy

Topics: Cardiovascular Health and Risk Factors