TrustLLM: Trustworthiness in Large Language Models

Yue Huang; Lichao Sun; Haoran Wang; Siyuan Wu; Qihui Zhang; Yuan Li; Chujie Gao; Yixin Huang; Wenhan Lyu; Yixuan Zhang; Xiner Li; Zhengliang Liu; Yixin Liu; Yijue Wang; Zhikun Zhang; Bertie Vidgen; Bhavya Kailkhura; Caiming Xiong; Chaowei Xiao; Chunyuan Li; Eric Xing; Furong Huang; Hao Liu; Heng Ji; Hongyi Wang; Huan Zhang; Huaxiu Yao; Manolis Kellis; Marinka Zitnik; Meng Jiang; Mohit Bansal; James Zou; Jian Pei; Jian Liu; Jianfeng Gao; Jiawei Han; Jieyu Zhao; Jiliang Tang; Jindong Wang; Joaquin Vanschoren; John Mitchell; Kai Shu; Kaidi Xu; Kai-Wei Chang; Lifang He; Lifu Huang; Michael Backes; Neil Zhenqiang Gong; Philip S. Yu; Pin-Yu Chen; Quanquan Gu; Ran Xu; Rex Ying; Shuiwang Ji; Suman Jana; Tianlong Chen; Tianming Liu; Tianyi Zhou; William Wang; Xiang Li; Xiangliang Zhang; Xiao Wang; Xing Xie; Xun Chen; Xuyu Wang; Yan Liu; Yanfang Ye; Yinzhi Cao; Yong Chen; Yue Zhao

arXiv:2401.05561 · cs.CL · October 1, 2024 · 52 cites

TL;DR

This paper introduces TrustLLM, a comprehensive framework for evaluating and improving trustworthiness in large language models across multiple dimensions, highlighting current challenges, benchmarks, and future research directions.

Contribution

It proposes principles and a benchmark for trustworthiness in LLMs, evaluates 16 models, and discusses open challenges and transparency issues.

Findings

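01. Proprietary LLMs generally outperform open-source models in trustworthiness.

02. Some LLMs may be overly calibrated toward trustworthiness, compromising their utility by mistakenly treating benign prompts as harmful and refusing to respond.

03. Trustworthiness correlates positively with utility (functional effectiveness).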
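Finding 02 (over-calibration) can be made concrete with a simple metric: the share of benign prompts a model refuses to answer. The sketch below is a minimal illustration under stated assumptions, not TrustLLM's evaluation code. The marker list `REFUSAL_MARKERS` and the helpers `is_refusal` and `refusal_rate` are hypothetical, and the keyword match is a crude stand-in for a proper refusal classifier.

```python
from typing import Iterable

# Hypothetical refusal markers; a simplification, not TrustLLM's classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry", "as an ai")


def is_refusal(response: str) -> bool:
    """Crude keyword check: True if the response contains any refusal marker."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty input)."""
    flags = [is_refusal(r) for r in responses]
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Toy responses to benign prompts; every refusal here is unnecessary,
    # so a higher rate indicates stronger over-calibration.
    benign_responses = [
        "To kill a Python process, run `kill <pid>` in your terminal.",
        "I'm sorry, but I can't help with anything involving killing.",
        "Here is a recipe for shooting star cookies.",
        "I cannot assist with that request.",
    ]
    print(f"over-refusal rate: {refusal_rate(benign_responses):.2f}")
```

Run as a script, this prints `over-refusal rate: 0.50`. A keyword matcher misses partial or indirect refusals and can flag benign text that merely quotes an apology, so it only approximates over-calibration.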

Abstract

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs is therefore an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We…

Code & Models

Datasets

allenai/tulu-3-trustllm-jailbreaktrigger-eval
dataset · 62 downloads

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)