TrustLLM: Trustworthiness in Large Language Models
Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing

TL;DR
This paper introduces TrustLLM, a comprehensive framework for evaluating and improving trustworthiness in large language models across multiple dimensions, highlighting current challenges, benchmarks, and future research directions.
Contribution
It proposes principles and a benchmark for trustworthiness in LLMs, evaluates 16 models, and discusses open challenges and transparency issues.
Findings
Proprietary LLMs generally outperform open-source models in trustworthiness.
Some LLMs are over-calibrated toward trustworthiness: they mistake benign prompts for harmful ones and refuse to respond, which harms utility.
Trustworthiness correlates positively with functional effectiveness.
Abstract
Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)
Methods: Sparse Evolutionary Training
