Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li

TL;DR
This paper provides a comprehensive survey of key dimensions for evaluating the trustworthiness of large language models, including reliability, safety, fairness, and social norm adherence, along with measurement studies on popular LLMs.
Contribution
It introduces a detailed framework for assessing LLM alignment across multiple trustworthiness categories and presents empirical measurement results to guide practitioners.
Findings
Aligned models generally perform better in trustworthiness
Effectiveness of alignment varies across categories
Fine-grained analysis is essential for improvement
Abstract
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further…
Taxonomy
Topics: Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Residual Connection · Dense Connections
