A collection of principles for guiding and evaluating large language models
Konstantin Hebenstreit, Robert Praas, Matthias Samwald

TL;DR
This paper compiles 220 principles from diverse literature for guiding and evaluating large language models, covering reasoning, transparency, safety, and ethics. It also reports a small-scale expert survey on the importance of the principles and outlines future directions.
Contribution
It introduces a curated set of 37 core principles organized into seven categories to steer and assess LLM reasoning and behavior.
Findings
Identified 220 principles from the literature
Derived 37 core principles organized into categories
Conducted expert survey on principle importance
Abstract
Large language models (LLMs) demonstrate outstanding capabilities, but challenges remain regarding their ability to solve complex reasoning tasks, as well as their transparency, robustness, truthfulness, and ethical alignment. In this preliminary study, we compile a set of core principles for steering and evaluating the reasoning of LLMs by curating literature from several relevant strands of work: structured reasoning in LLMs, self-evaluation/self-reflection, explainability, AI system safety/security, guidelines for human critical thinking, and ethical/regulatory guidelines for AI. We identify and curate a list of 220 principles from literature, and derive a set of 37 core principles organized into seven categories: assumptions and perspectives, reasoning, information and evidence, robustness and security, ethics, utility, and implications. We conduct a small-scale expert survey,…
Taxonomy
Topics: Topic Modeling
