HonestLLM: Toward an Honest and Helpful Large Language Model
Chujie Gao, Siyuan Wu, Yue Huang, Dongping Chen, Qihui Zhang, Zhengyan Fu, Yao Wan, Lichao Sun, Xiangliang Zhang

TL;DR
This paper proposes principles, datasets, and methods to enhance honesty and helpfulness in large language models, demonstrating significant improvements across multiple models through training-free and fine-tuning approaches.
Contribution
It introduces a comprehensive honesty framework, a new dataset, HoneSet, and two novel methods to improve LLM honesty and helpfulness, validated by extensive experiments.
Findings
Significant honesty improvements across nine LLMs, with gains of up to 124.7%.
Introduction of HoneSet dataset for honesty assessment.
Effective training-free and fine-tuning methods for honesty enhancement.
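The "up to 124.7%" figure is easiest to read as a relative gain over a model's baseline score, assuming that is how it is computed: a 124.7% increase means the enhanced score is roughly 2.25 times the original, not 124.7 percentage points higher. The sketch below shows the arithmetic with made-up baseline and enhanced scores (hypothetical values, not results from the paper); the function name `relative_improvement` is likewise illustrative.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percent change relative to a baseline score (assumes before > 0)."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (after - before) / before * 100.0


# Hypothetical scores chosen only to illustrate the arithmetic.
baseline, enhanced = 40.0, 89.88
print(round(relative_improvement(baseline, enhanced), 1))  # → 124.7
print(round(enhanced / baseline, 3))  # → 2.247 (the score more than doubles)
```

Because the gain is relative, the same absolute increase looks much larger for a model that starts from a low baseline.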
Abstract
Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployments, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles aimed at guaranteeing the honesty of LLMs. Additionally, we introduce a novel dataset, referred to as HoneSet, comprising 930 queries spanning six categories, meticulously crafted to assess an LLM's capacity for maintaining honesty. Subsequently, we present two approaches to augmenting honesty and helpfulness in LLMs: a training-free enhancement and a fine-tuning-based improvement. The training-free approach, which is based on curiosity-driven prompting, empowers LLMs to articulate internal…
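The abstract names the training-free method (curiosity-driven prompting) but is cut off before describing it. Below is a minimal sketch, assuming a two-stage design in which the model is first prompted to state what it is unsure about or cannot do, and is then asked for a final answer with those notes in context. The function `curiosity_driven_answer`, both prompt templates, and the `generate` callable are hypothetical placeholders, not the authors' prompts or code; `generate` stands in for whatever model API is actually used.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping a prompt string to a model reply.
Generate = Callable[[str], str]

# Hypothetical prompt templates; the paper's actual wording is not reproduced here.
REFLECT_TEMPLATE = (
    "Before answering, list anything about the following query that you are "
    "unsure about, cannot do, or would need clarified.\n\nQuery: {query}"
)
ANSWER_TEMPLATE = (
    "Query: {query}\n\nYour notes on uncertainty and limitations:\n{notes}\n\n"
    "Now write a final answer that is honest about these limitations while "
    "still being as helpful as possible."
)


def curiosity_driven_answer(query: str, generate: Generate) -> Dict[str, str]:
    """Two-stage, training-free prompting sketch (an assumption, not the paper's code)."""
    # Stage 1: ask the model to articulate its confusion or uncertainty.
    notes = generate(REFLECT_TEMPLATE.format(query=query))
    # Stage 2: condition the final answer on those self-reported notes.
    answer = generate(ANSWER_TEMPLATE.format(query=query, notes=notes))
    return {"notes": notes, "answer": answer}


if __name__ == "__main__":
    # Stub model for demonstration: returns canned text for each stage.
    def fake_model(prompt: str) -> str:
        if prompt.startswith("Before answering"):
            return "I cannot browse the web, so I do not know today's stock price."
        return "I can't access live data, but here is how you could look it up..."

    result = curiosity_driven_answer("What is Apple's stock price right now?", fake_model)
    print(result["notes"])
    print(result["answer"])
```

Keeping the model behind a plain callable makes the two-stage flow testable with a stub and lets the same wrapper run unchanged over several different LLMs.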
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
