HonestLLM: Toward an Honest and Helpful Large Language Model

Chujie Gao; Siyuan Wu; Yue Huang; Dongping Chen; Qihui Zhang; Zhengyan Fu; Yao Wan; Lichao Sun; Xiangliang Zhang

arXiv:2406.00380·cs.CL·December 12, 2024

TL;DR

This paper proposes principles, datasets, and methods to enhance honesty and helpfulness in large language models, demonstrating significant improvements across multiple models through training-free and fine-tuning approaches.

Contribution

It introduces a comprehensive honesty framework, a new dataset HoneSet, and two novel methods to improve LLM honesty and helpfulness, validated by extensive experiments.

Findings

01. Significant honesty improvements in nine LLMs, up to 124.7%.

02. Introduction of the HoneSet dataset for honesty assessment.

03. Effective training-free and fine-tuning methods for honesty enhancement.
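To make a figure such as "up to 124.7%" easier to read, the sketch below shows how a relative improvement is usually computed. It assumes the reported number is a percentage gain over a baseline score, which this page does not state explicitly. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the percentage gain of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic.
baseline_score = 2.0
improved_score = 5.0
gain = relative_improvement(baseline_score, improved_score)
print(f"{gain:.1f}% relative improvement")
```

Under this reading, any gain above 100% means the score more than doubled relative to its baseline.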

Abstract

Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployments, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles aimed at guaranteeing the honesty of LLMs. Additionally, we introduce a novel dataset, referred to as HoneSet, comprising 930 queries spanning six categories meticulously crafted to assess an LLM's capacity for maintaining honesty. Subsequently, we present two approaches to augmenting honesty and helpfulness in LLMs: a training-free enhancement and a fine-tuning-based improvement. The training-free approach, which is based on curiosity-driven prompting, empowers LLMs to articulate internal…

Code & Models

Repositories

Flossiee/HonestyLLM (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques