Harmonic LLMs are Trustworthy

Nicholas S. Kersting; Mohammad Rahman; Suchismitha Vedala; Yang Wang

arXiv:2404.19708 · cs.LG · July 26, 2024

TL;DR

This paper presents a model-agnostic, unsupervised method to assess the robustness of black-box LLMs by measuring their local deviation from harmoniticity, $\gamma$: values near zero correlate with trustworthiness, while higher values can reveal hallucinations.

Contribution

The authors introduce a novel mathematical standard, harmonic deviation $\gamma$, for real-time robustness testing of any black-box LLM without supervision or model access.

Findings

01

$\gamma \to 0$ correlates with trustworthiness

02

Higher $\gamma$ values expose hallucinations

03

Mid-size open-source models can outperform large commercial models

Abstract

We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real time via its local deviation from harmoniticity, denoted as $\gamma$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. To show general application and immediacy of results, we measure $\gamma$ in 10 popular LLMs (ChatGPT, Claude-2.1, Claude3.0, GPT-4, GPT-4o, Smaug-72B, Mixtral-8x7B, Llama2-7B, Mistral-7B and MPT-7B) across thousands of queries in three objective domains: WebQA, ProgrammingQA, and TruthfulQA. Across all models and domains tested, human annotation confirms that $\gamma \to 0$ indicates trustworthiness, and conversely searching higher values of $\gamma$ easily exposes examples…
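The abstract does not spell out how $\gamma$ is computed, so what follows is only a minimal sketch under stated assumptions. It relies on the mean-value property of harmonic functions: a harmonic function's value at a point equals its average over any sphere centered at that point, so its deviation from that average is zero. The sketch assumes (hypothetically) that each LLM response has already been mapped to an embedding vector, that the "neighbors" are responses to slightly perturbed versions of the same prompt, and that $\gamma$ is the Euclidean distance between the original response's embedding and the mean of the neighbors' embeddings. The functions `harmonic_deviation` and `rank_by_gamma`, the embedding step, and the toy vectors are illustrative assumptions, not the authors' implementation; the paper's perturbation scheme and any normalization may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical embedding type: one fixed-length float vector per LLM response.
Vector = Sequence[float]


def harmonic_deviation(center: Vector, neighbors: Sequence[Vector]) -> float:
    """Assumed gamma: L2 distance between the embedding of the response to the
    original prompt and the mean embedding of responses to perturbed prompts.
    A harmonic function obeys the mean-value property, so it would score 0."""
    if not neighbors:
        raise ValueError("need at least one perturbed response")
    dim = len(center)
    if any(len(n) != dim for n in neighbors):
        raise ValueError("all embeddings must have the same dimension")
    # Component-wise mean of the neighbor embeddings.
    mean = [sum(n[i] for n in neighbors) / len(neighbors) for i in range(dim)]
    # Euclidean distance between the center response and that mean.
    return math.sqrt(sum((c - m) ** 2 for c, m in zip(center, mean)))


def rank_by_gamma(
    samples: Dict[str, Tuple[Vector, Sequence[Vector]]]
) -> List[Tuple[str, float]]:
    """Return (query_id, gamma) pairs sorted from highest to lowest gamma,
    i.e. the responses an annotator might inspect first for hallucinations."""
    scores = [(qid, harmonic_deviation(c, ns)) for qid, (c, ns) in samples.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "embeddings", made up for illustration only.
    samples = {
        "q1": ([1.0, 0.0], [[1.0, 0.1], [1.0, -0.1]]),  # perturbations average back to center
        "q2": ([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]]),   # perturbations drift away
    }
    # Prints "q2: gamma = 1.414" then "q1: gamma = 0.000".
    for qid, gamma in rank_by_gamma(samples):
        print(f"{qid}: gamma = {gamma:.3f}")
```

Sorting by descending $\gamma$ mirrors the abstract's observation that searching higher values of $\gamma$ surfaces untrustworthy responses; in this toy input, `q2`, whose perturbed answers drift away from the original, ranks first.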

Taxonomy

Topics: Digital Rights Management and Security

Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection