Harmonic LLMs are Trustworthy
Nicholas S. Kersting, Mohammad Rahman, Suchismitha Vedala, Yang Wang

TL;DR
This paper presents a model-agnostic, unsupervised method to assess the robustness of black-box LLMs by measuring their local deviation from harmoniticity, which correlates with trustworthiness and can reveal hallucinations.
Contribution
The authors introduce a novel mathematical standard, harmonic deviation $\gamma$, for real-time robustness testing of any black-box LLM without supervision or model access.
Findings
$\gamma \to 0$ correlates with trustworthiness
Higher $\gamma$ values expose hallucinations
Mid-size open-source models can outperform large commercial models
Abstract
We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real-time via its local deviation from harmoniticity, denoted as $\gamma$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. To show general application and immediacy of results, we measure $\gamma$ in 10 popular LLMs (ChatGPT, Claude-2.1, Claude3.0, GPT-4, GPT-4o, Smaug-72B, Mixtral-8x7B, Llama2-7B, Mistral-7B and MPT-7B) across thousands of queries in three objective domains: WebQA, ProgrammingQA, and TruthfulQA. Across all models and domains tested, human annotation confirms that $\gamma \to 0$ indicates trustworthiness, and conversely searching higher values of $\gamma$ easily exposes examples…
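The abstract does not spell out how $\gamma$ is computed, so the following is only a toy illustration of the underlying idea, not the authors' procedure. A harmonic function satisfies the mean-value property: its value at a point equals its average over a small sphere around that point. The sketch below assumes a 2D input, a 2D output, and an angle-based deviation measure. The names `circle_mean` and `angular_deviation` are hypothetical, and a real test on an LLM would perturb the query and compare embedded responses instead of evaluating a closed-form function.

```python
import math

def circle_mean(f, center, radius, n_samples=64):
    """Average a 2D-valued function f over evenly spaced points on a circle."""
    cx, cy = center
    sum_u, sum_v = 0.0, 0.0
    for k in range(n_samples):
        theta = 2.0 * math.pi * k / n_samples
        u, v = f(cx + radius * math.cos(theta), cy + radius * math.sin(theta))
        sum_u += u
        sum_v += v
    return (sum_u / n_samples, sum_v / n_samples)

def angular_deviation(f, center, radius, n_samples=64):
    """Angle (radians) between f(center) and the circle average of f.

    Assumption: this angle stands in for a 'deviation from harmonicity';
    it is zero whenever the mean-value property holds at `center`.
    """
    a = f(*center)
    b = circle_mean(f, center, radius, n_samples)
    dot = a[0] * b[0] + a[1] * b[1]
    cross = a[0] * b[1] - a[1] * b[0]
    # atan2 stays accurate for very small angles, unlike acos(dot / norms).
    return math.atan2(abs(cross), dot)

def harmonic_map(x, y):
    # Real and imaginary parts of (x + iy)^2; both components are harmonic.
    return (x * x - y * y, 2.0 * x * y)

def non_harmonic_map(x, y):
    # x^2 + y^2 has a nonzero Laplacian, so the mean-value property fails.
    return (x * x + y * y, x * y)

dev_harmonic = angular_deviation(harmonic_map, (1.0, 0.5), 0.2)
dev_non_harmonic = angular_deviation(non_harmonic_map, (1.0, 0.5), 0.2)
```

In this toy setting `dev_harmonic` is zero up to floating-point error, while `dev_non_harmonic` is clearly positive, which mirrors the reading that a deviation near zero signals stable local behavior.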
Taxonomy
Topics: Digital Rights Management and Security
Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection
