TL;DR
This paper introduces TruthfulnessEval, a framework for assessing the truthfulness of quantized large language models across logical reasoning, common sense, and falsehoods, revealing vulnerabilities to deceptive prompts despite internal truthfulness.
Contribution
The work presents a novel evaluation framework for quantized LLMs' truthfulness and uncovers their susceptibility to deceptive prompts, informing future alignment strategies.
Findings
Quantized models retain internal truth representations.
Deceptive prompts can override truthful behavior.
Quantized models produce false outputs under misleading prompts.
Abstract
Quantization enables efficient deployment of large language models (LLMs) in resource-constrained environments by significantly reducing memory and computation costs. While quantized LLMs often maintain performance on perplexity and zero-shot tasks, their impact on truthfulness-whether generating truthful or deceptive responses-remains largely unexplored. In this work, we introduce TruthfulnessEval, a comprehensive evaluation framework for assessing the truthfulness of quantized LLMs across three dimensions: (1) Truthfulness on Logical Reasoning; (2) Truthfulness on Common Sense; and (3) Truthfulness on Imitative Falsehoods. Using this framework, we examine mainstream quantization techniques (ranging from 4-bit to extreme 2-bit) across several open-source LLMs. Surprisingly, we find that while quantized models retain internally truthful representations, they are more susceptible to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
