Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs

Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, and Pan Li

arXiv:2508.19432·cs.AI·August 28, 2025

TL;DR

This paper introduces TruthfulnessEval, a framework for assessing the truthfulness of quantized large language models on logical reasoning, common sense, and imitative falsehoods. It finds that quantized models retain internally truthful representations yet remain vulnerable to deceptive prompts.

Contribution

The work presents TruthfulnessEval, an evaluation framework for the truthfulness of quantized LLMs, and shows that these models are susceptible to deceptive prompts, a result intended to inform future alignment strategies.

Findings

01 Quantized models retain internal truth representations.

02 Deceptive prompts can override truthful behavior.

03 Quantized models produce false outputs under misleading prompts.

Abstract

Quantization enables efficient deployment of large language models (LLMs) in resource-constrained environments by significantly reducing memory and computation costs. While quantized LLMs often maintain performance on perplexity and zero-shot tasks, the impact of quantization on truthfulness, that is, whether the models generate truthful or deceptive responses, remains largely unexplored. In this work, we introduce TruthfulnessEval, a comprehensive evaluation framework for assessing the truthfulness of quantized LLMs across three dimensions: (1) Truthfulness on Logical Reasoning; (2) Truthfulness on Common Sense; and (3) Truthfulness on Imitative Falsehoods. Using this framework, we examine mainstream quantization techniques (ranging from 4-bit to extreme 2-bit) across several open-source LLMs. Surprisingly, we find that while quantized models retain internally truthful representations, they are more susceptible to…
