Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification
Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye

TL;DR
This paper investigates how different quantization levels and types affect LLaMA-3.1's performance in qualitative analysis and proposes a multi-pass prompt verification method to improve stability and accuracy of low-bit models.
Contribution
It introduces a quantization-aware multi-pass prompt verification approach that enhances low-bit LLMs' stability and accuracy in qualitative research tasks.
Findings
8-bit models closely match human ground truth.
4-bit models become stable with the proposed method.
3-bit and 2-bit models improve performance after verification.
Abstract
Quantized Large Language Models (LLMs) are increasingly used in qualitative analysis because they run fast and need fewer computing resources. This study examines how different low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often hallucinate more and produce unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and, after verification, passes the results on to the next transcript, which improves accuracy. To validate performance, human coders analyzed…
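The abstract describes the verification loop only at a high level, so the following is a minimal sketch of one way such a loop could be organized, not the authors' implementation. Everything named here is an assumption for illustration: the `Model` callable interface, the `EXTRACT`/`VERIFY` prompt wording, the rule that a code must pass every verification pass to be kept, and the `toy_model` stub that stands in for a quantized LLM.

```python
from typing import Callable, List

# Hypothetical model interface (assumption): prompt text in, reply text out.
Model = Callable[[str], str]


def extract_codes(model: Model, transcript: str, carried: List[str]) -> List[str]:
    """Pass 1: ask the model for candidate codes, one per line."""
    prompt = (
        "EXTRACT\n"
        f"Known codes: {', '.join(carried)}\n"
        f"Transcript:\n{transcript}\n"
        "List one code per line."
    )
    return [line.strip() for line in model(prompt).splitlines() if line.strip()]


def verify_code(model: Model, transcript: str, code: str, passes: int = 2) -> bool:
    """Later passes: keep a code only if every verification pass answers YES."""
    prompt = f"VERIFY\nTranscript:\n{transcript}\nCode: {code}\nAnswer YES or NO."
    for _ in range(passes):
        if not model(prompt).strip().upper().startswith("YES"):
            return False  # treated as unreliable content and dropped
    return True


def analyze(model: Model, transcripts: List[str], passes: int = 2) -> List[str]:
    """Carry only verified codes forward from one transcript to the next."""
    carried: List[str] = []
    for transcript in transcripts:
        candidates = extract_codes(model, transcript, carried)
        for code in candidates:
            if verify_code(model, transcript, code, passes) and code not in carried:
                carried.append(code)
    return carried


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM (assumption, for demonstration only).

    It always proposes one unsupported code ("unicorns") and answers YES
    only when the code literally appears in the transcript.
    """
    if prompt.startswith("EXTRACT"):
        return "cost\nprivacy\nunicorns"
    transcript = prompt.split("Transcript:\n", 1)[1].split("\nCode: ", 1)[0]
    code = prompt.split("\nCode: ", 1)[1].split("\n", 1)[0]
    return "YES" if code in transcript.lower() else "NO"


demo = analyze(
    toy_model,
    ["The cost was too high for us.", "I worry about privacy and cost."],
)
print(demo)  # ['cost', 'privacy']
```

In this toy run the unsupported code "unicorns" is proposed for both transcripts and rejected each time, while "privacy" is accepted only once a transcript actually mentions it. With a real quantized model, the `toy_model` stub would be replaced by a call to whatever inference backend is in use.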
