Quantifying the Capabilities of LLMs across Scale and Precision
Sher Badshah, Hassan Sajjad

TL;DR
This paper evaluates how model size and quantization affect the performance of large language models. It shows that larger models outperform smaller ones and maintain accuracy even at low precision, underscoring the importance of scale.
Contribution
It provides a comprehensive analysis of the impact of scale and quantization on open-source LLMs, demonstrating the resilience of larger models to low-precision quantization.
Findings
Larger models outperform smaller ones across various tasks.
Larger models maintain high accuracy even at 4-bit quantization.
Scaling remains crucial for performance enhancement.
Abstract
Scale is often cited as one of the factors behind the improved performance of LLMs, resulting in models with billions and trillions of parameters. One limitation of such large models is their high computational requirements, which restrict their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) and to lower memory requirements through quantization. While these approaches effectively address resource limitations, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation to investigate the effect of model scale and quantization on performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion…
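To make the memory trade-off between a smaller model and a quantized larger model concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the paper. It assumes nominal parameter counts (7e9 and 70e9) and counts only the weights, ignoring activations, the KV cache, and any per-group scale or zero-point overhead that real quantization schemes add. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at exactly `bits_per_param` bits,
    with no quantization metadata or runtime buffers counted.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Nominal sizes for illustration; real checkpoints differ slightly.
for name, params in [("7B", 7e9), ("70B", 70e9)]:
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} at {bits}-bit: ~{gib:.1f} GiB of weights")
```

Under these assumptions, a 70B model at 4 bits still needs roughly 33 GiB for its weights, compared with roughly 13 GiB for a 7B model at 16 bits, so the two alternatives named in the abstract do not land at the same memory budget.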
Taxonomy
Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies
Methods: LLaMA
