Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs)

Mahmoud Elgenedy

arXiv:2601.02298 · cs.CL · January 6, 2026

Open Access

TL;DR

This paper introduces Power-of-Two Quantization-Aware-Training (PoT-QAT) for large language models, significantly reducing memory and computation requirements while maintaining performance, enabling efficient deployment on edge devices.

Contribution

The paper proposes a novel PoT quantization method combined with QAT to improve LLM efficiency, achieving substantial memory savings and faster inference with minimal performance loss.

Findings

01 Memory saving of approximately 87.5%

02 Inference speed increased by 3-10x

03 Perplexity improved by 66% after quantization
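As a quick sanity check on the memory figure, the arithmetic below shows one way an 87.5% saving could arise. The bit widths (32-bit full-precision weights, 4-bit stored exponent codes) are assumptions made for illustration; this page does not state them, and `memory_saving` is a hypothetical helper.

```python
def memory_saving(bits_full: int, bits_quantized: int) -> float:
    """Fraction of weight memory saved when each weight shrinks
    from bits_full to bits_quantized bits."""
    return 1.0 - bits_quantized / bits_full


# Assumed widths: 32-bit floats replaced by 4-bit power-of-two codes.
print(memory_saving(32, 4))  # 0.875
```

Under these assumed widths, each weight takes one eighth of its original storage, which matches the reported saving.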

Abstract

In Large Language Models (LLMs), the number of parameters has grown exponentially in the past few years, e.g., from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3 to possibly more than a trillion in later versions. This raises a significant challenge for implementation, especially on Edge devices. Unlike in cloud computing, memory and processing power on Edge devices are very limited, which necessitates novel ideas to make such applications feasible. In this work, we investigate compressing weights with a special quantization that restricts values to powers of two (PoT). This saves a large amount of memory, as only exponents need to be stored; more importantly, it significantly reduces the required processing power by replacing costly multiplication with low-cost bit shifting. To overcome performance loss due to this strict quantization, we investigate Quantization Aware…
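The idea of storing only exponents and multiplying by shifting can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the exponent range `[-7, 0]`, the round-to-nearest-exponent rule, and the separate sign array are all assumptions chosen for illustration.

```python
import numpy as np


def pot_quantize(w, exp_min=-7, exp_max=0):
    """Map each weight to sign * 2**e, with e rounded in the log2 domain
    and clipped to [exp_min, exp_max] (assumed range)."""
    sign = np.sign(w).astype(np.int8)
    mag = np.abs(w)
    # Avoid log2(0): zeros get a dummy magnitude and keep sign 0,
    # so they dequantize back to exactly zero.
    safe = np.where(mag > 0, mag, 1.0)
    exp = np.clip(np.round(np.log2(safe)), exp_min, exp_max).astype(np.int8)
    return sign, exp


def pot_dequantize(sign, exp):
    """Rebuild floating-point weights from the stored sign and exponent."""
    return sign * np.exp2(exp.astype(np.float64))


def shift_multiply(x_int, sign, exp):
    """Multiply an integer activation by sign * 2**exp using only shifts.
    A negative exponent is a right shift, which floors the result."""
    e = int(exp)
    shifted = x_int << e if e >= 0 else x_int >> -e
    return int(sign) * shifted


w = np.array([0.3, -0.6, 0.0, 1.0])
sign, exp = pot_quantize(w)
print(pot_dequantize(sign, exp))           # quantized weights
print(shift_multiply(40, sign[1], exp[1]))  # 40 * (-0.5) via a shift
```

Only `sign` and `exp` need to be stored per weight, and `shift_multiply` shows how a product with a quantized weight reduces to a shift plus a sign flip when activations are integers.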

Taxonomy

Topics: Big Data and Digital Economy · Machine Learning and Data Classification · Natural Language Processing Techniques