INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Lakshmi Nair; Mikhail Bernadskiy; Arulselvan Madhavan; Craig Chan; Ayon Basumallik; Darius Bunandar

arXiv:2307.03712 · cs.LG · July 10, 2023 · 1 cites

TL;DR

INT-FP-QSim is an open-source simulator that allows flexible evaluation of large language models and vision transformers across various numerical precisions and formats, facilitating research in model quantization.

Contribution

It introduces a versatile simulation tool combining multiple open-source resources to evaluate the impact of different numerical formats on model performance.

Findings

01

Different numerical formats significantly affect model accuracy.

02

4-bit weights and activations can be effective with proper quantization.

03

Comparison of recent quantization methods highlights their relative strengths.

Abstract

The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like…
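
As a rough illustration of what simulating "4-bit weights and 4-bit or 8-bit activations" can mean, the sketch below applies symmetric per-tensor integer fake quantization (quantize, then dequantize back to float) in plain Python. It is a generic sketch under stated assumptions, not the INT-FP-QSim API: the names `fake_quantize_int` and `simulated_dot`, the symmetric max-abs scaling, and the per-tensor granularity are all hypothetical choices made for this example, and it covers integer formats only.

```python
def fake_quantize_int(values, num_bits=4):
    """Simulate symmetric per-tensor integer quantization.

    Hypothetical helper for illustration: values are scaled onto a signed
    integer grid, rounded, clamped, and mapped back to floats.
    """
    if not values:
        return []
    qmax = 2 ** (num_bits - 1) - 1  # 7 for signed 4-bit, 127 for signed 8-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / qmax  # assumption: max-abs calibration, one scale per tensor
    dequantized = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the symmetric range [-qmax, qmax]
        dequantized.append(q * scale)   # back to float for simulation
    return dequantized


def simulated_dot(weights, activations, weight_bits=4, act_bits=8):
    """Dot product computed on fake-quantized operands (e.g. W4A8)."""
    w_q = fake_quantize_int(weights, weight_bits)
    a_q = fake_quantize_int(activations, act_bits)
    return sum(w * a for w, a in zip(w_q, a_q))


if __name__ == "__main__":
    weights = [0.9, -0.31, 0.07, -0.55]
    activations = [1.2, 0.4, -2.5, 0.8]
    exact = sum(w * a for w, a in zip(weights, activations))
    print("exact :", exact)
    print("W4A8  :", simulated_dot(weights, activations, 4, 8))
    print("W4A4  :", simulated_dot(weights, activations, 4, 4))
```

Comparing the exact result against the W4A8 and W4A4 results gives a feel for the error that lower-precision formats introduce; a full simulator would apply the same idea layer by layer and also cover floating point formats.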

Code & Models

Repositories

lightmatter-ai/int-fp-qsim
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis