Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam
Matheus L. O. Santos, Cláudio E. C. Campelo

TL;DR
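This study assesses quantized 7B and 13B LLaMA-based models on questions from the Brazilian Secondary School Exam (ENEM). It shows that they can run on home hardware, reaching roughly 46% accuracy on the original Portuguese questions and 49% on their English translations, with processing times of about 20 to 50 seconds.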
Contribution
The paper introduces a benchmark of 1,006 ENEM questions for quantized LLaMA-based models (Alpaca, Koala, and Vicuna), evaluating both accuracy and computational efficiency.
Findings
The best-performing models achieved approximately 46% accuracy on the original Portuguese questions and 49% on their English translations.
Processing times were approximately 20 seconds for 7B models and 50 seconds for 13B models.
Models can run effectively on consumer hardware for educational assessment tasks.
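The two quantities reported above, accuracy and processing time, could be measured with a loop like the one below. This is a minimal sketch, not the authors' evaluation code: the `evaluate` function, the question dictionary layout, and the `always_a` stand-in model are all hypothetical, and the sketch assumes each question has a single correct alternative labelled with a letter and that the model's reply begins with its chosen letter.

```python
import time
from statistics import mean


def evaluate(questions, answer_fn):
    """Return (accuracy, mean seconds per question) for a multiple-choice set.

    `questions` is a list of dicts with hypothetical keys "prompt" and
    "answer" (the correct letter). `answer_fn` is any callable that maps a
    prompt string to the model's reply.
    """
    correct = 0
    durations = []
    for question in questions:
        start = time.perf_counter()
        reply = answer_fn(question["prompt"])
        durations.append(time.perf_counter() - start)
        # Assumption: the first character of the reply is the chosen letter.
        if reply.strip().upper()[:1] == question["answer"]:
            correct += 1
    return correct / len(questions), mean(durations)


# Toy data standing in for exam questions (not taken from the paper's dataset).
questions = [
    {"prompt": "2 + 2 = ? (A) 3 (B) 4", "answer": "B"},
    {"prompt": "Capital of Brazil? (A) Brasilia (B) Lima", "answer": "A"},
]


def always_a(prompt):
    """Stand-in for a quantized model: always answers 'A'."""
    return "A"


accuracy, avg_seconds = evaluate(questions, always_a)
print(f"accuracy={accuracy:.2f}, mean time per question={avg_seconds:.6f}s")
```

In a real run, `answer_fn` would wrap a call to the quantized model, and the same question set would be scored once with the Portuguese prompts and once with the English translations.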
Abstract
Although Large Language Models (LLMs) represent a revolution in the way we interact with computers, allowing the construction of complex questions and the ability to reason over a sequence of statements, their use is restricted due to the need for dedicated hardware for execution. In this study, we evaluate the performance of LLMs based on the 7 and 13 billion LLaMA models, subjected to a quantization process and run on home hardware. The models considered were Alpaca, Koala, and Vicuna. To evaluate the effectiveness of these models, we developed a database containing 1,006 questions from the ENEM (Brazilian National Secondary School Exam). Our analysis revealed that the best performing models achieved an accuracy of approximately 46% for the original texts of the Portuguese questions and 49% on their English translations. In addition, we evaluated the computational efficiency of the…
