PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

Yilong Li; Jingyu Liu; Hao Zhang; M Badri Narayanan; Utkarsh Sharma; Shuai Zhang; Pan Hu; Yijing Zeng; Jayaram Raghuram; Suman Banerjee

arXiv:2410.05315·cs.LG·January 10, 2025

TL;DR

This paper introduces PalmBench, a comprehensive benchmarking framework for evaluating compressed large language models on mobile devices, focusing on resource efficiency, performance, and safety aspects across various hardware configurations.

Contribution

We present a lightweight, automated benchmarking framework and provide extensive evaluations of quantized LLMs on multiple mobile platforms, highlighting their efficiency and safety trade-offs.

Findings

01 Quantization affects memory, speed, and power consumption.

02 Energy efficiency varies across mobile devices.

03 Compressed models show increased hallucinations and toxicity.

Abstract

Deploying large language models (LLMs) locally on mobile devices is advantageous when transmitting data to remote cloud servers is either undesirable because of privacy concerns or impractical because of limited network connectivity. Recent advancements (MLC, 2023a; Gerganov, 2023) have facilitated the local deployment of LLMs. However, local deployment also presents challenges, particularly in balancing quality (generative performance), latency, and throughput within the hardware constraints of mobile devices. In this paper, we introduce a lightweight, all-in-one automated benchmarking framework that allows users to evaluate LLMs on mobile devices. We provide a comprehensive benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities. Unlike traditional benchmarks that…
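To make the trade-off between quantization and hardware constraints concrete, the following is a minimal sketch of two back-of-the-envelope calculations: the storage needed for model weights at a given bit-width, and decode throughput in tokens per second. It is not PalmBench's code. The function names, the 7-billion-parameter model size, and the bit-widths are illustrative assumptions, and the estimate ignores activations, the KV cache, and per-group quantization metadata.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the model weights alone, in GiB.

    Assumption: every weight uses the same bit-width and there is no
    overhead for scales, zero points, or runtime buffers.
    """
    return num_params * bits_per_weight / 8 / 2**30


def decode_throughput(tokens_generated: int, elapsed_seconds: float) -> float:
    """Average generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


if __name__ == "__main__":
    PARAMS_7B = 7_000_000_000  # hypothetical model size, for illustration only
    for bits in (16, 8, 4, 3):
        gib = weight_memory_gib(PARAMS_7B, bits)
        print(f"{bits:>2}-bit weights: about {gib:.2f} GiB")
```

Under these assumptions, halving the bit-width halves the weight footprint, which is often what decides whether a model fits in a phone's memory at all. Measured latency, power draw, and output quality do not follow from this arithmetic and have to be benchmarked on the device.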

Taxonomy

Topics: Topic Modeling

Methods: Focus