Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Jihwan Bang; Juntae Lee; Kyuhong Shim; Seunghan Yang; Simyung Chang

arXiv:2406.07007·cs.CL·June 12, 2024

TL;DR

Crayon enables efficient on-device customization of large language models through instant adapter blending and device-server hybrid inference, reducing cloud dependency and privacy risks while maintaining high performance.

Contribution

We introduce Crayon, a novel on-device LLM customization method using instant adapter blending and a hybrid inference strategy with edge-server collaboration.

Findings

01 Crayon achieves comparable performance to larger models on customized tasks.

02 The hybrid inference strategy improves efficiency and accuracy.

03 Our benchmark demonstrates significant gains over existing methods.

Abstract

The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overhead, and uploading user data raises privacy concerns. On-device LLMs offer a promising solution by mitigating these issues. Yet the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, which are then instantly blended into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures…
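
The two mechanisms named in the abstract, training-free adapter blending and device-server routing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how blending weights are chosen or how queries are judged "demanding", so the weighted average in `blend_adapters`, the scalar `confidence` score, and the `threshold` in `route_query` are all hypothetical stand-ins introduced here for illustration.

```python
from typing import Dict, List

# Hypothetical representation: an adapter maps a parameter name to a
# flattened list of weights (e.g. the entries of a low-rank update).
Adapter = Dict[str, List[float]]


def blend_adapters(pool: List[Adapter], weights: List[float]) -> Adapter:
    """Combine base adapters into one adapter with no gradient steps.

    ASSUMPTION: blending is modeled as a normalized weighted average.
    The source only states that base adapters are blended "without
    extra training"; the actual blending rule is not given there.
    """
    if not pool or len(pool) != len(weights):
        raise ValueError("pool and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    coeffs = [w / total for w in weights]

    blended: Adapter = {}
    for name, reference in pool[0].items():
        # Average each parameter position across all adapters in the pool.
        blended[name] = [
            sum(c * adapter[name][i] for c, adapter in zip(coeffs, pool))
            for i in range(len(reference))
        ]
    return blended


def route_query(confidence: float, threshold: float = 0.5) -> str:
    """Decide where a query is answered.

    ASSUMPTION: `confidence` is some score of how well the customized
    on-device model handles the query; low-scoring queries go to the
    larger server model. Both the score and the threshold are hypothetical.
    """
    return "device" if confidence >= threshold else "server"


if __name__ == "__main__":
    pool = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}]
    print(blend_adapters(pool, [3.0, 1.0]))
    print(route_query(0.9), route_query(0.2))
```

Because blending here is only arithmetic over stored adapter weights, it costs far less than fine-tuning, which is consistent with the "instant" customization the abstract describes; the real method may select or weight base adapters quite differently.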

Taxonomy

Topics: Digital Rights Management and Security · Advanced Data Storage Technologies

Methods: Balanced Selection · Adapter