TL;DR
WebLLM is an open-source JavaScript framework enabling high-performance large language model inference directly within web browsers, leveraging WebGPU and WebAssembly for efficient computation.
Contribution
The paper introduces WebLLM, a browser-based LLM inference engine that uses WebGPU and machine learning compilers to approach native performance.
Findings
WebLLM retains up to 80% of native performance on the same device.
It provides an OpenAI-style API for easy integration into web applications.
The code is available at https://github.com/mlc-ai/web-llm.
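To make the "OpenAI-style API" finding concrete, here is a minimal TypeScript sketch of what calling such an engine could look like. The types, the `ChatEngine` interface, the helpers `buildRequest` and `ask`, and the `echoEngine` stub are all hypothetical names introduced for illustration. They mirror the general shape of an OpenAI chat-completions call (`chat.completions.create` with a `messages` array) and are not taken from the WebLLM source.

```typescript
// Local types mirroring the general OpenAI chat-completions shape.
// These are illustrative assumptions, not WebLLM's own type definitions.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatCompletionRequest {
  messages: ChatMessage[];
  temperature?: number;
}

interface ChatCompletionResponse {
  choices: { message: ChatMessage }[];
}

// Hypothetical structural interface: any object exposing
// chat.completions.create(...) in the OpenAI style satisfies it.
interface ChatEngine {
  chat: {
    completions: {
      create(request: ChatCompletionRequest): Promise<ChatCompletionResponse>;
    };
  };
}

// Build a request from a user prompt and an optional system instruction.
function buildRequest(prompt: string, system?: string): ChatCompletionRequest {
  const messages: ChatMessage[] = [];
  if (system !== undefined) {
    messages.push({ role: "system", content: system });
  }
  messages.push({ role: "user", content: prompt });
  return { messages };
}

// Send one prompt and return the text of the first choice.
async function ask(engine: ChatEngine, prompt: string, system?: string): Promise<string> {
  const response = await engine.chat.completions.create(buildRequest(prompt, system));
  const first = response.choices[0];
  if (first === undefined) {
    throw new Error("engine returned no choices");
  }
  return first.message.content;
}

// Stub engine so the sketch runs without a GPU or a downloaded model.
// It simply echoes the last message back as the assistant reply.
const echoEngine: ChatEngine = {
  chat: {
    completions: {
      async create(request: ChatCompletionRequest): Promise<ChatCompletionResponse> {
        const last = request.messages[request.messages.length - 1];
        const reply: ChatMessage = {
          role: "assistant",
          content: `echo: ${last ? last.content : ""}`,
        };
        return { choices: [{ message: reply }] };
      },
    },
  },
};
```

In a WebGPU-capable browser, the stub would be replaced by an engine object from the WebLLM package. How that engine is constructed and which models it accepts should be checked against the repository linked above, since this sketch assumes only the OpenAI-style call shape.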
Abstract
Advancements in large language models (LLMs) have unlocked remarkable capabilities. While deploying these models typically requires server-grade GPUs and cloud-based inference, the recent emergence of smaller open-source models and increasingly powerful consumer devices have made on-device deployment practical. The web browser as a platform for on-device deployment is universally accessible, provides a natural agentic environment, and conveniently abstracts out the different backends from diverse device vendors. To address this opportunity, we introduce WebLLM, an open-source JavaScript framework that enables high-performance LLM inference entirely within web browsers. WebLLM provides an OpenAI-style API for seamless integration into web applications, and leverages WebGPU for efficient local GPU acceleration and WebAssembly for performant CPU computation. With machine learning compilers…
