WebLLM: A High-Performance In-Browser LLM Inference Engine

Charlie F. Ruan; Yucheng Qin; Akaash R. Parthasarathy; Xun Zhou; Ruihang Lai; Hongyi Jin; Yixin Dong; Bohan Hou; Meng-Shiun Yu; Yiyan Zhai; Sudeep Agarwal; Hangrui Cao; Siyuan Feng; Tianqi Chen

arXiv:2412.15803·cs.LG·April 14, 2026

TL;DR

WebLLM is an open-source JavaScript framework enabling high-performance large language model inference directly within web browsers, leveraging WebGPU and WebAssembly for efficient computation.

Contribution

The paper introduces WebLLM, a browser-based LLM inference engine that approaches native performance by combining WebGPU with machine learning compilers.

Findings

01

WebLLM retains up to 80% of native performance on the same device.

02

It provides an OpenAI-style API for easy integration into web applications.

03

The code is available at https://github.com/mlc-ai/web-llm.
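The retention figure in the first finding can be read as a simple ratio between in-browser and native performance on the same device. The sketch below assumes performance is measured as decode throughput in tokens per second, which is an assumption since the metric is not defined here. The function name `retainedFraction` and the numbers 40 and 50 are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the paper.

```typescript
// Fraction of native throughput retained by the in-browser run.
// Both inputs are assumed to be tokens per second measured on the same device.
function retainedFraction(
  browserTokensPerSec: number,
  nativeTokensPerSec: number
): number {
  if (!(nativeTokensPerSec > 0) || !(browserTokensPerSec >= 0)) {
    throw new RangeError("throughput values must be non-negative, native must be positive");
  }
  return browserTokensPerSec / nativeTokensPerSec;
}

// Hypothetical numbers, for illustration only.
const example = retainedFraction(40, 50);
console.log(`${(example * 100).toFixed(0)}% of native throughput retained`);
```

A ratio like this only makes sense when both runs use the same model, quantization, and device, so that the browser runtime is the only thing that differs.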

Abstract

Advancements in large language models (LLMs) have unlocked remarkable capabilities. While deploying these models typically requires server-grade GPUs and cloud-based inference, the recent emergence of smaller open-source models and increasingly powerful consumer devices have made on-device deployment practical. The web browser as a platform for on-device deployment is universally accessible, provides a natural agentic environment, and conveniently abstracts out the different backends from diverse device vendors. To address this opportunity, we introduce WebLLM, an open-source JavaScript framework that enables high-performance LLM inference entirely within web browsers. WebLLM provides an OpenAI-style API for seamless integration into web applications, and leverages WebGPU for efficient local GPU acceleration and WebAssembly for performant CPU computation. With machine learning compilers…
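To make the "OpenAI-style API" and the WebGPU dependency concrete, here is a minimal sketch of what calling such an engine from application code might look like. It does not import WebLLM itself: `ChatEngineLike` is a hypothetical structural type that mirrors the `chat.completions.create` shape of the OpenAI chat API, and `hasWebGPU`, `buildRequest`, and `ask` are illustrative helper names, not part of the library. The only platform fact relied on is that browsers with WebGPU support expose it as `navigator.gpu`.

```typescript
// Message and request shapes follow the OpenAI chat-completions convention.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  temperature?: number;
}

interface ChatResponse {
  choices: { message: ChatMessage }[];
}

// Hypothetical stand-in for an engine object: anything that exposes
// chat.completions.create in the OpenAI style satisfies this type.
interface ChatEngineLike {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Browsers with WebGPU support expose it as navigator.gpu.
// Pass in the navigator object (or undefined outside a browser).
function hasWebGPU(nav: { gpu?: unknown } | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
}

// Build an OpenAI-style request from a system prompt and a user prompt.
function buildRequest(systemPrompt: string, userPrompt: string): ChatRequest {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt },
    ],
    temperature: 0.7, // illustrative value
  };
}

// Send one user prompt and return the text of the first choice.
async function ask(engine: ChatEngineLike, userPrompt: string): Promise<string> {
  const reply = await engine.chat.completions.create(
    buildRequest("You are a helpful assistant.", userPrompt)
  );
  return reply.choices[0]?.message.content ?? "";
}
```

In a real page, the engine would come from the WebLLM package in the repository listed under Code & Models rather than from a hand-written object, and the WebGPU check would run before any model is downloaded. Because the request shape matches the OpenAI convention, code written against a hosted endpoint should need few changes to target a local in-browser engine.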


Code & Models

Repositories

mlc-ai/web-llm (GitHub)
