Micro Language Models Enable Instant Responses

Wen Cheng; Tuochao Chen; Karim Helwani; Sriram Srinivasan; Luke Zettlemoyer; Shyamnath Gollakota

arXiv:2604.19642·cs.CL·April 22, 2026

TL;DR

Micro language models give ultra-resource-constrained devices instant responses: the device generates the first few words locally, and a cloud model completes the response.

Contribution

Introduction of ultra-compact micro language models that generate initial response segments locally and collaborate with cloud models for completion, reducing latency on edge devices.

Findings

01

Micro models (8M-30M parameters) match several existing 70M-256M-class models in useful language generation.

02

Collaborative framework enables seamless mid-sentence handoffs.

03

Empirical results demonstrate effective on-device response initiation.

Abstract

Edge devices such as smartwatches and smart glasses cannot continuously run even the smallest 100M-1B parameter language models due to power and compute constraints, yet cloud inference introduces multi-second latencies that break the illusion of a responsive assistant. We introduce micro language models ($\mu$LMs): ultra-compact models (8M-30M parameters) that instantly generate the first 4-8 words of a contextually grounded response on-device while a cloud model completes it, thus masking the cloud latency. We show that useful language generation survives at this extreme scale, with our models matching several existing 70M-256M-class models. We design a collaborative generation framework that reframes the cloud model as a continuator rather than a respondent, achieving seamless mid-sentence handoffs and structured graceful recovery via three error correction methods when the local…
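The handoff described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `local_prefix`, `cloud_continue`, and `respond` are hypothetical names, both models are replaced by stubs returning fixed strings, and the 50 ms delay merely stands in for cloud latency. The point is only the control flow: the device emits a short prefix immediately, and the cloud model is asked to continue that prefix rather than answer from scratch.

```python
import asyncio
from typing import Callable


def local_prefix(prompt: str, max_words: int = 6) -> str:
    """Hypothetical stub for the on-device micro LM: returns a short opening phrase."""
    opening = "Sure, the weather today looks like it might"
    return " ".join(opening.split()[:max_words])


async def cloud_continue(prompt: str, prefix: str, delay_s: float = 0.05) -> str:
    """Hypothetical stub for the cloud model, used as a continuator of `prefix`."""
    await asyncio.sleep(delay_s)  # simulated network and inference latency
    return "it will be sunny with a light breeze."


async def respond(prompt: str, emit: Callable[[str], None]) -> str:
    """Emit the local prefix at once, then append the cloud continuation."""
    prefix = local_prefix(prompt)
    # Schedule the cloud request, then emit the prefix without waiting for it.
    task = asyncio.create_task(cloud_continue(prompt, prefix))
    emit(prefix)
    continuation = await task
    emit(continuation)
    return f"{prefix} {continuation}"


if __name__ == "__main__":
    asyncio.run(respond("What's the weather?", print))
```

In this sketch the user sees the prefix before the simulated cloud call returns, which is the latency-masking effect the abstract describes. The error correction methods mentioned in the abstract are not modeled here.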

Code & Models

Repositories

Sensente/micro_language_model_swen_project (GitHub)
