Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
William Oliveira

TL;DR
This paper examines the engineering challenges of integrating small language models into mobile apps, highlighting failure categories, mitigation strategies, and practical heuristics for reliable on-device AI experiences.
Contribution
It provides a detailed case study of on-device SLM integration in a mobile game, identifying failure modes and proposing effective engineering solutions.
Findings
On-device SLMs can be used in production with careful engineering.
Mitigation strategies include defensive parsing and fallback mechanisms.
The most reliable feature is the one in which the LLM generates the least output.
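The defensive-parsing and fallback pattern named in the findings can be illustrated with a minimal sketch. It is not the paper's implementation. The JSON-array output format, the 80-character limit, the fallback templates, and every name here (`parse_hints`, `fallback_hints`, `hints_for`, `generate`) are assumptions made for illustration. Only the overall shape comes from the abstract: a curated word goes in, three short hints come out, and a deterministic fallback is used if the model fails.

```python
import json
import re

# Hypothetical deterministic templates, used whenever the model output is unusable.
FALLBACK_TEMPLATES = (
    "The word has {n} letters.",
    "The word starts with '{first}'.",
    "The word ends with '{last}'.",
)
MAX_HINT_CHARS = 80  # assumed upper bound for a "short" hint


def fallback_hints(word):
    """Build three hints from the word itself, with no model involved."""
    return [
        t.format(n=len(word), first=word[0], last=word[-1])
        for t in FALLBACK_TEMPLATES
    ]


def parse_hints(raw, word, expected=3):
    """Return `expected` usable hints from raw model text, or None."""
    if not raw:
        return None
    # Small models often wrap JSON in prose or code fences,
    # so extract the outermost [...] span instead of parsing the whole reply.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    hints = []
    for item in data:
        if not isinstance(item, str):
            continue  # ignore numbers, nulls, nested objects
        hint = item.strip()
        if not hint or len(hint) > MAX_HINT_CHARS:
            continue  # ignore empty or overlong hints
        if word.lower() in hint.lower():
            continue  # ignore hints that leak the answer
        hints.append(hint)
    return hints[:expected] if len(hints) >= expected else None


def hints_for(word, generate):
    """Ask the model for hints; fall back deterministically on any failure."""
    try:
        raw = generate(word)  # `generate` stands in for the on-device model call
    except Exception:
        raw = None
    return parse_hints(raw, word) or fallback_hints(word)
```

In this sketch the caller always receives three hints, whether the model replies with clean JSON, wraps it in extra text, returns malformed output, or raises an error. Keeping the requested output small also keeps the validation rules short.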
Abstract
On-device Small Language Models (SLMs) promise fully offline, private AI experiences for mobile users (no cloud dependency, no data leaving the device). But is this promise achievable in practice? This paper presents a longitudinal practitioner case study documenting the engineering challenges of integrating SLMs (Gemma 4 E2B, 2.6B parameters; Qwen3 0.6B, 600M parameters) into Palabrita, a production Android word-guessing game. Over a 5-day development sprint comprising 204 commits (~90 directly AI-related), the system underwent a radical transformation: from an ambitious design where the LLM generated complete structured puzzles (word, category, difficulty, and five hints as JSON) to a pragmatic architecture where curated word lists provide the words and the LLM generates only three short hints, with a deterministic fallback if it fails. We identify five categories of failures specific…
