Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré

TL;DR
This paper explores a cost-effective collaboration framework between small on-device language models and large cloud models, achieving significant cost savings while maintaining high performance on complex reasoning tasks.
Contribution
It introduces MinionS, a novel protocol that decomposes tasks into subtasks for local execution, reducing cloud costs with minimal performance loss.
Findings
The naive chat protocol reduces remote costs by 30.4x but recovers only 87% of the frontier model's performance.
MinionS reduces costs by 5.7x, recovering 97.9% of remote model performance.
Key design choices significantly impact the cost-performance trade-off.
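To put the reduction factors above in more familiar terms, the short sketch below converts an N-fold cost reduction into the fraction of remote-only cost that remains. The function name and the dictionary layout are illustrative choices, not something taken from the paper; the numbers are the ones quoted in the findings.

```python
def remote_cost_fraction(reduction_factor: float) -> float:
    """Fraction of the remote-only cost that remains after an N-fold reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


# Figures quoted in the findings: (cost reduction factor, performance recovered).
reported = {
    "naive chat": (30.4, 0.87),
    "MinionS": (5.7, 0.979),
}

for name, (factor, recovered) in reported.items():
    remaining = remote_cost_fraction(factor)
    print(f"{name}: {remaining:.1%} of remote cost, {recovered:.1%} of performance")
```

Read this way, the naive protocol keeps roughly 3% of the remote cost and MinionS roughly 18%, which makes the trade-off explicit: MinionS spends more on the cloud model to close most of the quality gap.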
Abstract
We investigate an emerging setup in which a small, on-device language model (LM) with access to local data communicates with a frontier, cloud-hosted LM to solve real-world tasks involving financial, medical, and scientific reasoning over long documents. Can a local-remote collaboration reduce cloud inference costs while preserving quality? First, we consider a naive collaboration protocol where the local and remote models simply chat back and forth. Because only the local model reads the full context, this protocol achieves a 30.4x reduction in remote costs, but recovers only 87% of the performance of the frontier model. We identify two key limitations of this protocol: the local model struggles to (1) follow the remote model's multi-step instructions and (2) reason over long contexts. Motivated by these observations, we study an extension of this protocol, coined MinionS, in which the…
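The naive protocol described in the abstract, where the two models chat back and forth and only the local model reads the full context, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function signatures, the `FINAL:` stopping convention, the round limit, and the toy stand-in models are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each "model" is a plain function.
LocalModel = Callable[[str, str], str]                      # (document, instruction) -> reply
RemoteModel = Callable[[str, List[Tuple[str, str]]], str]   # (query, transcript) -> message

FINAL_PREFIX = "FINAL:"  # assumed convention for the remote model's final answer


def naive_chat(query: str, document: str, local: LocalModel,
               remote: RemoteModel, max_rounds: int = 3) -> str:
    """Alternate remote instructions and local replies until a final answer."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        # The remote model sees the query and the chat so far, never the document.
        message = remote(query, transcript)
        if message.startswith(FINAL_PREFIX):
            return message[len(FINAL_PREFIX):].strip()
        # Only the local model reads the full document.
        reply = local(document, message)
        transcript.append((message, reply))
    # Out of rounds: fall back to the last local reply, if any.
    return transcript[-1][1] if transcript else ""


# Toy stand-ins so the sketch runs without any real model.
def toy_local(document: str, instruction: str) -> str:
    keyword = instruction.split()[-1].strip(".").lower()
    for line in document.splitlines():
        if keyword in line.lower():
            return line
    return "not found"


def toy_remote(query: str, transcript: List[Tuple[str, str]]) -> str:
    if not transcript:
        return "Find the line mentioning revenue."
    return f"{FINAL_PREFIX} {transcript[-1][1]}"


answer = naive_chat("What was revenue?", "Costs: 5\nRevenue: 12", toy_local, toy_remote)
print(answer)
```

Because the remote model receives only the query and short local replies, its input stays small regardless of document length, which is the source of the cost saving the abstract reports.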
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Big Data and Digital Economy
