Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints
Evan Chen, Wenzhi Fang, Shiqiang Wang, Christopher Brinton

TL;DR
This paper introduces DA-GRPO, a reinforcement learning method that lets a local language model learn both task competence and when to request cloud assistance under a strict budget, improving accuracy and reducing forgetting.
Contribution
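The paper presents DA-GRPO, a dual-advantage extension of Group Relative Policy Optimization that builds cloud-usage constraints directly into advantage computation during continual learning of local language models, without fixed reward shaping or an external routing model.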
Findings
DA-GRPO improves accuracy after task switches.
It significantly reduces catastrophic forgetting.
It maintains stable cloud offloading within budget constraints.
Abstract
Locally deployed Small Language Models (SLMs) must continually support diverse tasks under strict memory and computation constraints, making selective reliance on cloud Large Language Models (LLMs) unavoidable. Regulating cloud assistance during continual learning is challenging, as naive reward-based reinforcement learning often yields unstable offloading behavior and exacerbates catastrophic forgetting as task distributions shift. We propose DA-GRPO, a dual-advantage extension of Group Relative Policy Optimization that incorporates cloud-usage constraints directly into advantage computation, avoiding fixed reward shaping and external routing models. This design enables the local model to jointly learn task competence and collaboration behavior, allowing cloud requests to emerge naturally during post-training while respecting a prescribed assistance budget. Experiments on mathematical…
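The abstract does not give the DA-GRPO formulas, so the following is only a rough sketch of the general idea of folding a usage budget into group-relative advantages. The normalization in `group_relative_advantages` is the standard GRPO step (subtract the group mean, divide by the group standard deviation). Everything else is an assumption made for illustration and is not the authors' method: the function `dual_advantages`, the `budget` and `weight` parameters, and the rule that the usage term only switches on when the group's offloading rate exceeds the budget.

```python
from statistics import mean, pstdev


def group_relative_advantages(values, eps=1e-8):
    """Standard GRPO-style normalization over one group of sampled responses."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def dual_advantages(task_rewards, offloaded, budget, weight=1.0):
    """Hypothetical combination of a task advantage and a usage advantage.

    task_rewards: per-response task reward within one sampled group.
    offloaded:    per-response flag, 1 if the response requested the cloud.
    budget:       assumed target fraction of responses allowed to offload.
    weight:       assumed scale of the usage term (not from the paper).
    """
    task_adv = group_relative_advantages(task_rewards)

    # Fraction of the group that offloaded, and how far it exceeds the budget.
    usage_rate = sum(offloaded) / len(offloaded)
    excess = max(0.0, usage_rate - budget)

    # Usage advantage: offloading responses score lower than local ones.
    usage_adv = group_relative_advantages([-float(o) for o in offloaded])

    # The usage term is active only when the group is over budget (assumption).
    return [t + weight * excess * u for t, u in zip(task_adv, usage_adv)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 0.0]
    cloud_flags = [1, 0, 1, 0]
    print(dual_advantages(rewards, cloud_flags, budget=0.25))
```

In this sketch the constraint acts on the advantage rather than on the reward, so a group that stays within the budget is trained on the task signal alone, and the pressure against offloading grows only with the amount by which the budget is exceeded.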
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
