Efficient On-Device Agents via Adaptive Context Management
Sanidhya Vijayvargiya, Rahul Lokesh

TL;DR
This paper introduces a framework for on-device AI agents that efficiently manages limited memory by compressing context and loading tool schemas dynamically, enabling rich interactions without exceeding device constraints.
Contribution
The paper presents a novel context management framework with adaptive memory, schema serialization, and just-in-time loading, significantly reducing context size while maintaining performance.
Findings
Over 6-fold reduction in initial system prompt context
10- to 25-fold reduction in context growth rate
Matches or exceeds baseline performance on complex tasks
Abstract
On-device AI agents offer the potential for personalized, low-latency assistance, but their deployment is fundamentally constrained by limited memory capacity, which restricts usable context. This reduced practical context window creates a trade-off between supporting rich, stateful interactions with complex tool capabilities and maintaining on-device feasibility. We break this trade-off with a framework for context-efficient on-device agents, driven by three synergistic optimizations: (1) a dynamic memory system using specialized LoRA adapters to distill conversational history into a compressed, structured Context State Object; (2) a minimalist serialization format for tool schemas to minimize token overhead per tool; and (3) a just-in-time schema-passing mechanism that loads full tool definitions only upon tool selection. We instantiate this framework by adapting a 3B-parameter SLM…
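Optimizations (2) and (3) in the abstract can be illustrated with a minimal sketch. It is not the paper's format or implementation: the tool registry, the one-line signature layout, and the helper names (`compact_signature`, `initial_prompt`, `load_full_schema`) are all hypothetical, and size is measured in characters as a rough stand-in for tokens.

```python
import json

# Hypothetical tool registry; tool names and fields are illustrative only.
TOOLS = {
    "set_alarm": {
        "description": "Set an alarm at a given time.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "24-hour HH:MM time."},
                "label": {"type": "string", "description": "Optional alarm label."},
            },
            "required": ["time"],
        },
    },
    "send_message": {
        "description": "Send a text message to a contact.",
        "parameters": {
            "type": "object",
            "properties": {
                "contact": {"type": "string", "description": "Recipient name."},
                "body": {"type": "string", "description": "Message text."},
            },
            "required": ["contact", "body"],
        },
    },
}


def compact_signature(name, schema):
    """Serialize one tool as a single short line (assumed format):
    name(arg:type, optional?:type) - description."""
    params = schema["parameters"]
    required = set(params.get("required", []))
    args = ", ".join(
        f"{arg}{'' if arg in required else '?'}:{spec['type']}"
        for arg, spec in params["properties"].items()
    )
    return f"{name}({args}) - {schema['description']}"


def initial_prompt(tools):
    """Build the tool section of the system prompt from compact signatures only."""
    return "\n".join(compact_signature(n, s) for n, s in tools.items())


def load_full_schema(tools, selected):
    """Just-in-time step: return the full JSON definition only for the
    tool the model has already selected."""
    return json.dumps({"name": selected, **tools[selected]})


compact = initial_prompt(TOOLS)
full = json.dumps(TOOLS)
print(compact)
print(f"compact: {len(compact)} chars, full JSON: {len(full)} chars")
print(load_full_schema(TOOLS, "set_alarm"))
```

In this sketch the system prompt carries only the compact lines, and the verbose per-parameter descriptions enter the context for a single tool after selection. How much is actually saved depends on the tokenizer and on the real schema format, which the abstract does not specify.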
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This paper studies a practical scenario and can be helpful for the deployment of LLMs. 2. The proposed method is simple and easy to understand while being effective.
1. The technical contribution might be minor. The authors lack a comprehensive discussion of existing methods and their limitations. Therefore, it is hard to justify the novelty of the proposed method. 2. The authors compare their method only to a few baselines on a self-curated dataset, which is a weak evaluation. 3. Although the authors study on-device memory management, no specific memory parameters or computation resources available on representative devices are provided. Therefor
1. Context constraints on resource-limited devices are a genuine deployment bottleneck overlooked by long-context research. 2. The paper is well written and easy to follow. 3. Empirical results show the proposed method achieves 10-25× context reduction with maintained or improved performance.
1. **Limited Novelty in Context Compression**: Context compression is a mature field with extensive prior work. The proposed Context State Object is essentially task-specific summarization in key-value format, not a fundamentally new compression paradigm. The dual-adapter architecture uses standard techniques. The paper needs to better clarify what makes this approach fundamentally different from applying existing summarization methods to agent conversations, as the primary contribution appears
- Novel Context Management: The CSO system effectively balances memory efficiency and task fidelity by leveraging structured logging and semantic compression. This addresses the critical bottleneck of long-context degradation in on-device settings. - Practical Tool Optimization: The minimalist schema format and JIT mechanism drastically reduce initial token overhead, enabling agents to handle more tools within constrained memory budgets. - Strong Empirical Results: The experiments show significa
- The experiment is conducted on a 3B-parameter xLAM 2. While a brief test on Qwen-3 4B suggests scalability, the results may not extend to other architectures or larger models without retraining. - The JIT schema-passing mechanism introduces 500ms latency per turn for the CSO update cycle on a Galaxy S25 CPU, which would increase latency in time-sensitive applications. - While the paper provides detailed methodology, it is still difficult for researchers to reproduce it and it would be better
The strengths of this paper are as follows: 1. Framework design: The proposed framework is well-thought-out. Each choice of the framework is well supported with reasonable observations/insights. Especially, the choices are highly technically detailed at the low level, while the appropriateness remains clear at the high level. 2. Balanced metric: The authors combine various metrics to examine the efficacy of the proposed method. The combination of precision (i.e., rule-based) and LLM-as-judge (i
The weaknesses/questions/suggestions of this paper are as follows: 1. Reference: There are missing parts of the Appendix that are referenced in the main text (e.g., Appendix A.5 & A.6). 2. Tasks: - In lines 827-828 (Appendix A.2.6): does this mean that prior benchmarks are ill-posed, or that the framework is designed in the wrong manner? The “our models will not work” part sounds very unconvincing. Would there be any method to adapt the existing framework (e.g., changing the context part with some
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Personal Information Management and User Behavior · Distributed systems and fault tolerance
