AgentCgroup: Understanding and Controlling OS Resources of AI Agents
Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, Andi Quinn

TL;DR
This paper systematically characterizes OS resource dynamics in sandboxed AI agents, showing that memory is the concurrency bottleneck and that resource demands are highly unpredictable, and proposes AgentCgroup, an eBPF-based controller that improves resource management and isolation.
Contribution
It introduces AgentCgroup, a novel intent-driven resource control system tailored for AI agents, addressing mismatches in existing resource management approaches.
Findings
OS execution accounts for 56-74% of task latency
Memory is the main concurrency bottleneck
Resource demands are highly unpredictable
Abstract
AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers, each call with distinct resource demands and rapid fluctuations. We present a systematic characterization of OS-level resource dynamics in sandboxed AI coding agents, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLM models. Our measurements reveal that (1) OS-level execution (tool calls, container and agent initialization) accounts for 56-74% of end-to-end task latency; (2) memory, not CPU, is the concurrency bottleneck; (3) memory spikes are tool-call-driven, with up to a 15.4x peak-to-average ratio; and (4) resource demands are highly unpredictable across tasks, runs, and models. Comparing these characteristics against serverless, microservice, and batch workloads, we identify three mismatches in existing…
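To make the peak-to-average ratio in finding (3) concrete, the following minimal sketch computes it for a memory trace. The function name `peak_to_average` and the sample values are hypothetical and chosen only for illustration; they are not measurements from the paper, and the paper's own sampling method is not described in this summary.

```python
def peak_to_average(samples_mib):
    """Return max(samples) / mean(samples) for a sequence of memory samples.

    Illustrative helper only; not the paper's measurement code.
    """
    if not samples_mib:
        raise ValueError("need at least one sample")
    mean = sum(samples_mib) / len(samples_mib)
    if mean <= 0:
        raise ValueError("mean memory usage must be positive")
    return max(samples_mib) / mean


# Hypothetical trace (MiB): a mostly idle agent with one tool-call spike.
trace = [100, 100, 100, 100, 100, 100, 100, 100, 100, 1100]
ratio = peak_to_average(trace)
print(f"peak-to-average ratio: {ratio:.1f}x")
```

A high ratio means a static memory limit sized for the peak sits mostly unused, while a limit sized for the average is exceeded whenever a tool call spikes.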
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Advanced Software Engineering Methodologies
