AgentCgroup: Understanding and Controlling OS Resources of AI Agents

Yusheng Zheng; Jiakun Fan; Quanzhi Fu; Yiwei Yang; Wei Zhang; Andi Quinn

arXiv:2602.09345·cs.OS·February 24, 2026

Open Access

TL;DR

This paper systematically characterizes OS resource dynamics in sandboxed AI agents, revealing bottlenecks and unpredictability, and proposes AgentCgroup, an eBPF-based controller that improves resource management and isolation.

Contribution

It introduces AgentCgroup, a novel intent-driven resource control system tailored for AI agents, addressing mismatches in existing resource management approaches.

Findings

01. OS execution accounts for 56-74% of task latency
02. Memory is the main concurrency bottleneck
03. Resource demands are highly unpredictable

Abstract

AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers; each call has distinct resource demands that fluctuate rapidly. We present a systematic characterization of OS-level resource dynamics in sandboxed AI coding agents, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLMs. Our measurements reveal that (1) OS-level execution (tool calls, container and agent initialization) accounts for 56-74% of end-to-end task latency; (2) memory, not CPU, is the concurrency bottleneck; (3) memory spikes are tool-call-driven, with a peak-to-average ratio of up to 15.4x; and (4) resource demands are highly unpredictable across tasks, runs, and models. Comparing these characteristics against serverless, microservice, and batch workloads, we identify three mismatches in existing…
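To make the peak-to-average metric in finding (3) concrete, here is a minimal sketch of how such a ratio could be computed from a series of memory readings. The function name `peak_to_average` and the sample trace are hypothetical illustrations, not the paper's tooling or data; the abstract does not say how the authors sample memory.

```python
from statistics import mean


def peak_to_average(samples_mib):
    """Return max(samples) / mean(samples) for a series of memory readings.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not samples_mib:
        raise ValueError("need at least one sample")
    avg = mean(samples_mib)
    if avg <= 0:
        raise ValueError("average memory must be positive")
    return max(samples_mib) / avg


# Made-up trace in MiB: a mostly idle agent with one tool-call-driven spike.
trace = [100, 100, 100, 100, 600]
ratio = peak_to_average(trace)
print(f"peak-to-average ratio: {ratio:.1f}x")
```

A high ratio means that a static memory limit sized for the peak leaves most of the reservation idle, while one sized for the average risks the spike being throttled or killed, which is consistent with the abstract's claim that memory rather than CPU limits concurrency.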

Taxonomy

Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Advanced Software Engineering Methodologies