HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads
Justice Owusu Agyemang, Jerry John Kponyo, Obed Kwasi Somuah, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum

TL;DR
HIVEMIND is an OS-inspired proxy that manages concurrent LLM agent workloads to prevent resource contention failures, significantly reducing errors and wasted compute without modifying existing agent code.
Contribution
The paper introduces HIVEMIND, a novel proxy applying OS-like scheduling primitives to coordinate LLM API calls, improving reliability and efficiency in multi-agent environments.
Findings
Failure rates drop from 72-100% to 0-18% with HIVEMIND.
HIVEMIND reduces wasted compute by up to 100%.
The proxy adds under 3 ms of overhead per request.
Abstract
When multiple LLM coding agents share a rate-limited API endpoint, they exhibit resource contention patterns analogous to unscheduled OS processes competing for CPU, memory, and I/O. In a motivating incident, 3 of 11 parallel agents died from connection resets and HTTP 502 errors - a 27% failure rate - despite the API having sufficient aggregate capacity to serve all 11 sequentially. We present HIVEMIND, a transparent HTTP proxy that applies five OS-inspired scheduling primitives - admission control, rate-limit tracking, AIMD backpressure with circuit breaking, token budget management, and priority queuing - to eliminate the failure modes caused by uncoordinated parallel execution. The proxy requires zero modifications to existing agent code and supports Anthropic, OpenAI, and local model APIs via auto-detected provider profiles. Our evaluation across seven scenarios (5-50 concurrent…
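Of the five primitives the abstract names, AIMD (additive increase, multiplicative decrease) backpressure is the easiest to show in a few lines. The following is a minimal sketch of the general technique and not HiveMind's actual implementation: the class name `AIMDLimiter`, its parameters, their default values, and the choice to treat responses such as HTTP 429 or 502 as overload signals are all assumptions made for illustration.

```python
class AIMDLimiter:
    """Hypothetical AIMD concurrency limit; illustrative only, not HiveMind's code."""

    def __init__(self, initial=4.0, minimum=1.0, maximum=50.0,
                 increase=1.0, decrease=0.5):
        # All defaults are assumed values chosen for the example.
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.decrease = decrease

    def on_success(self):
        # Additive increase: probe for spare capacity one step at a time.
        self.limit = min(self.maximum, self.limit + self.increase)

    def on_overload(self):
        # Multiplicative decrease: back off sharply on an overload signal
        # (assumed here to be a response such as HTTP 429 or 502).
        self.limit = max(self.minimum, self.limit * self.decrease)

    def allowed(self, in_flight):
        # Admit a new request only while in-flight requests are below the limit.
        return in_flight < int(self.limit)


limiter = AIMDLimiter(initial=4.0)
limiter.on_success()    # limit rises additively: 4.0 -> 5.0
limiter.on_overload()   # limit is halved: 5.0 -> 2.5
```

With the limit at 2.5, `limiter.allowed(1)` admits a second request while `limiter.allowed(2)` holds a third back. The asymmetry is the point of the design: the limit climbs slowly while requests succeed and collapses quickly when the upstream API signals trouble, so a group of agents converges toward the endpoint's capacity instead of repeatedly overshooting it.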
