Cold-RL: Learning Cache Eviction with Offline Reinforcement Learning for NGINX
Aayush Gupta, Arpit Bhayani

TL;DR
Cold-RL introduces a reinforcement learning-based cache eviction policy for NGINX that improves hit ratio by up to 146% over classical baselines, and falls back to native LRU whenever inference exceeds a hard 500-microsecond timeout.
Contribution
This paper presents the first reinforcement learning eviction policy integrated into NGINX with strict latency and SLO guarantees, replacing traditional heuristics.
Findings
Cold-RL improves hit ratio by up to 146% over classical baselines.
Inference overhead remains below 2% CPU, with eviction latency within strict limits.
Cold-RL matches classical methods at larger cache sizes, demonstrating scalability.
Abstract
Web proxies such as NGINX commonly rely on least-recently-used (LRU) eviction, which is size agnostic and can thrash under periodic bursts and mixed object sizes. We introduce Cold-RL, a learned eviction policy for NGINX that replaces LRU's forced-expire path with a dueling Deep Q-Network served by an ONNX sidecar within a strict microsecond budget. On each eviction, Cold-RL samples the K least-recently-used objects, extracts six lightweight features (age, size, hit count, inter-arrival time, remaining TTL, and last origin RTT), and requests a bitmask of victims; a hard timeout of 500 microseconds triggers immediate fallback to native LRU. Policies are trained offline by replaying NGINX access logs through a cache simulator with a simple reward: a retained object earns one point if it is hit again before TTL expiry. We compare against LRU, LFU, size-based, adaptive LRU, and a hybrid…
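The eviction step and the reward described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `CacheObject`, `choose_victims`, and `retention_reward` are hypothetical, the `policy` callable stands in for the ONNX sidecar, and the deadline is checked only after the call returns, whereas a real hard timeout would have to interrupt the request. Treating an all-zero bitmask as a fallback case and assigning a reward of zero to objects that are not hit again are also assumptions.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass

TIMEOUT_S = 500e-6  # 500-microsecond budget stated in the abstract


@dataclass
class CacheObject:
    """Hypothetical container for the six features named in the abstract."""
    key: str
    age: float
    size: int
    hit_count: int
    inter_arrival: float
    ttl_remaining: float
    origin_rtt: float

    def features(self):
        return [self.age, self.size, self.hit_count,
                self.inter_arrival, self.ttl_remaining, self.origin_rtt]


def choose_victims(lru, k, policy, timeout_s=TIMEOUT_S):
    """Return the keys to evict.

    lru    -- OrderedDict ordered from least to most recently used
    policy -- callable taking a list of feature vectors and returning a
              bitmask (list of 0/1); a stand-in for the ONNX sidecar
    """
    candidates = list(lru.values())[:k]  # the K least-recently-used objects
    if not candidates:
        return []

    start = time.perf_counter()
    try:
        mask = policy([c.features() for c in candidates])
    except Exception:
        mask = None  # any policy failure is treated like a timeout
    elapsed = time.perf_counter() - start

    # Simplification: the budget is checked after the fact.
    # Assumption: an empty or all-zero mask also falls back to LRU.
    if mask is None or elapsed > timeout_s or not any(mask):
        return [candidates[0].key]  # native LRU: evict the oldest object

    return [c.key for c, m in zip(candidates, mask) if m]


def retention_reward(next_hit_time, ttl_expiry_time):
    """One point if a retained object is hit again before TTL expiry.

    Assumption: the reward is zero otherwise; next_hit_time is None when
    the replayed log contains no further request for the object.
    """
    if next_hit_time is not None and next_hit_time < ttl_expiry_time:
        return 1.0
    return 0.0
```

In this sketch a caller would keep the cache in an `OrderedDict`, call `choose_victims(cache, k, policy)` whenever space is needed, and delete the returned keys. During offline training, `retention_reward` would be evaluated for each retained object while replaying the access log through the simulator.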
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Caching and Content Delivery
