GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
Kathiravan Palaniappan

TL;DR
This paper examines the scalability and saturation limits of CPU-based deep learning inference, comparing legacy and modern hardware, and introduces GDEV-AI, a benchmarking framework for analyzing performance bottlenecks.
Contribution
It provides a comprehensive empirical analysis of CPU inference scalability and introduces GDEV-AI for reproducible benchmarking of architectural saturation effects.
Findings
Legacy CPUs saturate quickly at small batch sizes due to instruction and memory limits.
Modern CPUs with AMX achieve higher throughput but run into contention under oversubscription.
GDEV-AI offers a vendor-neutral tool for analyzing inference scalability and bottlenecks.
Abstract
Deep learning inference is increasingly deployed in production environments, where throughput, latency, and hardware efficiency are critical. Although specialized accelerators are increasingly adopted, many inference workloads still run on CPU-only systems, particularly in legacy data centers and cost-sensitive environments. This study investigates the scalability limits of CPU-based inference for convolutional neural networks by benchmarking ResNet models across varying batch sizes on two hardware tiers: a legacy Intel Xeon E5-2403 v2 processor and a modern Intel Xeon 6 "Granite Rapids" platform. Results show that legacy CPUs reach throughput saturation quickly, with limited scaling beyond small batch sizes due to instruction-level and memory constraints. In contrast, the Granite Rapids system leverages Intel Advanced Matrix Extensions (AMX) to achieve substantially higher…
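The batch-size sweep described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GDEV-AI implementation: the names `measure_throughput` and `saturation_batch_size`, the `run_batch` callable, and the 5% gain threshold are all hypothetical choices made for this example. The idea is to time a fixed number of forward passes per batch size, convert that to items per second, and report the first batch size beyond which throughput stops improving meaningfully.

```python
import time
from typing import Callable, Dict, Optional


def measure_throughput(run_batch: Callable[[int], None],
                       batch_size: int,
                       warmup: int = 2,
                       iters: int = 5) -> float:
    """Return items/second for one batch size.

    `run_batch` is a hypothetical stand-in for one inference call on a
    batch of the given size (for example, a ResNet forward pass).
    """
    # Warm-up calls are excluded from timing so one-off setup costs
    # (allocation, cache fill) do not distort the measurement.
    for _ in range(warmup):
        run_batch(batch_size)
    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * iters) / elapsed


def saturation_batch_size(throughput: Dict[int, float],
                          min_gain: float = 0.05) -> Optional[int]:
    """Return the smallest batch size whose next larger batch size
    improves throughput by less than `min_gain` (relative gain).

    Returns None if throughput keeps scaling across the whole sweep.
    The 5% default is an assumption of this sketch, not a value taken
    from the paper.
    """
    sizes = sorted(throughput)
    for small, large in zip(sizes, sizes[1:]):
        gain = (throughput[large] - throughput[small]) / throughput[small]
        if gain < min_gain:
            return small
    return None


if __name__ == "__main__":
    # Stand-in CPU workload; replace with a real model call.
    def fake_inference(batch_size: int) -> None:
        sum(i * i for i in range(20_000 * batch_size))

    sweep = {b: measure_throughput(fake_inference, b) for b in (1, 2, 4, 8)}
    for b, t in sweep.items():
        print(f"batch={b:>2}  throughput={t:,.1f} items/s")
    print("saturation at batch size:", saturation_batch_size(sweep))
```

A relative-gain threshold is only one way to define saturation. A real study would likely repeat each measurement several times and compare distributions rather than single timings, since timer noise can push a small gain across the threshold in either direction.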
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing
