Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

Vladimir Iglovikov; Dmitry Kosarevsky

arXiv:2605.08731·cs.PF·May 21, 2026


TL;DR

This paper tests whether single-thread JPEG decoder benchmarks predict PyTorch DataLoader throughput, using thirteen Python-accessible decode paths on five matched 16 vCPU cloud CPUs. It finds that decoder rankings change once the evaluation moves from single-thread decoding to DataLoader throughput.

Contribution

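It audits the single-thread evaluation assumption with a protocol that decodes the full 50,000-image ImageNet validation split from memory and reports, per CPU, single-thread throughput for all decoders, PyTorch DataLoader throughput for eligible decoders at worker counts 0, 2, 4, and 8, and decoder skip behavior.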

Findings

01. Decoder rankings vary significantly across CPU architectures.

02. Worker count impacts performance conclusions differently on Zen 4 and Zen 5.

03. TensorFlow exhibits a large single-thread penalty on ARM.

Abstract

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs: Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1. ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders, PyTorch DataLoader throughput for eligible decoders at worker counts {0, 2, 4, 8}, and decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, imageio is ninth in single-thread throughput yet lands in the top DataLoader tier with torchvision; on Zen 4, torchvision rises from seventh…
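The single-thread leg of a protocol like the one described in the abstract could be sketched as follows. This is a minimal illustration, not the authors' harness: the names `measure_decode` and `rank_decoders` are hypothetical, and treating any raised exception as a "skip" is an assumption about what decoder skip behavior means here. The sketch uses only the standard library, so the decoder is passed in as a plain callable.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def measure_decode(decode: Callable[[bytes], object],
                   blobs: Iterable[bytes]) -> Tuple[float, int]:
    """Return (images decoded per second, number of skipped images).

    `blobs` holds encoded images already loaded into memory, so disk I/O
    is excluded from the timing.
    """
    decoded = 0
    skipped = 0
    start = time.perf_counter()
    for blob in blobs:
        try:
            decode(blob)
            decoded += 1
        except Exception:
            # Assumption: an image the decoder rejects counts as a skip.
            skipped += 1
    elapsed = time.perf_counter() - start
    rate = decoded / elapsed if elapsed > 0 else float("inf")
    return rate, skipped


def rank_decoders(throughput: Dict[str, float]) -> List[str]:
    """Order decoder names from highest to lowest throughput."""
    return sorted(throughput, key=throughput.get, reverse=True)
```

Calling `rank_decoders` once on single-thread rates and once on rates measured through a data loader at a given worker count gives two orderings for the same decoders. The abstract's claim is that these two orderings can disagree.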
