SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

Shashank Kapadia; Deep Narayan Mishra; Sujal Reddy Alugubelli; Ajay Kumar; Swapnil Yadav; Rishi Bhatia

arXiv:2605.01060 · cs.DC · May 5, 2026


TL;DR

SURGE is a streaming GPU encoding system for large-scale, logically partitioned text data. It achieves high throughput with lower peak memory, supports crash recovery, and is validated on real-world production data.

Contribution

It introduces a throughput cost model, a memory-safety bound, and a ϕ/CV decision framework for resource-efficient GPU encoding of partitioned data, enabling faster and more memory-efficient processing.

Findings

01

Achieves 26,413 texts/sec on 10M texts with 4 GPUs.

02

Uses 12.6× less peak memory than fixed-size batching.

03

Delivers 68× faster time-to-first-output and supports crash recovery.
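
Finding 01 can be restated in wall-clock terms with simple arithmetic. The sketch below assumes the 26,413 texts/sec rate is sustained over the whole 10M-text run and that work divides evenly across the 4 GPUs, so the per-GPU figure is an average derived from the reported total, not a separately measured number.

```python
# Figures from Finding 01; sustained rate and even GPU split are assumptions.
texts = 10_000_000
throughput = 26_413  # texts/sec across all GPUs
gpus = 4

wall_clock_s = texts / throughput   # total encoding time at the reported rate
per_gpu = throughput / gpus         # average share of throughput per GPU

print(f"{wall_clock_s:.1f} s (~{wall_clock_s / 60:.1f} min), {per_gpu:.0f} texts/sec per GPU")
# → 378.6 s (~6.3 min), 6603 texts/sec per GPU
```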

Abstract

We present SURGE, a streaming GPU encoding system deployed in production to generate embeddings for over 800 million texts across 40,000 logical partitions. Production embedding pipelines face a tension between logical data partitioning and efficient GPU utilization: processing each partition independently incurs $P$ inter-process communication (IPC) calls whose overhead limits throughput for compute-light models. Our contributions are analytical: (i) a cost model (Theorem 1) predicting throughput within 2% across three encoders spanning a $15\times$ parameter range; (ii) a memory-safety bound (Lemma 3) enabling a streaming two-threshold policy with peak memory $O(B_{\min} + n_{\max})$ rather than $O(N)$; and (iii) a $\phi$/CV decision framework characterizing when the pattern applies beyond our workload. The naive fix of batching at fixed size requires $O(N)$ peak memory (32.7 GB at…
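
The abstract states the memory bound but not the full policy. The minimal Python sketch below assumes a single flush rule: whole partitions are appended to a buffer, and the buffer is emitted as one SuperBatch once it holds at least $B_{\min}$ texts, with a final flush at end of stream. The paper's second threshold is not modeled, and the names `superbatches`, `simulate`, and `b_min` are hypothetical. The sketch shows why peak memory is $O(B_{\min} + n_{\max})$: before the last partition of a batch is appended the buffer holds fewer than $B_{\min}$ texts, and that partition adds at most $n_{\max}$, so no batch exceeds $B_{\min} - 1 + n_{\max}$ texts. It also counts encode calls, assuming (as an illustration) that each SuperBatch costs one IPC call instead of one call per partition.

```python
from typing import Iterable, Iterator, List, Tuple


def superbatches(partition_sizes: Iterable[int], b_min: int) -> Iterator[List[int]]:
    """Group whole partitions into SuperBatches (hypothetical single-threshold policy).

    Yields the list of partition sizes in each batch. A batch is flushed as soon
    as it holds at least b_min texts, so it never exceeds b_min - 1 + n_max.
    """
    buffer: List[int] = []
    buffered = 0
    for size in partition_sizes:
        buffer.append(size)
        buffered += size
        if buffered >= b_min:
            yield buffer
            buffer, buffered = [], 0
    if buffer:  # flush the leftover partitions at end of stream (< b_min texts)
        yield buffer


def simulate(partition_sizes: List[int], b_min: int) -> Tuple[int, int]:
    """Return (number of encode calls, peak texts held in one batch)."""
    calls = 0
    peak = 0
    for batch in superbatches(partition_sizes, b_min):
        calls += 1  # assumption: one IPC/encode call per SuperBatch
        peak = max(peak, sum(batch))
    return calls, peak


sizes = [3, 50, 7, 1, 1, 120, 9, 4, 2, 60]  # hypothetical partition sizes, n_max = 120
calls, peak = simulate(sizes, b_min=64)
print(calls, peak)  # → 2 182
```

With these hypothetical sizes, ten per-partition calls collapse into two SuperBatches, and the peak of 182 texts stays under $B_{\min} + n_{\max} = 184$ regardless of how many partitions stream through.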
