Rethinking Inter-Process Communication with Memory Operation Offloading
Misun Park, Richi Dubey, Yifan Yuan, Nam Sung Kim, and Ada Gavrilovska

TL;DR
This paper introduces a unified IPC runtime that integrates hardware and software memory offloading, reporting up to 2.1x higher throughput and up to 72% lower latency for data-intensive applications.
Contribution
The paper presents a unified IPC runtime model that coordinates hardware and software memory offloading and exposes multiple IPC modes that balance throughput, latency, and CPU efficiency.
Findings
Instruction count reduced by up to 22%
Throughput increased by up to 2.1x
Latency decreased by up to 72%
Abstract
As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading, current IPC stacks lack a unified runtime model to coordinate them effectively. This paper presents a unified IPC runtime suite that integrates both hardware- and software-based memory offloading into shared-memory communication. The system characterizes the interaction between offload strategies and IPC execution, including synchronization, cache visibility, and concurrency, and introduces multiple IPC modes that balance throughput, latency, and CPU efficiency. Through asynchronous pipelining, selective cache injection, and hybrid coordination, the system turns offloading from a device-specific feature into a general system capability. Evaluations…
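The asynchronous pipelining mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `pipelined_transfer`, the descriptor format, and the use of a Python thread as a stand-in for an offload engine are all assumptions made for illustration, and a `bytearray` stands in for a shared-memory segment. The sender enqueues copy descriptors without waiting on any copy, and the receiver consumes each chunk as soon as its completion is posted.

```python
import queue
import threading


def pipelined_transfer(payload: bytes, chunk_size: int = 4) -> bytes:
    """Hypothetical sketch: copy payload through a shared buffer in chunks,
    with the copies performed by a separate worker (the assumed 'engine')."""
    shared = bytearray(len(payload))          # stand-in for a shared-memory segment
    submissions: queue.Queue = queue.Queue()  # copy descriptors for the engine
    completions: queue.Queue = queue.Queue()  # finished chunks for the receiver

    def copy_engine() -> None:
        # Stand-in for a hardware or software offload engine (assumption).
        while True:
            desc = submissions.get()
            if desc is None:                  # sentinel: no more work
                completions.put(None)
                return
            offset, length = desc
            shared[offset:offset + length] = payload[offset:offset + length]
            completions.put(desc)

    engine = threading.Thread(target=copy_engine)
    engine.start()

    # Sender: submit every descriptor without blocking on copy completion.
    for offset in range(0, len(payload), chunk_size):
        submissions.put((offset, min(chunk_size, len(payload) - offset)))
    submissions.put(None)

    # Receiver: consume chunks in completion order (FIFO with a single engine).
    received = bytearray()
    while (desc := completions.get()) is not None:
        offset, length = desc
        received += shared[offset:offset + length]

    engine.join()
    return bytes(received)
```

Calling `pipelined_transfer(b"some payload")` returns the same bytes it was given. The sketch only shows the submit/complete structure; a Python thread does not model the CPU savings, cache-injection behavior, or synchronization costs the paper studies.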
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Big Data and Digital Economy
