Bring the BitCODE -- Moving Compute and Data in Distributed Heterogeneous Systems
Wenbin Lu (1), Luis E. Peña (2), Pavel Shamis (2), Valentin Churavy (3), Barbara Chapman (1), Steve Poole (4) ((1) Stony Brook University, (2) Arm Research, (3) MIT, (4) Los Alamos National Laboratory)

TL;DR
This paper introduces a framework that enables moving compute and data across distributed heterogeneous systems using LLVM and UCX, supporting dynamic code optimization, propagation, and integration with high-level languages like Julia.
Contribution
It presents a novel architecture and implementation for distributed heterogeneous computing, including a new class of X-RDMA operations and integration with modern programming languages.
Findings
X-RDMA pointer chase outperforms RDMA GET by 70%
Framework supports dynamic code propagation and optimization
Enables high-performance distributed computing in heterogeneous systems
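To give some intuition for the pointer-chase finding, here is a toy Python model that counts one-way network messages for two strategies: a client that issues one GET per link, and a shipped function that follows pointers locally and forwards itself only when the next link lives on another node. This is a sketch under stated assumptions, not the paper's implementation. It uses neither UCX nor LLVM, all names (`Node`, `build_chain`, `chase_with_get`, `chase_with_shipped_code`) are hypothetical, and message counts are only a rough proxy for latency, so the 70% figure should not be expected to fall out of it.

```python
class Node:
    """Hypothetical remote machine holding part of a linked list."""
    def __init__(self):
        # address -> (next_node_id, next_address), or None at the tail
        self.memory = {}


def build_chain(num_nodes, length):
    """Lay out a linked list of `length` entries round-robin over the nodes."""
    nodes = [Node() for _ in range(num_nodes)]
    for i in range(length):
        owner = i % num_nodes
        if i + 1 < length:
            nodes[owner].memory[i] = ((i + 1) % num_nodes, i + 1)
        else:
            nodes[owner].memory[i] = None
    return nodes


def chase_with_get(nodes, start):
    """Client-driven chase: every link costs a request and a reply."""
    messages = 0
    node_id, addr = start
    while True:
        entry = nodes[node_id].memory[addr]  # models one remote GET
        messages += 2                        # request + reply
        if entry is None:
            return addr, messages
        node_id, addr = entry


def chase_with_shipped_code(nodes, start):
    """Shipped-code chase: reads are local; the code forwards itself
    only when the next link is owned by a different node."""
    messages = 1                             # initial injection by the client
    node_id, addr = start
    while True:
        entry = nodes[node_id].memory[addr]  # local read, no network traffic
        if entry is None:
            return addr, messages + 1        # final result sent to the client
        next_node, next_addr = entry
        if next_node != node_id:
            messages += 1                    # code propagates to the owner
        node_id, addr = next_node, next_addr


if __name__ == "__main__":
    nodes = build_chain(num_nodes=4, length=8)
    print(chase_with_get(nodes, (0, 0)))
    print(chase_with_shipped_code(nodes, (0, 0)))
```

In this model the GET-based chase always pays two messages per link, while the shipped function pays at most one per link and nothing at all for links that stay on the same node. The round-robin layout is the worst case for the shipped function, since every link crosses nodes.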
Abstract
In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX communication framework. The framework can generate binary machine code or LLVM bitcode for multiple CPU architectures and move the code to remote machines while dynamically optimizing and linking the code on the target platform. The remotely injected code can recursively propagate itself to other remote machines or generate new code. The goal of this paper is threefold: (a) to present an architecture and implementation of the framework that provides essential infrastructure to program a new class of disaggregated systems wherein heterogeneous programming elements such as compute nodes and data processing units (DPUs) are distributed across the system, (b) to…
