The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths

Marco Graziano

arXiv:2603.10030 · cs.AR · March 27, 2026

TL;DR

This paper introduces dmaplane, a Linux kernel module that provides explicit buffer orchestration for high-performance AI data paths, improving efficiency and safety in buffer management across devices and nodes.

Contribution

The paper presents dmaplane, a novel kernel-level framework that manages buffer lifecycle, sharing, and flow control for AI data transport, addressing gaps in existing transport libraries.

Findings

01

Quantifies cross-node NUMA performance penalties at DRAM scale

02

Enables safe, high-throughput RDMA data transfers

03

Supports GPU memory integration with PCIe BAR pinning

Abstract

AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. This paper presents dmaplane, a Linux kernel module that makes this missing layer explicit as buffer orchestration. dmaplane exposes a stable kernel UAPI via /dev/dmaplane and composes ring-based command channels, DMA buffer lifecycle management, dma-buf export for cross-device sharing, a kernel-space RDMA engine, NUMA-aware allocation and verification, credit-based flow control, low-overhead observability, and GPU memory integration via PCIe BAR pinning. We evaluate orchestration sensitivity with measurements of NUMA cross-node penalties at DRAM scale, completion-safe flow control under sustained RDMA load, and GPU BAR mapping tiers versus cudaMemcpy. We also demonstrate end-to-end…
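
The abstract names credit-based flow control as one of the composed mechanisms, but this summary does not describe how dmaplane implements it. As a rough illustration of the general technique only, the following user-space C sketch gates each post on an available credit and returns the credit when a completion is reaped, so the number of in-flight operations never exceeds a fixed bound such as a completion queue depth. The `credit_gate` type and all function names are hypothetical and are not part of the dmaplane UAPI; the sketch is single-threaded, whereas a kernel implementation would presumably need atomic counters or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical credit gate: names are illustrative, not dmaplane's UAPI. */
struct credit_gate {
    uint32_t max_credits;  /* upper bound, e.g. a completion queue depth */
    uint32_t available;    /* credits the sender may still spend */
};

static void credit_gate_init(struct credit_gate *g, uint32_t max_credits)
{
    g->max_credits = max_credits;
    g->available = max_credits;
}

/* Take one credit before posting a transfer; false means the caller must back off. */
static bool credit_try_acquire(struct credit_gate *g)
{
    if (g->available == 0)
        return false;
    g->available--;
    return true;
}

/* Return one credit when the completion for an earlier post has been reaped. */
static void credit_release(struct credit_gate *g)
{
    /* Releasing more credits than were acquired would break the bound. */
    assert(g->available < g->max_credits);
    g->available++;
}

/* Number of posted-but-not-yet-completed operations. */
static uint32_t credit_in_flight(const struct credit_gate *g)
{
    return g->max_credits - g->available;
}
```

With a gate initialized to two credits, two acquisitions succeed, a third is refused until `credit_release` is called, and `credit_in_flight` never reports more than two outstanding operations. Tying the credit count to the completion queue depth is one common way to keep a sender from overrunning completions under sustained load.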

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Security and Verification in Computing · Advanced Data Storage Technologies