OOM-Free Alpamayo via CPU-GPU Memory Swapping for Vision-Language-Action Models

Seungwoo Roh; Huiyeong Kim; Jong-Chan Kim

arXiv:2605.11678 · cs.AI · May 13, 2026

TL;DR

This paper introduces a system-level framework that enables memory-efficient inference for vision-language-action models on GPUs with limited VRAM, without modifying the models themselves.

Contribution

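The paper proposes a three-stage memory management framework (Sequential Demand Layering, Pipelined Demand Layering, and a GPU-Resident Layer Decision Policy) together with a performance prediction model, with the aim of lowering GPU memory usage and improving inference speed for large models.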

Findings

01  Achieves up to 3.55x speedup over existing offloading methods.

02  Reduces VRAM usage from model-level to layer-level granularity.

03  Maintains full BF16 precision during inference.

Abstract

End-to-end Vision-Language-Action (VLA) models for autonomous driving unify perception, reasoning, and control in a single neural network. They achieve strong driving performance but require 20-60 GB of GPU memory, far exceeding the 12-16 GB available on commodity GPUs. We present a framework that enables memory-efficient VLA inference on VRAM-constrained GPUs through system-level optimization alone, without model modification. Our work proceeds in three stages: (1) Sequential Demand Layering reduces VRAM usage from model-level to layer-level granularity; (2) Pipelined Demand Layering hides parameter transfer time within layer execution time via transfer-compute overlap; and (3) a GPU-Resident Layer Decision Policy, informed by per-module residency benefit analysis, eliminates the residual transfer overhead that pipelining cannot hide. We further propose a performance prediction model…
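
The three stages named in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' implementation or their prediction model. The Layer fields, the two-buffer prefetch assumption, the greedy benefit-per-GB rule in choose_resident, and every number are hypothetical, chosen only to show why overlap helps and why some layers may still be worth keeping on the GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    size_gb: float      # parameter size (hypothetical numbers)
    transfer_ms: float  # host-to-device copy time for the parameters
    compute_ms: float   # execution time once parameters are on the GPU


def sequential_latency(layers):
    """Stage 1 analogue: load a layer, run it, evict it. No overlap."""
    return sum(l.transfer_ms + l.compute_ms for l in layers)


def pipelined_latency(layers, resident=frozenset()):
    """Stage 2 analogue: copy upcoming layers while earlier ones compute.

    Assumes one serial copy engine and two swap buffers. Layers whose
    index is in `resident` stay on the GPU and need no transfer.
    """
    copy_free = 0.0     # time at which the copy engine is next idle
    compute_free = 0.0  # time at which the compute stream is next idle
    streamed_done = []  # compute-finish times of streamed layers, in order
    for i, layer in enumerate(layers):
        if i in resident:
            ready = 0.0
        else:
            # A swap buffer frees up once the streamed layer two positions
            # back has finished computing.
            buffer_free = streamed_done[-2] if len(streamed_done) >= 2 else 0.0
            ready = max(copy_free, buffer_free) + layer.transfer_ms
            copy_free = ready
        compute_free = max(compute_free, ready) + layer.compute_ms
        if i not in resident:
            streamed_done.append(compute_free)
    return compute_free


def choose_resident(layers, budget_gb):
    """Stage 3 analogue (assumed greedy rule, not the paper's policy):
    repeatedly pin the layer with the best latency gain per GB."""
    resident, used = set(), 0.0
    while True:
        base = pipelined_latency(layers, resident)
        best, best_gain = None, 0.0
        for i, layer in enumerate(layers):
            if i in resident or used + layer.size_gb > budget_gb:
                continue
            saved = base - pipelined_latency(layers, resident | {i})
            if saved / layer.size_gb > best_gain:
                best, best_gain = i, saved / layer.size_gb
        if best is None:
            return resident
        resident.add(best)
        used += layers[best].size_gb


# Two transfer-bound layers followed by two compute-bound layers.
demo = [Layer(1.0, 10, 4), Layer(1.0, 10, 4), Layer(0.5, 5, 8), Layer(0.5, 5, 8)]
pinned = choose_resident(demo, budget_gb=2.0)
print(sequential_latency(demo), pipelined_latency(demo), pipelined_latency(demo, pinned))
```

With these made-up numbers, sequential loading takes 54 ms and pipelining takes 41 ms. Pinning the two transfer-bound layers within a 2 GB budget brings the latency down to 24 ms, which is the pure compute time.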
