Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels

Mengting He; Shihao Xia; Haomin Jia; Wenfei Wu; Linhai Song

arXiv:2603.24595 · cs.PL · March 27, 2026

TL;DR

Model2Kernel combines model-aware dynamic analysis with symbolic execution to automatically detect memory-safety bugs in the CUDA kernels used for large language model (LLM) inference, improving the reliability and security of inference services.

Contribution

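Model2Kernel is the first practical, model-aware symbolic execution framework tailored to CUDA kernels in LLM inference. It addresses the limitations of prior methods, which depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths.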

Findings

1. Discovered 353 previously unknown bugs in CUDA kernels.
2. Produced only nine false positives.
3. Verified memory safety in real-world LLM inference systems.

Abstract

The widespread adoption of large language models (LLMs) has made GPU-accelerated inference a critical part of modern computing infrastructure. Production inference systems rely on CUDA kernels to implement core transformer operations, yet these kernels are highly susceptible to memory-safety bugs due to model-dependent tensor layouts, intricate memory indexing, and massive thread-level parallelism. Such bugs can corrupt model weights, crash inference services, or even enable adversarial attacks. Existing techniques either depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths, and none can effectively detect CUDA memory bugs in LLM inference systems. This paper presents Model2Kernel, the first practical system for automatically verifying the memory safety of CUDA kernels used in LLM inference. Model2Kernel performs model-aware dynamic…
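As a minimal illustration of one bug class the abstract describes (indexing that breaks on variable-length inputs under massive thread parallelism), the Python sketch below emulates a hypothetical 1-D element-wise CUDA kernel that computes `i = blockIdx.x * blockDim.x + threadIdx.x` and reads `x[i]` from a buffer of length `n`. Because the launch rounds the grid up so every element gets a thread, trailing threads read past the end whenever `n` is not a multiple of the block size, unless the kernel guards with `if (i >= n) return;`. The helper names (`launch_for`, `accessed_indices`, `find_oob`) and the 128-thread block size are assumptions for illustration. Exhaustively enumerating small lengths is only a stand-in: it is not Model2Kernel's algorithm, and symbolic execution would instead reason about all values of `n` at once.

```python
"""Hypothetical sketch: emulate a 1-D CUDA launch to find out-of-bounds reads.

Models a kernel body like:
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // optional guard: if (i >= n) return;
    y[i] = x[i];
Brute-force enumeration over small n stands in for symbolic reasoning;
this is NOT Model2Kernel's actual method.
"""
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Launch:
    grid_dim: int   # number of blocks (gridDim.x)
    block_dim: int  # threads per block (blockDim.x)


def launch_for(n: int, block_dim: int) -> Launch:
    # Ceil-division so every element gets at least one thread.
    return Launch(grid_dim=(n + block_dim - 1) // block_dim, block_dim=block_dim)


def accessed_indices(launch: Launch, n: int, guarded: bool) -> Iterator[int]:
    """Yield the index each emulated thread reads from a length-n buffer."""
    for block_idx in range(launch.grid_dim):
        for thread_idx in range(launch.block_dim):
            i = block_idx * launch.block_dim + thread_idx
            if guarded and i >= n:
                continue  # models `if (i >= n) return;`
            yield i


def find_oob(max_len: int, block_dim: int, guarded: bool) -> List[Tuple[int, int]]:
    """Return one (n, bad_index) witness per input length that reads out of bounds."""
    bugs = []
    for n in range(1, max_len + 1):
        for i in accessed_indices(launch_for(n, block_dim), n, guarded):
            if i >= n:
                bugs.append((n, i))
                break  # one witness per length is enough
    return bugs


unguarded = find_oob(max_len=300, block_dim=128, guarded=False)
guarded = find_oob(max_len=300, block_dim=128, guarded=True)
print(unguarded[0])    # (1, 1): with n = 1, thread 1 already reads x[1]
print(len(unguarded))  # 298: every length except 128 and 256
print(len(guarded))    # 0
```

The guarded variant reports no failures, matching the common CUDA idiom of bounds-checking the global thread index before touching memory; the unguarded variant fails for every length that is not a multiple of the block size.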

Taxonomy

Topics: Security and Verification in Computing · Parallel Computing and Optimization Techniques · Adversarial Robustness in Machine Learning