Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels
Mengting He, Shihao Xia, Haomin Jia, Wenfei Wu, Linhai Song

TL;DR
Model2Kernel combines model-aware dynamic analysis with symbolic execution to automatically detect memory-safety bugs in CUDA kernels used for large language model (LLM) inference, with the aim of improving the reliability and security of inference systems.
Contribution
It introduces the first practical, model-aware symbolic execution framework tailored to CUDA kernels in LLM inference. Prior methods depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths.
Findings
Discovered 353 previously unknown bugs in CUDA kernels.
Achieved only nine false positives in bug detection.
Effectively verified memory safety in real-world LLM inference systems.
Abstract
The widespread adoption of large language models (LLMs) has made GPU-accelerated inference a critical part of modern computing infrastructure. Production inference systems rely on CUDA kernels to implement core transformer operations, yet these kernels are highly susceptible to memory-safety bugs due to model-dependent tensor layouts, intricate memory indexing, and massive thread-level parallelism. Such bugs can corrupt model weights, crash inference services, or even enable adversarial attacks. Existing techniques either depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths, and none can effectively detect CUDA memory bugs in LLM inference systems. This paper presents Model2Kernel, the first practical system for automatically verifying the memory safety of CUDA kernels used in LLM inference. Model2Kernel performs model-aware dynamic…
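To make the bug class concrete, here is a minimal Python sketch of how a model-dependent tensor layout can push a kernel's memory index out of bounds. It is not Model2Kernel's analysis: it simply enumerates every thread of a hypothetical kernel, and the names `kernel_index`, `find_out_of_bounds`, `num_tokens`, `head_size`, and `block_dim` are assumptions made for illustration. The assumed kernel launches one block per token with a fixed number of threads per block, and each thread accesses offset `block * head_size + thread`.

```python
from itertools import product


def kernel_index(block_idx, thread_idx, head_size):
    """Flat buffer offset accessed by one thread of the hypothetical kernel."""
    return block_idx * head_size + thread_idx


def find_out_of_bounds(num_tokens, head_size, block_dim, buffer_len):
    """Enumerate every (block, thread) pair and collect out-of-bounds accesses.

    Returns a list of (block_idx, thread_idx, offset) tuples whose offset
    falls outside the valid range [0, buffer_len).
    """
    violations = []
    for block_idx, thread_idx in product(range(num_tokens), range(block_dim)):
        offset = kernel_index(block_idx, thread_idx, head_size)
        if not 0 <= offset < buffer_len:
            violations.append((block_idx, thread_idx, offset))
    return violations


# Launch configuration hard-coded for a model with head_size == 128 (assumed).
BLOCK_DIM = 128

# Model A: head_size matches the launch configuration, so every access is valid.
safe = find_out_of_bounds(num_tokens=4, head_size=128,
                          block_dim=BLOCK_DIM, buffer_len=4 * 128)

# Model B: a smaller head_size shrinks the buffer, but the kernel still
# launches 128 threads per block, so the last block runs past the end.
unsafe = find_out_of_bounds(num_tokens=4, head_size=64,
                            block_dim=BLOCK_DIM, buffer_len=4 * 64)

print(len(safe), len(unsafe))
```

Exhaustive enumeration like this only works for tiny, fixed sizes. Treating quantities such as the token count as symbolic values, rather than enumerating them, is what lets a symbolic-execution approach reason about inputs of variable length.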
Taxonomy
Topics: Security and Verification in Computing · Parallel Computing and Optimization Techniques · Adversarial Robustness in Machine Learning
