TranSQL+: Serving Large Language Models with SQL on Low-Resource Hardware
Wenbo Sun, Qiming Guo, Wenlu Wang, Rihan Hai

TL;DR
This paper presents TranSQL+, which runs large language models efficiently on low-resource hardware by translating their computation graphs into SQL queries that a relational database executes directly, so inference needs no external libraries.
Contribution
Introduces TranSQL+, a novel template-based code generator that converts LLM computation graphs into SQL for resource-efficient inference inside relational databases.
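The following minimal sketch illustrates template-based SQL code generation in Python with the standard-library sqlite3 module. It assumes the textbook relational encoding of a matrix as a table of (i, j, val) triples, under which a matrix product becomes a join on the shared index followed by a grouped SUM. The template text, the names MATMUL_TEMPLATE, compile_graph and load_matrix, and the table layout are hypothetical illustrations, not TranSQL+'s actual templates or schema. The paper's ROW2COL optimization is not modeled.

```python
import sqlite3

# Hypothetical SQL template for a matmul node Y = X @ W. Each tensor is a
# table of (i, j, val) triples; this is the textbook relational encoding,
# not necessarily the layout TranSQL+ uses.
MATMUL_TEMPLATE = """
CREATE TABLE {out} AS
SELECT a.i AS i, b.j AS j, SUM(a.val * b.val) AS val
FROM {lhs} AS a JOIN {rhs} AS b ON a.j = b.i
GROUP BY a.i, b.j
"""


def compile_graph(nodes):
    """Turn (op, inputs, output) nodes, given in topological order, into SQL."""
    templates = {"matmul": MATMUL_TEMPLATE}
    return [
        templates[op].format(lhs=inputs[0], rhs=inputs[1], out=out)
        for op, inputs, out in nodes
    ]


def load_matrix(conn, name, rows):
    """Store a dense Python matrix as an (i, j, val) table.

    Table names are interpolated directly, so they must be trusted identifiers.
    """
    conn.execute(f"CREATE TABLE {name} (i INTEGER, j INTEGER, val REAL)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(rows) for j, v in enumerate(row)],
    )


conn = sqlite3.connect(":memory:")
load_matrix(conn, "x", [[1.0, 2.0], [3.0, 4.0]])
load_matrix(conn, "w1", [[1.0, 0.0], [0.0, 1.0]])  # identity
load_matrix(conn, "w2", [[2.0, 0.0], [0.0, 0.5]])  # diag(2, 0.5)

# A two-node "computation graph": h = x @ w1, then y = h @ w2.
graph = [("matmul", ("x", "w1"), "h"), ("matmul", ("h", "w2"), "y")]
for stmt in compile_graph(graph):
    conn.execute(stmt)  # the database engine does all the arithmetic

result = conn.execute("SELECT i, j, val FROM y ORDER BY i, j").fetchall()
```

Each graph node maps to one SQL statement, so the database engine rather than a tensor library performs the arithmetic. Here the graph multiplies by the identity and then by diag(2, 0.5), so table y holds the matrix [[2, 1], [6, 2]].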
Findings
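Achieves up to 20x lower prefill latency than DeepSpeed Inference and Llama.cpp
Attains 4x higher decoding speed than the same baselines
Effective in low-memory, CPU-only configurations (evaluated on Llama3-8B and DeepSeekMoE)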
Abstract
Deploying Large Language Models (LLMs) on resource-constrained devices remains challenging due to limited memory, lack of GPUs, and the complexity of existing runtimes. In this paper, we introduce TranSQL+, a template-based code generator that translates LLM computation graphs into pure SQL queries for execution in relational databases. Without relying on external libraries, TranSQL+ leverages mature database features, such as vectorized execution and out-of-core processing, for efficient inference. We further propose a row-to-column (ROW2COL) optimization that improves join efficiency in matrix operations. Evaluated on Llama3-8B and DeepSeekMoE models, TranSQL+ achieves up to 20x lower prefill latency and 4x higher decoding speed compared to DeepSpeed Inference and Llama.cpp in low-memory and CPU-only configurations. Our results highlight relational databases as a practical…
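Non-linear operators can be expressed relationally as well. The sketch below, again a hypothetical illustration in Python with sqlite3 rather than SQL generated by TranSQL+, computes a numerically stable row-wise softmax over an attention-score table stored as (q, k, val) triples: it subtracts each query row's maximum, exponentiates, and divides by the row sum, using only GROUP BY aggregates and joins. The exponential is registered as a Python user-defined function named py_exp, an assumption made because SQLite's built-in math functions are an optional compile-time feature.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical UDF name; registered explicitly so the example does not depend
# on SQLite having been compiled with its optional math functions.
conn.create_function("py_exp", 1, math.exp)

# Attention scores for two query rows, stored as (q, k, val) triples.
conn.execute("CREATE TABLE scores (q INTEGER, k INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 0.0), (1, 1, math.log(3.0))],
)

SOFTMAX_SQL = """
WITH m AS (                -- per-row maximum, for numerical stability
    SELECT q, MAX(val) AS mx FROM scores GROUP BY q
),
e AS (                     -- exp(score - row max)
    SELECT s.q AS q, s.k AS k, py_exp(s.val - m.mx) AS ev
    FROM scores AS s JOIN m ON s.q = m.q
),
z AS (                     -- per-row normalizer
    SELECT q, SUM(ev) AS denom FROM e GROUP BY q
)
SELECT e.q AS q, e.k AS k, e.ev / z.denom AS prob
FROM e JOIN z ON e.q = z.q
ORDER BY e.q, e.k
"""

probs = conn.execute(SOFTMAX_SQL).fetchall()
```

Row 0 has equal scores and gets probabilities of 0.5 each; row 1 has scores 0 and ln 3, giving approximately 0.25 and 0.75.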
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Library Science and Information Systems · Digital Rights Management and Security
Methods: Softmax · Attention Is All You Need
