Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning
Yuval Shemla, Ayal Yakobe, Tanmay Agarwal

TL;DR
This paper demonstrates that QLoRA fine-tuning can internalize tool knowledge into small language models, reducing prompt length and inference overhead while maintaining or improving planning accuracy.
Contribution
It introduces a method for internalizing tool knowledge in small models via QLoRA fine-tuning, enabling description-free structured planning.
Findings
Fine-tuned models, prompted without tool descriptions, outperform an unfine-tuned baseline that is given the full tool descriptions.
Input length is reduced by 82.6% while planning scores improve.
Qwen3-4B achieves high judge scores with less memory and faster inference.
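To make the input-length figure concrete, here is a minimal sketch of how a relative reduction of this kind is computed. The token counts are hypothetical, chosen only so that the arithmetic lands on 82.6%; they are not taken from the paper.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction of the original length that was removed."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Hypothetical prompt lengths in tokens (illustrative, not from the paper).
tokens_with_catalog = 1000
tokens_description_free = 174

reduction_pct = round(100 * relative_reduction(tokens_with_catalog, tokens_description_free), 1)
print(f"Input length reduced by {reduction_pct}%")
```

The same percentage holds for any pair of lengths in that ratio, so the figure alone does not reveal the absolute prompt sizes.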
Abstract
Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting the practicality of smaller models. This paper investigates whether tool-use knowledge can be internalized into small language models through parameter-efficient fine-tuning, enabling structured planning without explicit tool descriptions at inference time. Using AssetOpsBench as the primary benchmark, we fine-tune Gemma 4 E4B and Qwen3-4B with 8-bit QLoRA on approximately 1,700 tool-use examples spanning tool knowledge, question-to-plan mappings, and execution-style traces. We evaluate the resulting models under description-free inference, where the prompt omits the tool catalog entirely. The fine-tuned models outperform an informed unfine-tuned baseline that…
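The difference between an informed prompt and a description-free prompt can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Tool` class, the tool names, the question, and the prompt template are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Hypothetical tool catalog; the names and descriptions are invented for illustration.
CATALOG = [
    Tool("get_sensor_history", "Return time-series readings for an asset and sensor over a time window."),
    Tool("list_work_orders", "List maintenance work orders filed against an asset."),
]


def build_prompt(question: str, tools: Optional[list] = None) -> str:
    """Build a planning prompt; omit the catalog when tools is None or empty."""
    parts = []
    if tools:
        parts.append("Available tools:")
        parts.extend(f"- {t.name}: {t.description}" for t in tools)
    parts.append(f"Question: {question}")
    parts.append("Plan:")
    return "\n".join(parts)


question = "Which chiller had the most work orders last month?"
informed = build_prompt(question, CATALOG)   # baseline: catalog included in the prompt
description_free = build_prompt(question)    # fine-tuned setting: catalog omitted

print(len(informed.split()), "words with catalog")
print(len(description_free.split()), "words without catalog")
```

In the description-free setting, the model has to recall tool names and argument conventions from its fine-tuned weights, since nothing in the prompt lists them.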
