Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

Yuval Shemla; Ayal Yakobe; Tanmay Agarwal

arXiv:2605.17774 · cs.CL · May 19, 2026

TL;DR

This paper demonstrates that QLoRA fine-tuning can internalize tool knowledge into small language models, reducing prompt length and inference overhead while maintaining or improving planning accuracy.

Contribution

It introduces a method for internalizing tool knowledge in small models via QLoRA fine-tuning, enabling description-free structured planning.

Findings

01. Fine-tuned models, prompted without tool descriptions, outperform an unfine-tuned baseline that is given the full tool descriptions.

02. Input length is reduced by 82.6% while planning scores improve.

03. Qwen3-4B achieves high judge scores with less memory and faster inference.

Abstract

Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting the practicality of smaller models. This paper investigates whether tool-use knowledge can be internalized into small language models through parameter-efficient fine-tuning, enabling structured planning without explicit tool descriptions at inference time. Using AssetOpsBench as the primary benchmark, we fine-tune Gemma 4 E4B and Qwen3-4B with 8-bit QLoRA on approximately 1,700 tool-use examples spanning tool knowledge, question-to-plan mappings, and execution-style traces. We evaluate the resulting models under description-free inference, where the prompt omits the tool catalog entirely. The fine-tuned models outperform an informed unfine-tuned baseline that…
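The difference between an informed prompt and description-free inference can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tool names, the schema layout, the prompt wording, and the whitespace-based token count are hypothetical and are not taken from the paper or from AssetOpsBench. The sketch only shows how omitting the tool catalog shortens the input and how a percentage reduction would be computed.

```python
import json

# Hypothetical tool catalog (illustrative only; not the AssetOpsBench schema).
TOOL_CATALOG = [
    {
        "name": "get_sensor_history",
        "description": "Return time-series readings for a sensor on an asset.",
        "parameters": {"asset_id": "string", "sensor": "string", "days": "integer"},
    },
    {
        "name": "list_work_orders",
        "description": "List maintenance work orders filed against an asset.",
        "parameters": {"asset_id": "string", "status": "string"},
    },
]

INSTRUCTION = "You are a planning assistant. Answer with a numbered plan of tool calls."


def build_prompt(question, tools=None):
    """Build a planning prompt; the catalog is included only when `tools` is given."""
    parts = [INSTRUCTION]
    if tools:
        parts.append("Available tools:\n" + json.dumps(tools, indent=2))
    parts.append("Question: " + question)
    return "\n\n".join(parts)


def approx_tokens(text):
    """Crude token proxy: whitespace-separated pieces (a real tokenizer will differ)."""
    return len(text.split())


def reduction_percent(full_len, short_len):
    """Relative input-length reduction, in percent."""
    return 100.0 * (full_len - short_len) / full_len


question = "Has chiller CH-1 shown abnormal vibration in the last week?"
informed = build_prompt(question, TOOL_CATALOG)   # baseline-style prompt with catalog
description_free = build_prompt(question)         # prompt for a fine-tuned model

saving = reduction_percent(approx_tokens(informed), approx_tokens(description_free))
print(f"informed: {approx_tokens(informed)} pieces, "
      f"description-free: {approx_tokens(description_free)} pieces, "
      f"reduction: {saving:.1f}%")
```

The saving printed here depends entirely on the toy catalog and says nothing about the 82.6% figure reported above, which comes from the authors' own measurements. The design point is that the catalog cost is paid on every request in the informed setting, whereas a model that has internalized the tools pays it once, at fine-tuning time.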
