# An efficient strategy for fine-tuning large language models

**Authors:** Benjamin Marsh, Adam Michaleas, Darrell O. Ricke, Shaun Monera, Shriya Zembruski

PMC · DOI: 10.3389/frai.2026.1665992 · Frontiers in Artificial Intelligence · 2026-02-17

## TL;DR

This paper introduces a strategy to efficiently fine-tune large language models for domain-specific tasks when data and computing resources are limited.

## Contribution

The contribution is an end-to-end fine-tuning strategy that uses Distilling Step-by-Step (DSS) for dataset development and model training with rationale supervision, benchmarked across full-precision, LoRA, and QLoRA fine-tuning.

## Key findings

- DSS with full-precision fine-tuning provides the best overall performance.
- DSS with LoRA offers a good performance-efficiency tradeoff under resource constraints.
- DSS with QLoRA enables training with limited GPU memory while maintaining competitive performance.

## Abstract

Large Language Models (LLMs) achieve strong performance on many Natural Language Processing tasks, but adapting them to domain-specific applications is resource-intensive due to the cost of curating task-specific datasets and the compute required for fine-tuning. This work proposes an end-to-end strategy for rapidly fine-tuning LLMs for domain-specific tasks when both data and compute are limited.

The strategy uses Distilling Step-by-Step (DSS) for dataset development and model training, where a teacher model generates task labels and intermediate rationales via Chain-of-Thought prompting for a natural-language-to-Query-DSL structured generation task. Using the resulting supervision, we benchmark three fine-tuning modalities through hyperparameter sweeps: full-precision fine-tuning, Low-Rank Adaptation (LoRA), and Quantized LoRA (QLoRA). To isolate the effect of rationale supervision, we additionally conduct an ablation study comparing DSS training (label + rationale supervision) against a label-only configuration.
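The label + rationale supervision described above can be sketched roughly as follows. Each teacher-annotated sample yields two training pairs, one per target, and the two losses are combined with a weight. This is a minimal illustration, not the authors' implementation: the `[label]` and `[rationale]` task prefixes, the weight `lam`, the field and function names, and the example query are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TeacherOutput:
    """One teacher-annotated sample (field names are illustrative)."""
    question: str   # natural-language request
    rationale: str  # chain-of-thought explanation produced by the teacher
    label: str      # final task output, e.g. a Query DSL string


def build_dss_examples(sample, with_rationale=True):
    """Return (input, target) pairs for one sample.

    with_rationale=False mimics a label-only ablation configuration.
    The task prefixes are assumed, not taken from the paper.
    """
    examples = [(f"[label] {sample.question}", sample.label)]
    if with_rationale:
        examples.append((f"[rationale] {sample.question}", sample.rationale))
    return examples


def dss_loss(label_loss, rationale_loss, lam=1.0):
    """Multi-task objective: label loss plus a weighted rationale loss."""
    return label_loss + lam * rationale_loss


# Made-up sample for illustration only.
sample = TeacherOutput(
    question="Find orders over 100 dollars",
    rationale="The request filters on a numeric field, so a range query fits.",
    label='{"query": {"range": {"total": {"gt": 100}}}}',
)
pairs = build_dss_examples(sample)
label_only_pairs = build_dss_examples(sample, with_rationale=False)
```

In a real training loop, the two pairs would be tokenized and passed through the same student model, with `dss_loss` applied to the per-task losses.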

Across the evaluated configurations, DSS combined with full-precision fine-tuning yields the strongest overall performance. Under resource constraints, DSS with LoRA provides an effective performance-efficiency tradeoff, and DSS with QLoRA enables training under tighter GPU memory budgets while maintaining competitive performance. In the parameter-efficient regimes, an alpha-to-rank ratio of 4:1 provides a consistent balance of performance and compute consumption across the explored settings.
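To make the 4:1 alpha-to-rank ratio concrete, the sketch below ties `lora_alpha` to the rank when generating sweep configurations. In standard LoRA the low-rank update is scaled by `alpha / r`, so a fixed ratio keeps that scaling constant as the rank changes. The rank values and helper names are assumptions for illustration; the summary does not list the ranks explored.

```python
def lora_sweep(ranks, alpha_to_rank=4):
    """Build LoRA settings with alpha tied to rank by a fixed ratio."""
    configs = []
    for r in ranks:
        alpha = alpha_to_rank * r
        configs.append({"r": r, "lora_alpha": alpha, "scaling": alpha / r})
    return configs


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted weight matrix.

    LoRA adds A (r x d_in) and B (d_out x r) alongside the frozen weight.
    """
    return r * (d_in + d_out)


# Hypothetical ranks, chosen only to show the pattern.
configs = lora_sweep([8, 16, 32])
for cfg in configs:
    print(cfg)
```

The dictionary keys mirror the `r` and `lora_alpha` parameter names commonly used by LoRA libraries, so the values could be passed to such a configuration object.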

These findings support a practical process for resource-constrained domain adaptation: use DSS to efficiently construct datasets, then select the fine-tuning modality based on available compute (full-precision when feasible; LoRA or QLoRA when memory-limited). The proposed workflow offers a general guide for efficiently fine-tuning LLMs for domain-specific tasks with limited data availability.
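The modality choice above can be approximated with a back-of-the-envelope memory check. The sketch below picks the most capable option whose estimated footprint fits the GPU budget. The bytes-per-parameter figures are common rules of thumb rather than numbers from the paper, and the estimate ignores activations, adapter weights, and framework overhead, which the assumed `headroom` factor only loosely covers.

```python
GIB = 1024 ** 3

# Assumed rough memory cost per base-model parameter, in bytes.
BYTES_PER_PARAM = {
    "full": 16.0,   # weights, gradients, and Adam optimizer state
    "lora": 2.0,    # frozen 16-bit base weights
    "qlora": 0.5,   # frozen 4-bit base weights
}


def estimate_gib(n_params, modality):
    """Estimate the base-model memory footprint in GiB for a modality."""
    return n_params * BYTES_PER_PARAM[modality] / GIB


def select_modality(n_params, gpu_mem_gib, headroom=0.8):
    """Return the first modality that fits, preferring full fine-tuning.

    headroom reserves a fraction of memory for everything not modeled here.
    Returns None when even the 4-bit estimate does not fit.
    """
    budget = gpu_mem_gib * headroom
    for modality in ("full", "lora", "qlora"):
        if estimate_gib(n_params, modality) <= budget:
            return modality
    return None


# Hypothetical 7B-parameter model on GPUs of different sizes.
for mem in (160, 80, 8, 2):
    print(mem, "GiB ->", select_modality(7e9, mem))
```

A measured profile of the actual training run should override this estimate whenever one is available.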

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953369/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953369/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953369/full.md

---
Source: https://tomesphere.com/paper/PMC12953369