Fine-tuning on simulated data outperforms prompting for agent tone of voice

Ingo Marquardt; Philippe Brule

arXiv:2507.04889 · cs.LG · July 8, 2025

TL;DR

Fine-tuning small language models on synthetic data significantly outperforms prompting at achieving a conversational tone, and it stays efficient and adheres to the target style even with limited data and quantization.

Contribution

This work shows that fine-tuning small, open-weight language models on synthetic data is more effective than prompting for style alignment in conversational applications.

Findings

01. Fine-tuning achieved high conversational response rates.

02. Fine-tuning with 8-bit quantization converged faster.

03. Semantic similarity confirmed content quality was maintained.
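
A semantic-similarity check of the kind mentioned in the third finding is commonly done by comparing embedding vectors of a reference answer and a model response. The sketch below assumes cosine similarity over precomputed embeddings; the summary does not state which metric or embedding model the authors used, and the `cosine_similarity` helper and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for sentence embeddings (assumption: real embeddings would
# come from an embedding model and have hundreds of dimensions).
reference = [0.2, 0.7, 0.1]
response = [0.25, 0.65, 0.05]

score = cosine_similarity(reference, response)
print(f"semantic similarity: {score:.3f}")
```

A score near 1 suggests the restyled response kept the content of the reference, while a low score would flag that the change of tone also changed the meaning.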

Abstract

Deploying language models (LMs) in customer-facing speech applications requires conversational fluency and adherence to specific stylistic guidelines. This can be challenging to achieve reliably using complex system prompts due to issues like instruction-following limitations and in-context bias. This study investigates the effectiveness of fine-tuning versus system prompting for aligning LMs with a specific behavioral target: responding in a natural, conversational tone suitable for voice interactions. We fine-tuned a small, open-weight model (`Llama3.2-1B-Instruct`) using Low-Rank Adaptation (LoRA) on a synthetically generated dataset derived from Wikipedia. Additionally, we fine-tuned two closed-source models (`gpt-4o-mini`, `gpt-4.1-mini`). Our results demonstrate that fine-tuning outperformed system prompting, achieving a high percentage of conversational responses, even when…
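
Low-Rank Adaptation keeps the pretrained weight matrix frozen and trains two small factor matrices whose product forms the weight update, which is part of why it is practical for a model of this size. The parameter arithmetic can be sketched as follows; the layer shape and rank are illustrative assumptions, not values reported in the paper, and `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the frozen weight W
    is combined with the scaled product B @ A when the layer is applied.
    """
    return rank * (d_out + d_in)


# Hypothetical projection layer and rank, chosen only for illustration.
d_out, d_in, rank = 2048, 2048, 8

full = d_out * d_in  # parameters updated by full fine-tuning of this layer
lora = lora_trainable_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```

Under these assumed dimensions the adapter trains under one percent of the layer's weights, which illustrates why LoRA is often paired with small training sets and quantized base models.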
