Fine-tuning on simulated data outperforms prompting for agent tone of voice
Ingo Marquardt, Philippe Brule

TL;DR
Fine-tuning small language models on synthetic data significantly outperforms system prompting at producing a conversational tone. It is also efficient: style adherence stays high even with limited training data and quantization.
Contribution
This work shows that fine-tuning small, open-weight language models on synthetic data is more effective than prompting for style alignment in conversational applications.
Findings
Fine-tuned models produced a high percentage of conversational responses, outperforming system prompting.
Fine-tuning with 8-bit quantization converged faster.
Semantic similarity scores indicated that content quality was maintained after fine-tuning.
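The two evaluation signals named in the findings can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes each response has already been labelled conversational or not by some judge, and that semantic similarity is measured as cosine similarity between embedding vectors. The function names `conversational_rate` and `cosine_similarity` are hypothetical, and the summary does not state which judge or similarity measure the paper used.

```python
import math


def conversational_rate(labels):
    """Return the fraction of responses labelled conversational (True).

    `labels` is assumed to come from an external judge (human or model);
    how the paper produced its labels is not stated in this summary.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for is_conversational in labels if is_conversational) / len(labels)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Illustrative inputs only; these are not results from the paper.
example_labels = [True, True, False, True]
example_rate = conversational_rate(example_labels)
example_similarity = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```

In a setup like this, a high conversational rate alongside a high similarity between the embeddings of the reference answer and the restyled answer would suggest that the tone changed while the content did not.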
Abstract
Deploying language models (LMs) in customer-facing speech applications requires conversational fluency and adherence to specific stylistic guidelines. This can be challenging to achieve reliably using complex system prompts due to issues like instruction-following limitations and in-context bias. This study investigates the effectiveness of fine-tuning versus system prompting for aligning LMs with a specific behavioral target: responding in a natural, conversational tone suitable for voice interactions. We fine-tuned a small, open-weights model (`Llama3.2-1B-Instruct`) using Low-Rank Adaptation (LoRA) on a synthetically generated dataset derived from Wikipedia. Additionally, we fine-tuned two closed-source models (`gpt-4o-mini`, `gpt-4.1-mini`). Our results demonstrate that fine-tuning outperformed system prompting, achieving a high percentage of conversational responses, even when…
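To make the LoRA step mentioned in the abstract concrete, here is a minimal pure-Python sketch of the standard low-rank update, W' = W + (alpha / r) · B · A, where W is a frozen weight matrix and only the small matrices A and B are trained. It is a sketch under stated assumptions, not the authors' training code: the rank, scaling factor, and matrix sizes are chosen purely for illustration, and the function names are hypothetical.

```python
def full_params(d_in, d_out):
    """Number of weights in a dense d_out x d_in matrix."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable weights in a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def matmul(x, y):
    """Plain-list matrix product of x (n x k) and y (k x m)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, leaving the frozen W unmodified."""
    scale = alpha / rank
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Illustrative sizes only (assumed, not taken from the paper):
# a 2048 x 2048 projection adapted at rank 8.
dense = full_params(2048, 2048)                 # 4,194,304 weights
adapter = lora_trainable_params(2048, 2048, 8)  # 32,768 weights

# Tiny worked example: 2 x 2 identity weight, rank-1 adapter, alpha = 2.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
w_eff = lora_effective_weight(w, a, b, alpha=2.0, rank=1)
```

For the illustrative 2048 × 2048 case, the adapter trains 32,768 weights against 4,194,304 in the dense matrix, under 1% of the original, which is why LoRA is a common choice for adapting small models on modest hardware.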
