SafeTune: Search-based Harmfulness Minimisation for Large Language Models

Giordano d'Aloisio; David Williams; Giusy Annunziata; Zhiwei Fei; Antinisca Di Marco; Federica Sarro

arXiv:2605.07709·cs.SE·May 11, 2026


TL;DR

This paper introduces SafeTune, a multi-objective search-based method that tunes hyperparameters and the system prompt to reduce harmful LLM responses while increasing prompt-response relevance. In an initial evaluation on Qwen3.5 0.8B, both improvements are significant with a large effect size.

Contribution

SafeTune is a novel multi-objective search approach that minimises harmfulness and improves relevance in LLM responses through hyperparameter tuning and system prompt engineering.

Findings

01. SafeTune significantly reduces the rate of harmful responses generated by Qwen3.5 0.8B.

02. SafeTune increases prompt-response relevance with a large effect size.

03. Among the parameters explored, encouraging greater repetition in responses has the largest impact on reducing harmfulness while increasing relevance.
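For readers unfamiliar with effect sizes, the sketch below shows one way a "large effect size" could be computed from two samples of scores. It uses the Vargha-Delaney A12 statistic, which is common in search-based software engineering; this page does not say which effect-size measure the paper uses, so the choice of A12, the function names `a12` and `magnitude`, and the conventional thresholds (0.56, 0.64, 0.71) are assumptions made for illustration only.

```python
from typing import Sequence


def a12(x: Sequence[float], y: Sequence[float]) -> float:
    """Vargha-Delaney A12: probability that a value drawn from x is larger
    than a value drawn from y, counting ties as half.
    (Assumed measure; the listing does not name the one used in the paper.)"""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    ties = sum(1 for xi in x for yj in y if xi == yj)
    return (greater + 0.5 * ties) / (len(x) * len(y))


def magnitude(a: float) -> str:
    """Map A12 to a conventional label using the usual thresholds
    0.56 (small), 0.64 (medium), 0.71 (large), applied symmetrically around 0.5."""
    distance = abs(a - 0.5)
    if distance < 0.06:
        return "negligible"
    if distance < 0.14:
        return "small"
    if distance < 0.21:
        return "medium"
    return "large"
```

A value of 0.5 means the two samples are indistinguishable; values near 0 or 1 mean one sample almost always scores lower or higher than the other.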

Abstract

The widespread adoption of Large Language Models (LLMs) raises concerns about the potential harmfulness of their responses. In this paper, we first investigate the harmfulness of responses from four general-purpose LLMs. Next, we propose SafeTune, a multi-objective search-based approach to mitigate harmfulness while increasing response relevance through hyperparameter tuning and system prompt engineering. Our initial evaluation shows that SafeTune significantly reduces the rate of harmful responses generated by Qwen3.5 0.8B and increases prompt-response relevance (both with a large effect size). Among the parameters we explore, we also find that encouraging greater repetition in responses is most impactful in reducing harmfulness while increasing relevance.
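The multi-objective idea in the abstract can be illustrated with a minimal sketch: sample candidate configurations, score each on harmfulness (to minimise) and relevance (to maximise), and keep the non-dominated ones. Everything here is an assumption for illustration: the abstract does not name the search algorithm, so plain random search is swapped in; the parameter names (`temperature`, `repetition_penalty`), their ranges, and the `evaluate` callable are hypothetical placeholders, not SafeTune's actual search space or scoring pipeline.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]
Scores = Tuple[float, float]  # (harm_rate to minimise, relevance to maximise)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(evaluated: List[Tuple[Config, Scores]]) -> List[Tuple[Config, Scores]]:
    """Keep every configuration whose scores no other configuration dominates."""
    return [
        (cfg, scores)
        for cfg, scores in evaluated
        if not any(dominates(other, scores) for _, other in evaluated)
    ]


def sample_config(rng: random.Random, prompts: List[str]) -> Config:
    """Draw one candidate; parameter names and ranges are hypothetical."""
    return {
        "temperature": rng.uniform(0.1, 1.5),
        "repetition_penalty": rng.uniform(0.8, 1.5),
        "system_prompt": rng.choice(prompts),
    }


def random_search(
    evaluate: Callable[[Config], Scores],
    prompts: List[str],
    budget: int = 20,
    seed: int = 0,
) -> List[Tuple[Config, Scores]]:
    """Random search stand-in for the (unspecified) search algorithm."""
    rng = random.Random(seed)
    evaluated = []
    for _ in range(budget):
        cfg = sample_config(rng, prompts)
        evaluated.append((cfg, evaluate(cfg)))
    return pareto_front(evaluated)
```

In practice, `evaluate` would run the model with the given configuration on a set of prompts and return a measured harm rate and relevance score. Returning a Pareto front, not a single winner, leaves the trade-off between the two objectives to the user.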
