Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data

Adel ElZemity; Budi Arief; Shujun Li

arXiv:2505.09974 · cs.CR · September 18, 2025

Open Access

TL;DR

This paper investigates the safety risks of LLMs fine-tuned with pseudo-malicious cyber security data. It confirms that such fine-tuning degrades safety and proposes a rewording approach that improves safety without sacrificing utility.

Contribution

The paper independently validates previously reported safety concerns using a different evaluation framework (garak with the OWASP Top 10 for LLM Applications) and introduces a safety rewording method to mitigate risks in fine-tuned LLMs.

Findings

01

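Fine-tuning with pseudo-malicious cyber security data reduces safety resilience across all four tested LLMs (Mistral 7B, Llama 3 8B, Gemma 2 9B, and DeepSeek R1 8B).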

02

After fine-tuning, the failure rate of Mistral 7B against prompt injection rose from 9.1% to 68.7%.

03

Rewording instruction-response pairs can improve safety while maintaining utility.
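
To put the Mistral 7B figures in Finding 02 in perspective, the short sketch below computes the absolute and relative change between the two reported failure rates. It assumes a failure rate is simply the percentage of attack attempts the model did not resist. The helper names `failure_rate` and `compare` are hypothetical and are not taken from the paper or from garak.

```python
def failure_rate(failed: int, total: int) -> float:
    """Percentage of attack attempts the model failed to resist.

    Assumption: failure rate = failed attempts / total attempts * 100.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def compare(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (change in percentage points, ratio of after to before)."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive to compute a ratio")
    return after_pct - before_pct, after_pct / before_pct


# Reported Mistral 7B prompt-injection failure rates (Finding 02).
delta_pp, ratio = compare(9.1, 68.7)
print(f"Increase: {delta_pp:.1f} percentage points ({ratio:.2f}x the baseline)")
```

Reporting both numbers is a deliberate choice: the percentage-point difference shows how much of the attack surface opened up, while the ratio shows how far the model moved from its own baseline.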

Abstract

Large language models (LLMs) have been used in many application domains, including cyber security. The application of LLMs in the cyber security domain presents significant opportunities, such as for enhancing threat analysis and malware detection, but it can also introduce critical risks and safety concerns, including potential personal data leakage and automated generation of new malware. Building on recent findings that fine-tuning LLMs with pseudo-malicious cyber security data significantly compromises their safety, this paper presents a comprehensive validation and extension of these safety risks using a different evaluation framework. We employ the garak red teaming framework with the OWASP Top 10 for LLM Applications to assess four open-source LLMs: Mistral 7B, Llama 3 8B, Gemma 2 9B, and DeepSeek R1 8B. Our evaluation confirms and extends previous findings, showing that…

Taxonomy

Topics: Network Security and Intrusion Detection · Information and Cyber Security · Digital and Cyber Forensics

Methods: LLaMA