CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs
Shashie Dilhara Batan Arachchige, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Dinusha Vatsalan, Dali Kaafar

TL;DR
This paper introduces CTIGuardian, a few-shot privacy alignment framework that mitigates information leakage in fine-tuned LLMs while balancing privacy protection with utility. It is demonstrated on cyber threat intelligence (CTI) tasks.
Contribution
The paper proposes a novel privacy alignment approach using few-shot supervision with LLMs to prevent sensitive data leakage in fine-tuned models, avoiding costly retraining.
Findings
CTIGuardian achieves a better privacy-utility trade-off than NER baselines.
Privacy mitigation is shown to be effective on the CTI use case.
The framework is adaptable to other sensitive domains.
Abstract
Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber threat intelligence (CTI). Fine-tuning typically relies on proprietary datasets that may contain sensitive information. Owners expect their fine-tuned model not to inadvertently leak this information to potentially adversarial end users. Using CTI as a use case, we demonstrate that data-extraction attacks can recover sensitive information from models fine-tuned on CTI reports, underscoring the need for mitigation. Retraining the full model to eliminate this leakage is computationally expensive and impractical. We propose an alternative approach, which we call privacy alignment, inspired by safety alignment in LLMs. Just as safety alignment teaches the model to abide by safety constraints through a few examples, we enforce privacy alignment through…
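To make the idea of few-shot privacy alignment more concrete, the sketch below shows one way a handful of paired examples could be assembled into a prompt for a guard LLM. It is an illustration under stated assumptions, not the paper's implementation: the abstract is truncated before the mechanism is described, so the instruction text, the placeholder labels, the example pairs, and the names `Example`, `FEW_SHOT`, and `build_prompt` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    raw: str      # output that exposes a sensitive identifier
    aligned: str  # privacy-preserving rewrite of the same output


# Hypothetical demonstration pairs, invented for illustration only.
# The IP address is from a range reserved for documentation.
FEW_SHOT = [
    Example(
        "The C2 server at 203.0.113.7 targeted admin@example.com.",
        "The C2 server at [IP_ADDRESS] targeted [EMAIL].",
    ),
    Example(
        "Malware beaconed to host fs01.corp.example.com.",
        "Malware beaconed to host [HOSTNAME].",
    ),
]

# Assumed instruction wording; the paper's actual prompt is not given here.
INSTRUCTION = (
    "Rewrite the text so that sensitive identifiers are replaced by "
    "placeholders while the threat-relevant content is kept."
)


def build_prompt(text: str, examples: list[Example] = FEW_SHOT) -> str:
    """Assemble instruction, demonstration pairs, and the new input."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts += [f"Input: {ex.raw}", f"Output: {ex.aligned}", ""]
    # The prompt ends with an open "Output:" slot for the model to complete.
    parts += [f"Input: {text}", "Output:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Attacker used 198.51.100.4."))
```

The resulting string would be sent to whichever LLM acts as the guard. That call is left out on purpose, since the source does not say which model or API is used.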
