SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models

Noor Islam S. Mohammad; Uluğ Bayazıt

arXiv:2604.16606·cs.CR·April 21, 2026

TL;DR

SafeLM is a framework for federated LLM training that combines encryption, attack defenses, and alignment techniques to improve safety, privacy, and robustness. It reports 98.0% harmful content detection accuracy and a 96.9% reduction in communication.

Contribution

SafeLM combines multiple safety and privacy techniques into a single framework for trustworthy federated LLMs, and evaluates it on factuality, toxicity, and membership inference benchmarks.

Findings

01

Achieves 98.0% harmful content detection accuracy.

02

Reduces communication overhead by 96.9%.

03

Lowers gradient inversion PSNR from 31.7 dB to 15.1 dB.
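
To put the PSNR finding in perspective, the sketch below assumes the standard definition of peak signal-to-noise ratio, PSNR = 10 * log10(MAX^2 / MSE); the source does not say how PSNR was computed. The helper names `psnr` and `mse_ratio_for_db_drop` and the sample values are hypothetical. Under that definition, a lower PSNR means the attacker's reconstruction is further from the original data.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], reconstructed: Sequence[float],
         max_val: float = 1.0) -> float:
    """Standard PSNR in dB between two equal-length signals (assumed definition)."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: perfect reconstruction
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_db_drop(db_drop: float) -> float:
    """Factor by which MSE grows when PSNR falls by db_drop (same max_val)."""
    return 10.0 ** (db_drop / 10.0)


# Hypothetical toy signals, not data from the paper.
example_psnr = psnr([0.0, 0.0], [0.1, 0.1])

# The reported drop is 31.7 dB - 15.1 dB = 16.6 dB.
reported_drop = 31.7 - 15.1
error_growth = mse_ratio_for_db_drop(reported_drop)
```

Under this assumed definition, the 16.6 dB drop corresponds to the reconstruction's mean squared error growing by a factor of roughly 45.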

Abstract

Large language models (LLMs) are increasingly deployed in high-stakes domains, yet a unified treatment of their overlapping safety challenges remains lacking. We present SafeLM, a framework that jointly addresses four pillars of LLM safety: privacy, security, misinformation, and adversarial robustness. SafeLM combines federated training with gradient smartification and Paillier encryption for privacy, integrates defenses against training and inference-time attacks, employs contrastive grounding with calibrated decoding to reduce hallucinations, and introduces alignment-aware binarized aggregation to enhance robustness while maintaining bounded reconstruction quality. Across benchmarks on factuality, toxicity, and membership inference, SafeLM achieves 98.0% harmful content detection accuracy, reduces communication by 96.9%, and lowers gradient inversion PSNR from 31.7 dB to 15.1 dB.…
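
The abstract does not spell out how alignment-aware binarized aggregation works. As a rough illustration of binarized aggregation in general, the sketch below uses a swapped-in, simpler technique: sign-based majority voting in the style of signSGD. It is not the paper's method, and every function name and value in it is hypothetical. It also shows the arithmetic for sending 1 bit per coordinate instead of a 32-bit float. That saving happens to round to 96.9%, but the source does not state that its communication figure was derived this way.

```python
from typing import List


def binarize(grad: List[float]) -> List[int]:
    """Map each gradient coordinate to +1 or -1 (zero is treated as +1)."""
    return [1 if g >= 0 else -1 for g in grad]


def majority_vote(signs_per_client: List[List[int]]) -> List[int]:
    """Aggregate client sign vectors coordinate-wise by majority (ties go to +1)."""
    dim = len(signs_per_client[0])
    totals = [sum(client[i] for client in signs_per_client) for i in range(dim)]
    return [1 if t >= 0 else -1 for t in totals]


def communication_saving(bits_full: int = 32, bits_compressed: int = 1) -> float:
    """Fraction of upload bits saved per coordinate (assumed bit widths)."""
    return 1.0 - bits_compressed / bits_full


# Hypothetical gradients from three clients, not data from the paper.
client_grads = [
    [0.3, -1.2, 0.05],
    [0.1, 0.4, -0.2],
    [-0.7, -0.9, -0.01],
]
signs = [binarize(g) for g in client_grads]
update = majority_vote(signs)
saving = communication_saving()
```

In this sketch each client uploads only the sign vector, so the server never sees gradient magnitudes. The full SafeLM pipeline reportedly also applies Paillier encryption, which is omitted here.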
