Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation
Xinguo Feng, Zhongkui Ma, Zihan Wang, Alsharif Abuadbba, Guangdong Bai

TL;DR
This paper introduces GHOST, a token obfuscation method that significantly reduces gradient inversion attack success while maintaining model utility across various architectures and tasks.
Contribution
GHOST is a novel token-level obfuscation technique that disconnects semantic links in token space to defend against gradient inversion attacks without sacrificing model performance.
Findings
GHOST reduces data recovery rate to as low as 1%.
Maintains high utility, with an F1 score of up to 0.92 and a perplexity of 5.45.
Effective across diverse models and attack scenarios.
Abstract
Training and fine-tuning large-scale language models largely benefit from collaborative learning, but the approach has been proven vulnerable to gradient inversion attacks (GIAs), which allow adversaries to reconstruct private training data from shared gradients. Existing defenses mainly employ gradient perturbation techniques, e.g., noise injection or gradient pruning, to disrupt GIAs' direct mapping from gradient space to token space. However, these methods often fall short because semantic similarity is retained across gradient, embedding, and token spaces. In this work, we propose a novel defense mechanism named GHOST (gradient shield with obfuscated tokens), a token-level obfuscation mechanism that neutralizes GIAs by decoupling the inherent connections across gradient, embedding, and token spaces. GHOST is built upon an important insight: due to the large scale of the token…
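To make the link between gradient space and token space concrete, the following is a minimal sketch of one well-known leakage channel: in a model with an embedding lookup, only the rows of the embedding matrix that belong to tokens in the private input receive a nonzero gradient. This is a toy illustration, not the attack evaluated in the paper and not GHOST itself. The model (mean-pooled embeddings projected to a scalar with a squared-error loss) and the names `embedding_gradient` and `leaked_tokens` are assumptions made for this example.

```python
import numpy as np

def embedding_gradient(token_ids, embedding, weights, label):
    """Gradient of 0.5 * (prediction - label)**2 w.r.t. the embedding matrix.

    Hypothetical toy model: mean-pool the token embeddings, then project
    the pooled vector to a scalar prediction with `weights`.
    """
    pooled = embedding[token_ids].mean(axis=0)
    error = pooled @ weights - label
    grad = np.zeros_like(embedding)
    # Each occurrence of a token adds error * weights / n to that token's row;
    # rows of tokens absent from the input stay exactly zero.
    np.add.at(grad, token_ids, error * weights / len(token_ids))
    return grad

def leaked_tokens(grad):
    """Token ids whose embedding rows received a nonzero gradient."""
    return sorted(np.flatnonzero(np.abs(grad).sum(axis=1) > 0).tolist())

rng = np.random.default_rng(0)
vocab_size, dim = 50, 8
embedding = rng.normal(size=(vocab_size, dim))
weights = rng.normal(size=dim)
private_tokens = np.array([3, 17, 17, 42])  # the client's private input

# What a client would share in collaborative learning (toy version).
grad = embedding_gradient(private_tokens, embedding, weights, label=1.0)
print(leaked_tokens(grad))
```

An observer of `grad` alone can read off which tokens were used, and the row for token 17 is twice as large as the row for token 3 because it appears twice. Real attacks go further and recover token order, but the sketch shows why a direct gradient-to-token mapping exists for a defense to break.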
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Topic Modeling
