Delving into Differentially Private Transformer

Youlong Ding, Xueyang Wu, Yining Meng, Yonggang Luo, Hao Wang, and Weike Pan

arXiv:2405.18194 · cs.LG · August 27, 2024 · 1 citation

TL;DR

This paper takes a modular approach to training Transformer models with differential privacy (DP): it reduces the problem to the better-understood task of training DP vanilla neural nets and proposes two techniques for the challenges unique to Transformers.

Contribution

The paper presents the Re-Attention Mechanism and Phantom Clipping, which address, respectively, the attention distraction phenomenon and the lack of compatibility with existing techniques for efficient gradient clipping.

Findings

01. Identified the attention distraction phenomenon as a key challenge.

02. Proposed Re-Attention Mechanism to mitigate attention distraction.

03. Introduced Phantom Clipping to improve gradient clipping efficiency.

Abstract

Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to "reduce" the problem of training DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such "reduction" is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new…
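For context, the gradient clipping mentioned in the abstract is, in standard DP-SGD, a per-example operation: each example's gradient is rescaled to a bounded L2 norm before Gaussian noise is added to the sum. The sketch below illustrates only that generic baseline in NumPy. It is not the paper's Phantom Clipping or Re-Attention Mechanism, whose details are not given on this page, and the names `clip_per_example`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions.

```python
import numpy as np


def clip_per_example(grads, max_norm):
    """Rescale each row (one example's gradient) to L2 norm <= max_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Rows already within the bound keep a scale of 1.0.
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return grads * scale


def dp_sgd_step(params, grads, lr, max_norm, noise_multiplier, rng):
    """One generic DP-SGD update (assumed baseline, not the paper's method).

    grads has shape (batch_size, num_params): one gradient per example.
    """
    clipped = clip_per_example(grads, max_norm)
    # Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return params - lr * noisy_mean


# Toy usage with two hand-written per-example gradients.
rng = np.random.default_rng(0)
toy_grads = np.array([[3.0, 4.0], [0.3, 0.4]])
toy_params = np.zeros(2)
updated = dp_sgd_step(toy_params, toy_grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.0, rng=rng)
```

Materializing one gradient row per example, as `toy_grads` does here, is what makes the naive version costly for large models; the abstract's reference to "efficient gradient clipping" concerns techniques that avoid this cost.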

Videos

Delving into Differentially Private Transformer · SlidesLive

Taxonomy

Topics: Advancements in Semiconductor Devices and Circuit Design

Methods: Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections