Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity
Mengjiao Zhang, Jia Xu

TL;DR
This paper introduces Subword Embedding from Bytes (SEB), a method that enhances privacy in NLP models by encoding subwords as byte sequences, maintaining accuracy and efficiency while resisting embedding-based privacy attacks.
Contribution
SEB is a novel subword embedding approach that uses byte sequences to improve privacy protection without sacrificing model performance or efficiency.
Findings
SEB effectively protects against embedding-based attacks that try to recover original sentences in federated learning.
SEB achieves comparable or better accuracy than standard methods in NLP tasks.
SEB needs less memory than traditional subword embeddings and keeps the same input length, so efficiency is preserved.
Abstract
While NLP models significantly impact our lives, there are rising concerns about privacy invasion. Although federated learning enhances privacy, attackers may recover private training data by exploiting model parameters and gradients. Therefore, protecting against such embedding attacks remains an open challenge. To address this, we propose Subword Embedding from Bytes (SEB) and encode subwords to byte sequences using deep neural networks, making input text recovery harder. Importantly, our method requires less memory because it uses a vocabulary of bytes, while keeping efficiency with the same input length. Thus, our solution outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. Our experiments show SEB can effectively protect against embedding-based attacks from recovering original sentences in federated learning. Meanwhile, we verify that…
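To make the idea in the abstract concrete, here is a minimal sketch of embedding subwords through byte sequences. It is an illustration under stated assumptions, not the authors' implementation. The hash-derived mapping in `subword_to_bytes`, the constants `BYTES_PER_SUBWORD` and `EMBED_DIM`, and the mean-pooling step are all hypothetical choices. The abstract says byte sequences are encoded with deep neural networks, and mean-pooling merely stands in for that step here.

```python
import hashlib
import random
from typing import List

NUM_BYTES = 256          # a byte vocabulary has at most 256 entries
BYTES_PER_SUBWORD = 4    # hypothetical: bytes assigned to each subword
EMBED_DIM = 8            # hypothetical embedding width

def subword_to_bytes(subword: str, n: int = BYTES_PER_SUBWORD) -> List[int]:
    """Map a subword to n byte ids.

    Assumption: the bytes come from a SHA-256 digest purely for illustration;
    the abstract does not specify the mapping used by SEB.
    """
    digest = hashlib.sha256(subword.encode("utf-8")).digest()
    return list(digest[:n])

# One embedding row per byte value, randomly initialised for the sketch.
_rng = random.Random(0)
byte_table = [[_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
              for _ in range(NUM_BYTES)]

def embed_subword(subword: str) -> List[float]:
    """Look up the byte embeddings of a subword and mean-pool them.

    Mean-pooling is a stand-in for the neural aggregation the abstract mentions.
    """
    rows = [byte_table[b] for b in subword_to_bytes(subword)]
    return [sum(column) / len(rows) for column in zip(*rows)]

def embed_sentence(subwords: List[str]) -> List[List[float]]:
    """One vector per subword, so the input length matches subword tokenisation."""
    return [embed_subword(s) for s in subwords]

def table_parameters(vocab_size: int, dim: int = EMBED_DIM) -> int:
    """Number of parameters in an embedding table of the given vocabulary size."""
    return vocab_size * dim
```

In this sketch the sequence length equals the number of subwords, while the embedding table has 256 rows instead of one row per subword. For example, `table_parameters(256)` is far smaller than `table_parameters(30000)`, where 30000 is an arbitrary illustrative subword vocabulary size.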
Taxonomy
Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques
