Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity

Mengjiao Zhang; Jia Xu

arXiv:2410.16410·cs.AI·October 23, 2024

Open Access

TL;DR

This paper introduces Subword Embedding from Bytes (SEB), a method that enhances privacy in NLP models by encoding subwords as byte sequences, maintaining accuracy and efficiency while resisting embedding-based privacy attacks.

Contribution

SEB is a novel subword embedding approach that uses byte sequences to improve privacy protection without sacrificing model performance or efficiency.

Findings

01. SEB effectively prevents recovery of original sentences in federated learning.

02. SEB achieves comparable or better accuracy than standard methods in NLP tasks.

03. SEB reduces memory and computational complexity compared to traditional subword embeddings.

Abstract

While NLP models significantly impact our lives, there are rising concerns about privacy invasion. Although federated learning enhances privacy, attackers may still recover private training data by exploiting model parameters and gradients. Protecting against such embedding attacks therefore remains an open challenge. To address this, we propose Subword Embedding from Bytes (SEB), which encodes subwords as byte sequences using deep neural networks, making input text recovery harder. Importantly, our method requires less memory, with a vocabulary of only 256 bytes, while keeping efficiency with the same input length. Thus, our solution outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. Our experiments show that SEB effectively prevents embedding-based attacks from recovering original sentences in federated learning. Meanwhile, we verify that…
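The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea it describes: each subword is represented by a short byte sequence, the byte embeddings come from a 256-entry table, and a small network turns them into one vector per subword so the input length is unchanged. The random code assignment, the single ReLU projection, and every name and size below (`BYTES_PER_SUBWORD`, `BYTE_DIM`, `MODEL_DIM`, `assign_byte_codes`, `embed_subword`) are assumptions made for illustration, not the authors' implementation.

```python
import random

NUM_BYTES = 256        # byte vocabulary size, as stated in the abstract
BYTES_PER_SUBWORD = 4  # assumption: fixed code length per subword
BYTE_DIM = 8           # assumption: width of each byte embedding
MODEL_DIM = 16         # assumption: width of the resulting subword embedding

rng = random.Random(0)

def assign_byte_codes(vocab):
    """Hypothetical scheme: give every subword a distinct random byte sequence."""
    codes, used = {}, set()
    for subword in vocab:
        code = tuple(rng.randrange(NUM_BYTES) for _ in range(BYTES_PER_SUBWORD))
        while code in used:  # redraw on the rare collision
            code = tuple(rng.randrange(NUM_BYTES) for _ in range(BYTES_PER_SUBWORD))
        used.add(code)
        codes[subword] = code
    return codes

# These would be trainable parameters in a real model; random placeholders here.
byte_table = [[rng.gauss(0.0, 0.1) for _ in range(BYTE_DIM)]
              for _ in range(NUM_BYTES)]
projection = [[rng.gauss(0.0, 0.1) for _ in range(MODEL_DIM)]
              for _ in range(BYTES_PER_SUBWORD * BYTE_DIM)]

def embed_subword(code):
    """Concatenate the byte embeddings, then apply one ReLU layer."""
    concat = [x for b in code for x in byte_table[b]]
    return [max(0.0, sum(c * row[j] for c, row in zip(concat, projection)))
            for j in range(MODEL_DIM)]

def embed_sentence(subwords, codes):
    """One vector per subword, so the sequence length is unchanged."""
    return [embed_subword(codes[s]) for s in subwords]

def embedding_param_counts(vocab_size):
    """(conventional lookup table, byte table + projection) parameter counts."""
    conventional = vocab_size * MODEL_DIM
    from_bytes = NUM_BYTES * BYTE_DIM + BYTES_PER_SUBWORD * BYTE_DIM * MODEL_DIM
    return conventional, from_bytes

vocab = ["sub", "##word", "embed", "##ding"]
codes = assign_byte_codes(vocab)
vectors = embed_sentence(vocab, codes)
print(len(vectors), len(vectors[0]))   # 4 16
print(embedding_param_counts(30_000))  # (480000, 2560)
```

The sizes are deliberately tiny and are not taken from the paper. The parameter comparison only illustrates why a 256-entry byte table can need less memory than a table with one row per subword; the actual savings depend on the dimensions and network the authors use.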

Taxonomy

Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques