Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights
Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

TL;DR
This paper presents a theoretical framework and algorithm to recover input data from transformer attention weights and outputs, highlighting potential security vulnerabilities in transformer models.
Contribution
It introduces a novel method to reconstruct input data from attention mechanisms, revealing privacy risks in transformer architectures.
Findings
Input data can be recovered from attention weights and outputs.
Transformers may have inherent vulnerabilities exposing sensitive data.
These results have implications for improving transformer security and privacy protections.
Abstract
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data from given attention weights and output by minimizing a loss function. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the…
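The abstract describes the recovery algorithm only at a high level. As an illustration, here is a minimal NumPy sketch under assumptions that are not taken from the paper: the observed output B is modeled as the row-wise softmax of X^T W X, where X (d x n) is the unknown input and W (d x d) stands in for the attention weights, and the discrepancy is the squared Frobenius norm 0.5 * ||A(X) - B||^2, minimized with plain gradient descent. The forward model, the function names, and the hyperparameters (`lr`, `steps`) are all hypothetical choices for this sketch, not the authors' method.

```python
import numpy as np


def row_softmax(S: np.ndarray) -> np.ndarray:
    """Numerically stable softmax applied independently to each row of S."""
    Z = S - S.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def attention_matrix(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical forward model: A(X) = row_softmax(X^T W X).

    X: (d, n) input, one d-dimensional column per token.
    W: (d, d) matrix standing in for the attention weights (assumption).
    """
    return row_softmax(X.T @ W @ X)


def loss_and_grad(X: np.ndarray, W: np.ndarray, B: np.ndarray):
    """Return 0.5 * ||A(X) - B||_F^2 and its gradient with respect to X."""
    A = attention_matrix(X, W)
    G = A - B  # dL/dA
    # Backpropagate through the row-wise softmax to get dL/dS.
    M = A * (G - (G * A).sum(axis=1, keepdims=True))
    # Backpropagate through S = X^T W X.
    grad = W @ X @ M.T + W.T @ X @ M
    return 0.5 * float(np.sum(G ** 2)), grad


def recover_input(W, B, d, n, steps=1000, lr=0.1, seed=0):
    """Plain gradient descent from a small random start.

    Returns the estimate X_hat and the loss recorded before each step.
    """
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((d, n))
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(X, W, B)
        history.append(loss)
        X = X - lr * grad
    return X, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n = 4, 3
    W = rng.standard_normal((d, d))
    X_true = rng.standard_normal((d, n))
    B = attention_matrix(X_true, W)  # the "observed" attention output
    X_hat, history = recover_input(W, B, d, n)
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Under this model X and -X produce the same matrix X^T W X, so the input can at best be recovered up to such symmetries; the demo therefore reports the loss rather than comparing `X_hat` with `X_true`. The objective is also nonconvex, so gradient descent is not guaranteed to reach zero loss.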
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Digital and Cyber Forensics
