Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang

TL;DR
This paper introduces the Masked Jigsaw Puzzle (MJP) position embedding for Vision Transformers, which enhances performance and robustness while significantly improving privacy protection against gradient inversion attacks.
Contribution
The paper proposes MJP, a position embedding method that shuffles selected patches and occludes their position information, balancing accuracy, robustness, and privacy in Vision Transformers.
Findings
MJP encodes spatial relationships but reduces privacy leakage.
Shuffling patches with MJP improves robustness and privacy.
MJP enhances accuracy on large-scale datasets under certain conditions.
Abstract
Position Embeddings (PEs), an arguably indispensable component in Vision Transformers (ViTs), have been shown to improve the performance of ViTs on many vision tasks. However, PEs have a potentially high risk of privacy leakage since the spatial information of the input patches is exposed. This caveat naturally raises a series of interesting questions about the impact of PEs on the accuracy, privacy, prediction consistency, etc. To tackle these issues, we propose a Masked Jigsaw Puzzle (MJP) position embedding method. In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm, and their corresponding PEs are occluded. Meanwhile, for the non-occluded patches, the PEs remain the original ones but their spatial relation is strengthened via our dense absolute localization regressor. The experimental results reveal that 1) PEs explicitly…
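The shuffle-and-occlude step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_jigsaw_shuffle`, the parameters `grid_size` and `mask_ratio`, and the choice of a single square block are assumptions made for illustration, and the dense absolute localization regressor is not covered.

```python
import random


def masked_jigsaw_shuffle(grid_size, mask_ratio, seed=None):
    """Hypothetical sketch of a block-wise jigsaw shuffle with occluded PEs.

    Returns (patch_order, pe_index) for a grid_size x grid_size patch grid.
    patch_order[i] is the original patch index placed at slot i.
    pe_index[i] is the position-embedding index used at slot i, or None
    when that slot's position embedding is occluded.
    """
    rng = random.Random(seed)
    n = grid_size * grid_size
    patch_order = list(range(n))
    pe_index = list(range(n))
    if mask_ratio <= 0:
        return patch_order, pe_index  # nothing selected, nothing occluded

    # Assumption: the selected region is one square block whose area is
    # roughly mask_ratio of the whole grid.
    side = max(1, min(grid_size, round((mask_ratio * n) ** 0.5)))
    top = rng.randint(0, grid_size - side)
    left = rng.randint(0, grid_size - side)
    block = [r * grid_size + c
             for r in range(top, top + side)
             for c in range(left, left + side)]

    # Permute the selected patches among the block's own slots.
    shuffled = block[:]
    rng.shuffle(shuffled)
    for slot, src in zip(block, shuffled):
        patch_order[slot] = src
        pe_index[slot] = None  # occlude the PE of every shuffled slot
    return patch_order, pe_index


order, pe = masked_jigsaw_shuffle(grid_size=4, mask_ratio=0.25, seed=0)
```

In a ViT, slots whose `pe_index` is `None` would presumably receive one shared embedding in place of their true position embedding, while the remaining slots keep their original ones.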
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Retinal Imaging and Analysis
Methods: Jigsaw
