Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

Bin Ren; Yahui Liu; Yue Song; Wei Bi; Rita Cucchiara; Nicu Sebe; Wei Wang

arXiv:2205.12551·cs.CV·May 29, 2023

TL;DR

This paper introduces the Masked Jigsaw Puzzle (MJP) position embedding for Vision Transformers, which enhances performance and robustness while significantly improving privacy protection against gradient inversion attacks.

Contribution

The paper proposes a novel MJP position embedding method that shuffles a selected subset of patches and occludes their position embeddings, balancing accuracy, robustness, and privacy in Vision Transformers.

Findings

01. MJP encodes spatial relationships but reduces privacy leakage.

02. Shuffling patches with MJP improves robustness and privacy.

03. MJP enhances accuracy on large-scale datasets under certain conditions.

Abstract

Position Embeddings (PEs), an arguably indispensable component in Vision Transformers (ViTs), have been shown to improve the performance of ViTs on many vision tasks. However, PEs have a potentially high risk of privacy leakage since the spatial information of the input patches is exposed. This caveat naturally raises a series of interesting questions about the impact of PEs on the accuracy, privacy, prediction consistency, etc. To tackle these issues, we propose a Masked Jigsaw Puzzle (MJP) position embedding method. In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm, and their corresponding PEs are occluded. Meanwhile, for the non-occluded patches, the PEs remain the original ones but their spatial relation is strengthened via our dense absolute localization regressor. The experimental results reveal that 1) PEs explicitly…
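
The shuffle-and-occlude step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the official repository for that). It assumes the selected patches form one randomly placed rectangular block on the patch grid, and that an occluded PE is represented by a placeholder index. The names `masked_jigsaw_shuffle` and `UNKNOWN_PE` are hypothetical, and the dense absolute localization regressor is omitted.

```python
import random

# Hypothetical placeholder index meaning "position embedding occluded".
UNKNOWN_PE = -1


def masked_jigsaw_shuffle(grid_h, grid_w, block_h, block_w, seed=None):
    """Shuffle one rectangular block of patches and occlude their PEs.

    Returns (order, pe_ids) for a grid of grid_h * grid_w patches:
      order[i]  -- index of the source patch placed at sequence slot i
      pe_ids[i] -- PE index used at slot i, or UNKNOWN_PE if occluded
    The block-selection rule here is an assumption made for illustration.
    """
    if not (1 <= block_h <= grid_h and 1 <= block_w <= grid_w):
        raise ValueError("block must fit inside the patch grid")
    rng = random.Random(seed)

    # Pick the top-left corner of the block uniformly at random.
    top = rng.randint(0, grid_h - block_h)
    left = rng.randint(0, grid_w - block_w)

    # Flattened (row-major) indices of the selected patches.
    selected = [
        r * grid_w + c
        for r in range(top, top + block_h)
        for c in range(left, left + block_w)
    ]

    # Jigsaw step: permute the selected patches among their own slots.
    permuted = selected[:]
    rng.shuffle(permuted)

    n = grid_h * grid_w
    order = list(range(n))   # non-selected patches stay in place
    pe_ids = list(range(n))  # non-selected patches keep their original PE
    for slot, src in zip(selected, permuted):
        order[slot] = src
        pe_ids[slot] = UNKNOWN_PE  # occlude the PE of every shuffled slot
    return order, pe_ids


if __name__ == "__main__":
    order, pe_ids = masked_jigsaw_shuffle(4, 4, 2, 2, seed=0)
    print(order)
    print(pe_ids)
```

In a ViT, `order` would be used to reorder the patch tokens, and every slot whose `pe_ids` entry equals `UNKNOWN_PE` would receive a shared learnable embedding in place of its true position embedding. That wiring is likewise an assumption of this sketch.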

Code & Models

Repositories

yhlleo/mjp (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Retinal Imaging and Analysis

Methods: Jigsaw