Privacy-Preserving Image Classification Using Vision Transformer
Zheng Qi, AprilPyone MaungMaung, Yuma Kinoshita, Hitoshi Kiya

TL;DR
This paper introduces a privacy-preserving image classification approach using Vision Transformer that maintains high accuracy while protecting visual information, outperforming existing methods in robustness and accuracy.
Contribution
It presents a novel method combining encrypted images with ViT, enabling privacy-preserving classification without sacrificing accuracy.
Findings
Outperforms state-of-the-art methods in accuracy
Demonstrates robustness against various attacks
Maintains high classification performance with encrypted images
Abstract
In this paper, we propose a privacy-preserving image classification method that is based on the combined use of encrypted images and the vision transformer (ViT). The proposed method allows us not only to apply images without visual information to ViT models for both training and testing but also to maintain high classification accuracy. ViT utilizes patch embedding and position embedding for image patches, so this architecture is shown to reduce the influence of block-wise image transformation. In an experiment, the proposed method for privacy-preserving image classification is demonstrated to outperform state-of-the-art methods in terms of classification accuracy and robustness against various attacks.
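The claim that patch embedding and position embedding reduce the influence of block-wise transformation can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the encryption steps, so the sketch assumes a keyed pixel shuffle inside each block plus a keyed scrambling of block order, with the block size equal to the ViT patch size. The names `to_patches`, `from_patches`, `encrypt`, `E`, and `pos` are hypothetical. Under those assumptions, a linear patch embedding with its rows permuted by the same key, together with a correspondingly permuted position embedding, produces exactly the tokens of the plain image in a different order.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into flattened p x p patches, shape (N, p*p*C)."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)

def from_patches(patches, h, w, c, p):
    """Inverse of to_patches: rebuild the (H, W, C) image."""
    x = patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

def encrypt(img, p, key):
    """Assumed block-wise encryption: shuffle pixels within blocks, then scramble blocks."""
    rng = np.random.default_rng(key)
    h, w, c = img.shape
    patches = to_patches(img, p)
    pixel_perm = rng.permutation(patches.shape[1])  # same shuffle inside every block
    block_perm = rng.permutation(patches.shape[0])  # reorder the blocks
    enc = patches[block_perm][:, pixel_perm]
    return from_patches(enc, h, w, c, p), pixel_perm, block_perm

rng = np.random.default_rng(0)
p, d = 16, 8                                  # patch size and embedding width (toy values)
img = rng.random((32, 32, 3))                 # stand-in for a real image
E = rng.standard_normal((p * p * 3, d))       # linear patch embedding
pos = rng.standard_normal((4, d))             # position embedding, one row per patch

enc_img, pixel_perm, block_perm = encrypt(img, p, key=1234)

plain_tokens = to_patches(img, p) @ E + pos
# A model whose embedding rows and position rows are permuted with the key
# sees the encrypted image exactly as the original model sees the plain one.
enc_tokens = to_patches(enc_img, p) @ E[pixel_perm] + pos[block_perm]

assert np.allclose(enc_tokens, plain_tokens[block_perm])
```

Because self-attention treats its input tokens as a set once position information has been added, the reordered tokens carry the same information as the plain ones. This is one way to read the abstract's statement, not a reproduction of the paper's experiments.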
Taxonomy
Topics: Digital Media Forensic Detection · Chaos-based Image/Signal Encryption
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Layer Normalization · Dense Connections · Vision Transformer
