Privacy-Preserving Image Classification Using Vision Transformer

Zheng Qi; AprilPyone MaungMaung; Yuma Kinoshita; Hitoshi Kiya

arXiv:2205.12041 · cs.CV · May 25, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a privacy-preserving image classification approach using Vision Transformer that maintains high accuracy while protecting visual information, outperforming existing methods in robustness and accuracy.

Contribution

It presents a novel method combining encrypted images with ViT, enabling privacy-preserving classification without sacrificing accuracy.

Findings

01 Outperforms state-of-the-art methods in accuracy

02 Demonstrates robustness against various attacks

03 Maintains high classification performance with encrypted images

Abstract

In this paper, we propose a privacy-preserving image classification method that is based on the combined use of encrypted images and the vision transformer (ViT). The proposed method allows us not only to apply images without visual information to ViT models for both training and testing but also to maintain a high classification accuracy. ViT utilizes patch embedding and position embedding for image patches, so this architecture is shown to reduce the influence of block-wise image transformation. In an experiment, the proposed method for privacy-preserving image classification is demonstrated to outperform state-of-the-art methods in terms of classification accuracy and robustness against various attacks.
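
To illustrate why a linear patch embedding can absorb a block-wise transformation, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the encryption scheme, so the sketch assumes a keyed pixel shuffle applied identically inside every block, with the block size equal to the ViT patch size. The function names (`to_patches`, `from_patches`, `shuffle_blocks`), the key value, and the toy embedding matrix `E` are all hypothetical.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into flattened p x p patches: shape (N, p*p*C)."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be a multiple of p"
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)          # (rows, cols, p, p, C)
    return x.reshape(-1, p * p * c)

def from_patches(patches, h, w, c, p):
    """Inverse of to_patches: rebuild the (H, W, C) image."""
    x = patches.reshape(h // p, w // p, p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)          # (rows, p, cols, p, C)
    return x.reshape(h, w, c)

def shuffle_blocks(img, p, key):
    """Assumed encryption: one keyed permutation of the values inside each block."""
    h, w, c = img.shape
    perm = np.random.default_rng(key).permutation(p * p * c)
    patches = to_patches(img, p)
    return from_patches(patches[:, perm], h, w, c, p), perm

rng = np.random.default_rng(0)
p = 16                                       # block size == assumed ViT patch size
img = rng.random((32, 32, 3))                # toy stand-in for a real image
enc, perm = shuffle_blocks(img, p, key=1234) # hypothetical secret key

# Toy linear patch embedding E: (p*p*C, D). Permuting its rows with the same
# key undoes the shuffle, so the resulting tokens match the plain-image tokens.
E = rng.standard_normal((p * p * 3, 8))
plain_tokens = to_patches(img, p) @ E
enc_tokens = to_patches(enc, p) @ E[perm]

print(np.allclose(plain_tokens, enc_tokens))
```

Under these assumptions, shuffling within a block only reorders the inputs of the patch-embedding layer, so an equivalent embedding exists for the encrypted images. Whether training actually finds it, and how other block-wise transformations behave, is what the paper's experiments address.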

Taxonomy

Topics: Digital Media Forensic Detection · Chaos-based Image/Signal Encryption

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Layer Normalization · Dense Connections · Vision Transformer