Image and Model Transformation with Secret Key for Vision Transformer
Hitoshi Kiya, Ryota Iijima, MaungMaung Aprilpyone, and Yuma Kinoshita

TL;DR
This paper introduces a method for transforming vision transformer (ViT) models with a secret key, so that a model trained on plain images can process encrypted images without retraining, preserving performance while protecting the model.
Contribution
It presents a novel scheme for transforming ViT models with secret keys, allowing encrypted image processing without additional training or network modifications.
Findings
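Transformed models match the performance of models trained with plain images when test images are encrypted with the same key.
The scheme protects models against use without the secret key, as evaluated on CIFAR-10 image classification.
No specially prepared training data or network modification is required, so the secret key is easy to update.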
Abstract
In this paper, we propose a combined use of transformed images and vision transformer (ViT) models transformed with a secret key. We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images on the basis of the ViT architecture, and the performance of the transformed models is the same as models trained with plain images when using test images encrypted with the key. In addition, the proposed scheme does not require any specially prepared data for training models or network modification, so it also allows us to easily update the secret key. In an experiment, the effectiveness of the proposed scheme is evaluated in terms of performance degradation and model protection performance in an image classification task on the CIFAR-10 dataset.
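The abstract does not spell out the transformation itself, so the following is only a minimal sketch of why a model can be converted without retraining, under an assumed form of encryption: a key-derived permutation of the pixels inside each flattened patch, paired with the same permutation applied to the rows of a linear patch-embedding matrix. The names `key_permutation`, `encrypt_patches`, and `transform_embedding` are hypothetical, and the paper's actual scheme may differ.

```python
import numpy as np


def key_permutation(key: int, n: int) -> np.ndarray:
    """Derive a permutation of n indices from an integer secret key (assumed key format)."""
    return np.random.default_rng(key).permutation(n)


def encrypt_patches(patches: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Shuffle the pixel values inside every flattened patch with the same permutation."""
    return patches[:, perm]


def transform_embedding(weight: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Reorder the rows of the patch-embedding matrix to match the shuffled pixels."""
    return weight[perm, :]


# Toy setup: 4 flattened 16x16x3 patches and a random "pretrained" embedding matrix.
rng = np.random.default_rng(0)
patch_dim, embed_dim = 16 * 16 * 3, 64
patches = rng.standard_normal((4, patch_dim))
weight = rng.standard_normal((patch_dim, embed_dim))

perm = key_permutation(1234, patch_dim)

# Plain model on plain patches.
plain_out = patches @ weight

# Transformed model on patches encrypted with the same key.
enc_out = encrypt_patches(patches, perm) @ transform_embedding(weight, perm)

# Model transformed with a different key, fed the same encrypted patches.
wrong_perm = key_permutation(9999, patch_dim)
wrong_out = encrypt_patches(patches, perm) @ transform_embedding(weight, wrong_perm)

same_with_key = np.allclose(plain_out, enc_out)
same_with_wrong_key = np.allclose(plain_out, wrong_out)
```

In this toy setting the embedding of an encrypted patch under the matching key equals the embedding of the plain patch, because permuting the inputs and the weight rows together leaves the dot product unchanged. That is one plausible reading of how performance can be preserved and how a key can be updated by re-permuting weights, not a statement of the authors' implementation.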
Taxonomy
Topics: Digital Media Forensic Detection · Chaos-based Image/Signal Encryption · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Softmax · Residual Connection · Layer Normalization · Dense Connections · Vision Transformer
