Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification
Haiwei Lin, Shoko Imaizumi, and Hitoshi Kiya

TL;DR
This paper introduces a low-rank adaptation technique for fine-tuning pre-trained vision transformers, enabling privacy-preserving image classification with fewer trainable parameters and minimal accuracy loss.
Contribution
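The method injects trainable rank decomposition matrices into each ViT layer and, unlike conventional low-rank adaptation, also keeps the patch embedding layer trainable. This reduces the number of trainable parameters while keeping accuracy close to that of full fine-tuning on privacy-preserving image classification.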
Findings
Reduces trainable parameters significantly
Maintains accuracy comparable to full fine-tuning
Effective for privacy-preserving image classification
Abstract
We propose a low-rank adaptation method for training privacy-preserving vision transformer (ViT) models in which the pre-trained ViT model weights are frozen. In the proposed method, trainable rank decomposition matrices are injected into each layer of the ViT architecture, and the patch embedding layer is not frozen, unlike in conventional low-rank adaptation methods. The proposed method allows us not only to reduce the number of trainable parameters but also to maintain almost the same accuracy as that of full fine-tuning.
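To give a rough sense of the parameter savings the abstract describes, the following sketch counts trainable parameters when low-rank matrices are added to frozen weights and the patch embedding layer is left trainable. Everything numeric here is an assumption for illustration, not taken from the paper: the ViT-B/16-style dimensions (hidden size 768, 12 layers, 16x16 patches, 3 channels), the rank of 4, and the choice to adapt two square projection matrices per layer. The function names are hypothetical, and the classification head is left out of the count.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one low-rank update B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


def patch_embed_params(patch: int, channels: int, dim: int) -> int:
    """Weights plus bias of a linear patch embedding that maps
    a flattened patch (patch * patch * channels) to dim."""
    return patch * patch * channels * dim + dim


def trainable_params(dim=768, depth=12, patch=16, channels=3,
                     rank=4, adapted_per_layer=2):
    """Count trainable parameters under the assumed configuration.

    ASSUMPTION: each layer has `adapted_per_layer` square (dim x dim)
    projections that receive a low-rank update; the paper's abstract
    only says matrices are injected into each layer.
    """
    lora = depth * adapted_per_layer * lora_params(dim, dim, rank)
    embed = patch_embed_params(patch, channels, dim)  # left unfrozen
    return {"lora": lora, "patch_embed": embed, "total": lora + embed}


if __name__ == "__main__":
    for name, count in trainable_params().items():
        print(f"{name}: {count:,}")
```

Under these assumed settings the low-rank matrices contribute far fewer parameters than the unfrozen patch embedding, and both together remain a small fraction of a full ViT of this size. The actual figures depend on the rank and on which weights the authors adapt.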
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
