Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification

Haiwei Lin, Shoko Imaizumi, and Hitoshi Kiya

arXiv:2507.11943·cs.CR·July 17, 2025

TL;DR

This paper introduces a low-rank adaptation technique for fine-tuning pre-trained vision transformers, enabling privacy-preserving image classification with fewer trainable parameters and minimal accuracy loss.

Contribution

The method injects trainable rank decomposition matrices into each ViT layer and, unlike conventional low-rank adaptation, keeps the patch embedding layer trainable, which reduces the number of trainable parameters needed to train privacy-preserving ViT models.
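
As an illustration of what a trainable rank decomposition looks like, here is a minimal standard-library sketch of a single linear projection with a low-rank update. It is not the authors' implementation: the class name `LoRALinear`, the `alpha / r` scaling, the zero initialization of `B`, and the toy dimensions are all assumptions borrowed from the usual low-rank adaptation formulation.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical helper for illustration only; names and init are assumptions.
    """

    def __init__(self, d_in, d_out, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weight (a random stand-in here), shape d_out x d_in.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is r x d_in (random init), B is d_out x r (zeros).
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B would receive gradients; W stays fixed.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)

layer = LoRALinear(d_in=8, d_out=8, r=2)
x = [1.0] * 8
# B starts at zero, so the adapted layer initially matches the frozen one.
print(layer.forward(x) == matvec(layer.W, x))
print(layer.num_trainable(), "trainable vs", 8 * 8, "frozen weights")
```

Because `B` starts at zero, the adapted projection reproduces the frozen one until training moves the factors, so fine-tuning starts from the pre-trained behavior.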

Findings

01 Reduces trainable parameters significantly
02 Maintains accuracy comparable to full fine-tuning
03 Effective for privacy-preserving image classification

Abstract

We propose a low-rank adaptation method for training privacy-preserving vision transformer (ViT) models that efficiently freezes pre-trained ViT model weights. In the proposed method, trainable rank decomposition matrices are injected into each layer of the ViT architecture and, unlike conventional low-rank adaptation methods, the patch embedding layer is not frozen. The proposed method allows us not only to reduce the number of trainable parameters but also to maintain almost the same accuracy as full fine-tuning.
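
To make the parameter reduction concrete, the following sketch counts trainable parameters for one hypothetical configuration. None of the numbers come from the paper: it assumes a ViT-B/16-style backbone (hidden size 768, 12 layers, MLP size 3072, 16x16 patches, 3 channels, 197 tokens), rank 4, and low-rank factors on only the query and value projections of each layer. The classification head is left out.

```python
def patch_embed_params(patch, channels, dim):
    # Linear projection of flattened patches: weights plus bias.
    return patch * patch * channels * dim + dim

def lora_params(dim, rank, layers, adapted_per_layer):
    # Each adapted dim x dim projection gains A (rank x dim) and B (dim x rank).
    return layers * adapted_per_layer * 2 * rank * dim

def backbone_params(dim, layers, mlp_dim, patch, channels, tokens):
    attn = 4 * (dim * dim + dim)                         # q, k, v, output projections
    mlp = dim * mlp_dim + mlp_dim + mlp_dim * dim + dim  # two dense layers with biases
    norms = 2 * 2 * dim                                  # two LayerNorms per block
    block = attn + mlp + norms
    # Patch embedding + class token + position embeddings.
    embed = patch_embed_params(patch, channels, dim) + dim + tokens * dim
    return layers * block + embed + 2 * dim              # plus final LayerNorm

# Assumed ViT-B/16-style settings; the rank and adapted projections are hypothetical.
DIM, LAYERS, MLP_DIM, PATCH, CHANNELS, TOKENS, RANK = 768, 12, 3072, 16, 3, 197, 4

full = backbone_params(DIM, LAYERS, MLP_DIM, PATCH, CHANNELS, TOKENS)
trainable = lora_params(DIM, RANK, LAYERS, adapted_per_layer=2) \
    + patch_embed_params(PATCH, CHANNELS, DIM)

print(f"full fine-tuning:       {full:,} parameters")
print(f"LoRA + patch embedding: {trainable:,} parameters")
print(f"trainable share:        {100 * trainable / full:.2f}%")
```

Under these assumptions the trainable share stays below 1% of the backbone even with the patch embedding layer unfrozen. The exact figure depends on the rank and on which projections are adapted.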

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing