Single-round Self-supervised Distributed Learning using Vision Transformer
Sangjoon Park, Ik-Jae Lee, Jun Won Kim, Jong Chul Ye

TL;DR
This paper introduces a self-supervised, communication-efficient distributed learning method for vision transformers that enhances privacy, performs strongly on medical imaging tasks, and shows potential as a versatile foundation model.
Contribution
It proposes a novel self-supervised masked sampling distillation technique for vision transformers that reduces communication overhead and improves privacy in distributed learning.
Findings
Outperforms existing distributed learning strategies
Achieves superior accuracy over fine-tuning baselines
Demonstrates potential as a task-agnostic foundation model
Abstract
Despite the recent success of deep learning in the field of medicine, the issue of data scarcity is exacerbated by concerns about privacy and data ownership. Distributed learning approaches, including federated learning, have been investigated to address these issues. However, they are hindered by the need for cumbersome communication overheads and weaknesses in privacy protection. To tackle these challenges, we propose a self-supervised masked sampling distillation method for the vision transformer. This method can be implemented without continuous communication and can enhance privacy by utilizing a vision transformer-specific encryption technique. We conducted extensive experiments on two different tasks, which demonstrated the effectiveness of our method. We achieved superior performance compared to the existing distributed learning strategy as well as the fine-tuning only baseline.…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Brain Tumor Detection and Classification · Face recognition and analysis
Methods: Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Dense Connections · Multi-Head Attention · Layer Normalization · Vision Transformer
