Multi-Task Distributed Learning using Vision Transformer with Random Patch Permutation

Sangjoon Park; Jong Chul Ye

arXiv:2204.03500 · cs.LG · April 8, 2022

Open Access

TL;DR

This paper introduces a novel multi-task distributed learning method using Vision Transformer with random patch permutation, improving collaboration, privacy, and efficiency in medical imaging applications.

Contribution

The paper replaces the CNN-based head used in FeSTA with a simple patch embedder whose output patches are randomly permuted, with the aim of improving multi-task learning and privacy while reducing the communication overhead that FeSTA incurs.

Findings

1. Significant improvement in multi-task collaboration performance

2. Enhanced communication efficiency in distributed learning

3. Better privacy preservation in medical imaging tasks

Abstract

The widespread application of artificial intelligence in health research is currently hampered by limitations in data availability. Distributed learning methods such as federated learning (FL) and split learning (SL) have been introduced to solve this problem, as well as data management and ownership issues, each with different strengths and weaknesses. The recently proposed federated split task-agnostic (FeSTA) learning tries to reconcile the distinct merits of FL and SL by enabling multi-task collaboration between participants through the Vision Transformer (ViT) architecture, but it suffers from higher communication overhead. To address this, here we present p-FeSTA, a multi-task distributed learning method using ViT with random patch permutation. Instead of using a CNN-based head as in FeSTA, p-FeSTA adopts a simple patch embedder with random permutation, improving the multi-task learning performance…
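The abstract describes a simple patch embedder combined with random patch permutation. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the function names (`patchify`, `embed_and_permute`), the single linear projection, the random weights, and the toy 8x8 image are all assumptions made for illustration.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W) image into flattened non-overlapping patches, row-major."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch)
    # Reorder to (patch_row, patch_col, y, x), then flatten each patch.
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def embed_and_permute(image, weight, patch, rng):
    """Linearly embed each patch, then shuffle the resulting token order.

    Hypothetical helper: a single linear projection stands in for the
    'simple patch embedder'; the permutation is drawn fresh per call.
    """
    patches = patchify(image, patch)         # (N, patch * patch)
    tokens = patches @ weight                # (N, D)
    perm = rng.permutation(tokens.shape[0])  # random order of the N tokens
    return tokens[perm], perm


rng = np.random.default_rng(0)
image = np.arange(64, dtype=np.float64).reshape(8, 8)  # toy "image"
weight = rng.standard_normal((16, 4))                  # assumed 16 -> 4 projection
tokens, perm = embed_and_permute(image, weight, patch=4, rng=rng)
```

In this sketch only the party holding `perm` can restore the original patch order, via `tokens[np.argsort(perm)]`; anyone receiving `tokens` alone sees the embedded patches without their spatial arrangement.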

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cerebrospinal fluid and hydrocephalus · Stochastic Gradient Optimization Techniques

Methods: Attention Is All You Need · Linear Layer · Dropout · Absolute Position Encodings · Label Smoothing · Softmax · Adam · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer