Bidirectional Contrastive Split Learning for Visual Question Answering
Yuwei Sun, Hideya Ochiai

TL;DR
This paper introduces BiCSL, a privacy-preserving split learning framework for visual question answering that enhances robustness against adversarial attacks in decentralized multi-modal data settings.
Contribution
The paper proposes Bidirectional Contrastive Split Learning (BiCSL), a novel decentralized multi-modal learning method that improves privacy and robustness for VQA tasks.
Findings
BiCSL outperforms centralized methods in robustness against backdoor attacks.
The contrastive loss enables effective self-supervised learning of the decentralized modules.
Results are demonstrated on five state-of-the-art VQA models using the VQA-v2 dataset.
Abstract
Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnoses. One significant challenge is to devise a robust decentralized learning framework for various client models where centralized data collection is avoided due to confidentiality concerns. This work aims to tackle privacy-preserving VQA by decoupling a multi-modal model into representation modules and a contrastive module, and by leveraging inter-module gradient sharing and inter-client weight sharing. To this end, we propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients. We employ a contrastive loss that enables more efficient self-supervised learning of decentralized modules. Comprehensive experiments are conducted on the VQA-v2 dataset based on five SOTA…
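The abstract names a contrastive loss but does not give its form here. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss over paired embeddings, written in plain Python. The function names, the temperature value, the toy vectors, and the symmetric two-direction averaging are all assumptions made for this example; they are not taken from the paper and may not match what "bidirectional" denotes in BiCSL.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length (zero vectors are not handled)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss where queries[i] and keys[i] form the positive pair.

    Every other key in the batch acts as a negative for queries[i].
    The temperature default is an illustrative assumption.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        # Cosine similarity of query i with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching index as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(first, second, temperature=0.1):
    """Average the loss over both matching directions (hypothetical choice)."""
    return 0.5 * (info_nce(first, second, temperature)
                  + info_nce(second, first, temperature))


# Toy embeddings (made up): row i of each list is meant to describe the same sample.
input_embeddings = [[1.0, 0.0], [0.0, 1.0]]
answer_embeddings = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = symmetric_contrastive_loss(input_embeddings, answer_embeddings)
shuffled_loss = symmetric_contrastive_loss(input_embeddings, answer_embeddings[::-1])
```

With these toy vectors, the correctly paired batch yields a lower loss than the batch whose pairs are swapped, which is the property a contrastive objective relies on.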
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Head and Neck Surgical Oncology
