Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

Conor Hassan; Matthew Sutton; Antonietta Mira; Kerrie Mengersen

arXiv:2405.04043 · stat.CO · May 8, 2024

Open Access

TL;DR

This paper presents a scalable Bayesian federated learning framework for vertical data partitioning, using data augmentation and amortized inference to enable privacy-preserving, decentralized Bayesian analysis across multiple clients.

Contribution

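It introduces a framework that combines data augmentation with amortized variational inference for scalable Bayesian VFL. The amortization addresses the high dimensionality of the augmented variables, whose number grows with both the number of observations and the number of clients.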
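The combination described above pays off because the number of variational parameters stops depending on the number of observations. The following is a minimal sketch of that counting argument, not the paper's implementation. It assumes a Gaussian variational factor q(z_ik) for each augmented variable, one shared inference network per client, and an encoder that sees the client's covariates x_ik and the response y_i. In practice, which inputs each client may use depends on who holds the response. A linear encoder stands in for a neural network, and all names (`LinearEncoder`, `mean_field_param_count`, `sample_z`) are hypothetical.

```python
import math
import random


def mean_field_param_count(n_obs, n_clients):
    # Free-form mean-field q: one mean and one log-sd per augmented variable z_ik,
    # so the count grows with both the number of observations and of clients.
    return 2 * n_obs * n_clients


class LinearEncoder:
    """Hypothetical per-client inference network: (x_ik, y_i) -> (mean, log_sd) of q(z_ik).

    A linear map keeps the sketch dependency-free; in practice this would
    typically be a small neural network.
    """

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        d = n_features + 1  # client covariates plus the response
        self.w_mean = [rng.gauss(0.0, 0.1) for _ in range(d)]
        self.w_log_sd = [rng.gauss(0.0, 0.1) for _ in range(d)]
        self.b_mean = 0.0
        self.b_log_sd = 0.0

    def n_params(self):
        # Shared weights: independent of how many observations are encoded.
        return len(self.w_mean) + len(self.w_log_sd) + 2

    def __call__(self, x_ik, y_i):
        inp = list(x_ik) + [y_i]
        mean = self.b_mean + sum(w * v for w, v in zip(self.w_mean, inp))
        log_sd = self.b_log_sd + sum(w * v for w, v in zip(self.w_log_sd, inp))
        return mean, log_sd


def sample_z(encoder, x_ik, y_i, rng):
    # Reparameterized draw z = mean + sd * eps with eps ~ N(0, 1).
    mean, log_sd = encoder(x_ik, y_i)
    return mean + math.exp(log_sd) * rng.gauss(0.0, 1.0)


# Example: 10,000 observations split across 3 clients with 5 covariates each.
encoders = [LinearEncoder(n_features=5, seed=k) for k in range(3)]
print(mean_field_param_count(10_000, 3))      # 60000
print(sum(e.n_params() for e in encoders))    # 42, for any number of observations
```

The free-form approximation needs 60,000 parameters here, and that count would double if the data doubled. The amortized version keeps 42 shared weights and simply evaluates the encoder on more rows.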

Findings

01. Effective Bayesian inference demonstrated on logistic and multilevel regression models
02. Scalable inference whose cost does not grow with the number of observations
03. A framework for privacy-preserving, decentralized Bayesian analysis

Abstract

Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework…
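The abstract does not spell out the augmentation. The following is a minimal sketch of one standard construction that shows how auxiliary variables can make the joint density split into client-local pieces. It assumes a logistic model where client k holds covariates X_k and coefficients β_k. Gaussian auxiliary variables z_ik ~ N(x_ik^T β_k, τ²) replace each client's share of the linear predictor, and y_i depends only on Σ_k z_ik. This is not necessarily the paper's exact formulation. The names `client_term`, `response_term`, `log_joint_augmented` and the scale `tau` are hypothetical, and priors on β are omitted.

```python
import math


def log_normal(z, mean, sd):
    # log N(z | mean, sd^2)
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((z - mean) / sd) ** 2


def log_bernoulli_logit(y, eta):
    # log p(y | eta) for y in {0, 1} with p = sigmoid(eta); uses the stable
    # identity log(1 + e^eta) = max(eta, 0) + log1p(e^{-|eta|}).
    return y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))


def client_term(X_k, beta_k, z_k, tau):
    """Client k's contribution: uses only its own covariates, coefficients and auxiliaries."""
    total = 0.0
    for x_ik, z_ik in zip(X_k, z_k):
        mean = sum(a * b for a, b in zip(x_ik, beta_k))
        total += log_normal(z_ik, mean, tau)
    return total


def response_term(y, Z):
    """Response contribution: uses only y and the per-observation sums of the auxiliaries."""
    # zip(*Z) turns the client-major Z[k][i] into one tuple (z_i1, ..., z_iK) per observation.
    return sum(log_bernoulli_logit(y_i, sum(z_i)) for y_i, z_i in zip(y, zip(*Z)))


def log_joint_augmented(y, Xs, betas, Z, tau):
    # log p(y, Z | betas) = response term + sum of client-local terms (priors on betas omitted).
    local = sum(client_term(X_k, b_k, z_k, tau) for X_k, b_k, z_k in zip(Xs, betas, Z))
    return response_term(y, Z) + local


# Two clients holding different covariates for the same three observations.
Xs = [[[1.0, 0.5], [0.2, -1.0], [0.0, 2.0]],   # client 1: two covariates
      [[3.0], [-1.0], [0.5]]]                   # client 2: one covariate
betas = [[0.4, -0.3], [0.1]]
y = [1, 0, 1]
tau = 0.5
# Set each auxiliary variable to its client's linear predictor x_ik^T beta_k.
Z = [[sum(a * b for a, b in zip(x, b_k)) for x in X_k] for X_k, b_k in zip(Xs, betas)]
print(log_joint_augmented(y, Xs, betas, Z, tau))
```

When each z_ik equals its client's linear predictor, the response term reduces to the ordinary logistic log-likelihood with η_i = Σ_k x_ik^T β_k. Each client term needs nothing beyond that client's own data. The price is one auxiliary variable per observation and client (N × K in total), which is the dimensionality issue the abstract describes.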
