VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

Zhaomin Wu; Junyi Hou; Bingsheng He

arXiv:2307.02040 · cs.LG · March 14, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces VertiBench, a benchmark for vertical federated learning that adds a real image-image dataset, evaluation metrics, and dataset splitting methods based on feature importance and feature correlation to improve algorithm assessment.

Contribution

The paper presents VertiBench, a comprehensive VFL benchmark with new datasets, metrics, and splitting methods addressing existing limitations in feature distribution diversity.

Findings

01. Existing benchmarks lack real-world diversity.
02. Feature importance and correlation significantly impact VFL performance.
03. The new dataset enhances evaluation of image-image VFL algorithms.

Abstract

Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived from arbitrary feature splits from a global set, which only capture a subset of feature distributions, leading to inadequate algorithm performance assessment. This paper addresses these shortcomings by introducing two key factors affecting VFL performance - feature importance and feature correlation - and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL…
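
The abstract describes deriving synthetic VFL datasets by splitting the features of a global dataset across parties, but it does not spell out how the splits are built. The following is only a minimal sketch of that general idea, not the paper's method: it assumes a symmetric Dirichlet distribution with a hypothetical `alpha` parameter controls how unevenly feature columns are assigned to parties. The helper names `dirichlet_weights`, `split_features`, and `vertical_partition` are illustrative and do not come from the VertiBench code.

```python
import random


def dirichlet_weights(n_parties, alpha, rng):
    # Normalised Gamma(alpha, 1) draws follow a symmetric Dirichlet(alpha).
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_parties)]
    total = sum(draws)
    return [d / total for d in draws]


def split_features(n_features, n_parties, alpha, seed=0):
    """Assign every feature index to exactly one party (hypothetical helper).

    Assumption: a small alpha tends to give imbalanced splits,
    while a large alpha tends to give near-even ones.
    """
    rng = random.Random(seed)
    weights = dirichlet_weights(n_parties, alpha, rng)
    owners = rng.choices(range(n_parties), weights=weights, k=n_features)
    parts = [[] for _ in range(n_parties)]
    for feature, owner in enumerate(owners):
        parts[owner].append(feature)
    return parts


def vertical_partition(rows, parts):
    # Each party sees all samples (rows) but only its own feature columns.
    return [[[row[j] for j in cols] for row in rows] for cols in parts]


# Toy global dataset: 4 samples, 6 features.
rows = [[10 * i + j for j in range(6)] for i in range(4)]
parts = split_features(n_features=6, n_parties=3, alpha=0.5)
local_views = vertical_partition(rows, parts)
```

Here `parts` lists the feature indices held by each party and `local_views` holds the matching column slices. The sketch balances only the number of features per party; splitting by feature importance or by feature correlation, as the abstract proposes, would need per-feature importance scores or a correlation matrix as additional input.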

Code & Models

Repositories

Xtra-Computing/VertiBench
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning