IDCloak: A Practical Secure Multi-party Dataset Join Framework for Vertical Privacy-preserving Machine Learning

Shuyu Chen; Guopeng Lin; Haoyu Niu; Lushan Song; Chengxun Hong; Weili Han

arXiv:2506.01072·cs.CR·June 3, 2025

Open Access

TL;DR

IDCloak introduces a practical multi-party framework for secure dataset joining in vertical privacy-preserving machine learning, enhancing security and efficiency without relying on a non-colluding server.

Contribution

It presents the first secure multi-party dataset join framework for vPPML that maintains ID privacy without a non-colluding auxiliary server, combining optimized protocols for better security and performance.

Findings

01. Outperforms state-of-the-art two-party join frameworks in efficiency.

02. Provides stronger security guarantees under a dishonest majority.

03. Significantly improves the communication and computation efficiency of the secure shuffle protocol.

Abstract

Vertical privacy-preserving machine learning (vPPML) enables multiple parties to train models on their vertically distributed datasets while keeping those datasets private. A critical step in vPPML is the secure dataset join, which aligns the features corresponding to intersection IDs across datasets and forms a secret-shared joint training dataset. However, existing methods for this step can be impractical because they (1) are insecure, as they expose intersection IDs; (2) rely on a strong trust assumption that requires a non-colluding auxiliary server; or (3) are limited to the two-party setting. This paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML that keeps IDs private without a non-colluding auxiliary server. IDCloak consists of two protocols: (1) a circuit-based multi-party private set intersection protocol (cmPSI),…
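To make the goal of the join concrete, the following is a minimal plaintext sketch of the output a secure dataset join is meant to produce: features of the intersection IDs aligned into rows, rows shuffled so that row position reveals nothing about IDs, and every value split into additive secret shares. It is not IDCloak's protocol and offers no security on its own, since it computes everything in the clear as a trusted dealer would. The function names (`share`, `reconstruct`, `ideal_join`), the ring size `MOD`, and the toy data are all assumptions made for illustration.

```python
import random
import secrets

MOD = 2 ** 64  # ring for additive sharing (assumed; not taken from the paper)


def share(value, n_parties):
    """Split value into n additive shares that sum to value mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % MOD


def ideal_join(datasets):
    """Reference output of a dataset join, computed in the clear.

    datasets: one dict per party, mapping ID -> list of integer features.
    Returns one share table per party; IDs are dropped from the output.
    """
    n = len(datasets)
    common_ids = set(datasets[0])
    for d in datasets[1:]:
        common_ids &= set(d)

    # Align features of each intersection ID into one joint row.
    rows = [[f for d in datasets for f in d[i]] for i in common_ids]
    # Shuffle so that row order does not depend on the IDs.
    random.SystemRandom().shuffle(rows)

    per_party = [[] for _ in range(n)]
    for row in rows:
        shared = [share(v, n) for v in row]
        for p in range(n):
            per_party[p].append([s[p] for s in shared])
    return per_party


# Toy example with three parties (hypothetical data).
party_a = {"u1": [1, 2], "u2": [3, 4], "u3": [5, 6]}
party_b = {"u2": [7], "u3": [8], "u4": [9]}
party_c = {"u3": [10], "u2": [11], "u5": [12]}
tables = ideal_join([party_a, party_b, party_c])
```

Each party ends up holding a table of random-looking values; only by summing all parties' shares of a cell does the joint feature value reappear. A secure protocol has to reach the same output without any party ever seeing the intersection IDs or another party's features in the clear.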

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Brain Tumor Detection and Classification