Feature Selection from Differentially Private Correlations

Ryan Swope; Amol Khanna; Philip Doldo; Saptarshi Roy; Edward Raff

arXiv:2408.10862 · cs.LG · August 26, 2024

TL;DR

This paper introduces a new differentially private feature selection method based on correlation order statistics, which outperforms the traditional two-stage approach, especially on sparse, high-dimensional datasets.

Contribution

The paper proposes a novel privacy-preserving feature selection technique based on correlation order statistics, addressing the instability of existing methods on sparse data.
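
To make "correlation order statistics" concrete, here is a minimal non-private sketch in Python. It assumes Pearson correlation between each feature and the label as the score, which may differ from the statistic the paper actually uses, and the function name `top_k_by_correlation` is hypothetical. It sorts features by absolute correlation and keeps the k largest (the top order statistics). It provides no privacy protection by itself.

```python
import numpy as np


def top_k_by_correlation(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k features with the largest |Pearson r| with y.

    Hypothetical, non-private illustration: the selected indices are the
    top-k order statistics of the absolute feature-label correlations.
    """
    Xc = X - X.mean(axis=0)  # center each feature column
    yc = y - y.mean()        # center the label
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # A constant column (e.g. an all-zero sparse feature) has zero norm;
    # dividing its zero numerator by inf gives correlation 0 instead of NaN.
    denom = np.where(denom == 0, np.inf, denom)
    corr = (Xc.T @ yc) / denom
    # Sorting by -|corr| lists features from strongest to weakest.
    order = np.argsort(-np.abs(corr), kind="stable")
    return order[:k]


X = np.array([[1.0, 0.0, 2.0], [2.0, 0.0, 1.0], [3.0, 1.0, 0.0], [4.0, 0.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(top_k_by_correlation(X, y, 2))  # -> [0 1]
```

Here feature 0 is perfectly correlated with `y` (r = 1), feature 1 has r ≈ 0.26 and feature 2 has r = 0.2, so the two largest order statistics pick features 0 and 1.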

Findings

01

Our method significantly outperforms the two-stage baseline on multiple datasets.

02

It maintains privacy while effectively selecting important features.

03

The approach is more stable than the two-stage baseline under data sparsity.

Abstract

Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through $L_1$-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-dimensional regression can leak information about individual datapoints in a dataset. In this paper, we empirically evaluate the established baseline method for feature selection with differential privacy, the two-stage selection technique, and show that it is not stable under sparsity. This makes it perform poorly on real-world datasets, so we consider a different approach to private feature selection. We employ a correlations-based order statistic to choose important features from a dataset and privatize them to ensure that the results do not leak information about individual datapoints. We find that our method significantly outperforms the…
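
The abstract does not spell out the privatization step. As a hedged illustration of how correlation scores can be selected privately, the sketch below swaps in a standard technique, k rounds of the exponential mechanism ("peeling") implemented with the Gumbel-max trick; it is not the authors' mechanism. The function name `private_top_k_features`, the clipping of every entry to [-1, 1], replace-one neighbouring datasets, and the use of unnormalized inner products as scores are all assumptions chosen so the score sensitivity is easy to bound (Δ = 2).

```python
from typing import Optional

import numpy as np


def private_top_k_features(
    X: np.ndarray,
    y: np.ndarray,
    k: int,
    epsilon: float,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Pick k feature indices under epsilon-DP (illustrative sketch only).

    Assumptions (hypothetical, not from the paper):
      * entries of X and y are clipped to [-1, 1];
      * neighbouring datasets differ by replacing one row.
    Each score s_j = |sum_i X_ij * y_i| then changes by at most 2 when one
    row is replaced, so the score sensitivity is 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xc = np.clip(X, -1.0, 1.0)
    yc = np.clip(y, -1.0, 1.0)
    scores = np.abs(Xc.T @ yc)  # unnormalized correlation-style scores

    sensitivity = 2.0
    # One exponential-mechanism round with budget epsilon / k samples a feature
    # with probability proportional to exp(score / beta), where
    # beta = 2 * sensitivity * k / epsilon. Adding Gumbel(beta) noise once and
    # keeping the k largest draws is equivalent to k such rounds without
    # replacement, which is epsilon-DP by basic composition.
    beta = 2.0 * sensitivity * k / epsilon
    noisy = scores + rng.gumbel(loc=0.0, scale=beta, size=scores.shape)
    return np.argsort(-noisy)[:k]
```

With a small `epsilon` the Gumbel noise dominates and the selection is close to uniformly random; as `epsilon` grows it approaches the non-private top-k. Normalized correlations are avoided in this sketch because dividing by data-dependent norms makes the sensitivity harder to bound.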

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and Data Classification

Methods: Feature Selection