Feature Selection from Differentially Private Correlations
Ryan Swope, Amol Khanna, Philip Doldo, Saptarshi Roy, Edward Raff

TL;DR
This paper introduces a new differentially private feature selection method based on correlation order statistics, which outperforms the traditional two-stage approach, especially on sparse, high-dimensional datasets.
Contribution
It proposes a novel privacy-preserving feature selection technique using correlation order statistics, addressing stability issues of existing methods in sparse data scenarios.
Findings
The method significantly outperforms the two-stage baseline on multiple datasets.
It maintains privacy while effectively selecting important features.
The approach is more stable under data sparsity.
Abstract
Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through L1-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-dimensional regression can leak information about individual datapoints in a dataset. In this paper, we empirically evaluate the established baseline method for feature selection with differential privacy, the two-stage selection technique, and show that it is not stable under sparsity. This makes it perform poorly on real-world datasets, so we consider a different approach to private feature selection. We employ a correlations-based order statistic to choose important features from a dataset and privatize them to ensure that the results do not leak information about individual datapoints. We find that our method significantly outperforms the…
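The abstract does not spell out the mechanism, so the following is only a rough sketch of what private, correlation-based top-k selection can look like, not the authors' algorithm. It assumes features and targets are clipped to [-1, 1], scores each feature by the absolute value of its unnormalized correlation with the target (sensitivity 1 when one row is added or removed), and picks the top k after adding Gumbel noise, a standard exponential-mechanism-style selection. The function name `private_top_k_features`, the clipping bounds, and the budget split across k picks are all illustrative assumptions.

```python
import math
import random


def clip(value, lo=-1.0, hi=1.0):
    """Clamp a value so each row's contribution to a score is bounded."""
    return max(lo, min(hi, value))


def private_top_k_features(X, y, k, epsilon, seed=None):
    """Return k feature indices chosen by noisy correlation scores.

    X is a list of rows (lists of floats), y a list of floats.
    Hypothetical helper: a sketch under the assumptions stated above.
    """
    rng = random.Random(seed)
    n_features = len(X[0])

    # Unnormalized correlation score per feature: |sum_i x_ij * y_i|.
    sums = [0.0] * n_features
    for row, target in zip(X, y):
        t = clip(target)
        for j, v in enumerate(row):
            sums[j] += clip(v) * t
    scores = [abs(s) for s in sums]

    # With clipped inputs, one row changes any score by at most 1.
    sensitivity = 1.0
    # Assumption: the total budget epsilon is split evenly over k picks.
    scale = 2.0 * sensitivity * k / epsilon

    noisy = []
    for s in scores:
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        noisy.append(s + scale * gumbel)

    # Keep the k features with the largest noisy scores.
    order = sorted(range(n_features), key=lambda j: noisy[j], reverse=True)
    return order[:k]
```

Because the score is a single pass over the data, the cost is linear in the number of entries, which is the practical appeal over fitting a regularized regression. A smaller `epsilon` gives larger noise and therefore a less reliable selection.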
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Data Classification
Methods: Feature Selection
