Feature Selection and Junta Testing are Statistically Equivalent
Lorenzo Beretta, Nathaniel Harms, Caleb Koch

TL;DR
This paper proves that junta testing and feature selection are statistically equivalent: the brute-force algorithm is sample-optimal for both tasks, and the paper gives a tight bound on the optimal sample size.
Contribution
It establishes the statistical equivalence of junta testing and feature selection, and derives the optimal sample complexity for both tasks.
Findings
Brute-force algorithm is sample-optimal for both problems.
Optimal sample size is \Theta\left(\frac{1}{\varepsilon}\left(\sqrt{2^k \log \binom{n}{k}} + \log \binom{n}{k}\right)\right).
Junta testing and feature selection are statistically equivalent.
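To get a feel for how the bound scales, the expression inside the \Theta(...) from the abstract can be evaluated numerically. The sketch below is illustrative only: the function name `sample_bound` is hypothetical, the hidden constant factor is ignored, and the natural logarithm is assumed (the base only changes the constant).

```python
import math

def sample_bound(n: int, k: int, eps: float) -> float:
    """Evaluate (1/eps) * (sqrt(2^k * log C(n,k)) + log C(n,k)).

    This is the expression inside Theta(...), with the constant factor
    dropped, so the result is a scaling guide rather than an exact
    sample count.
    """
    log_binom = math.log(math.comb(n, k))  # natural log of "n choose k"
    return (math.sqrt(2**k * log_binom) + log_binom) / eps

# Example: compare how the two terms trade off as k grows for fixed n.
for k in (2, 5, 10):
    print(k, round(sample_bound(n=100, k=k, eps=0.1), 1))
```

For small k the \log \binom{n}{k} term is comparable to the square-root term, while for larger k the \sqrt{2^k \log \binom{n}{k}} term dominates.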
Abstract
For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ variables. If $f$ depends on only $k$ variables, the feature selection problem asks to find those variables. We prove that these two tasks are statistically equivalent. Specifically, we show that the ``brute-force'' algorithm, which checks for any set of $k$ variables consistent with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left(\frac 1 \varepsilon \left( \sqrt{2^k \log {n \choose k}} + \log {n \choose k}\right)\right). \]
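The brute-force check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `consistent_subsets` and `brute_force` are hypothetical, samples are assumed to be pairs (x, y) with x a tuple of n bits, and a set of k variables is taken to be "consistent" when no two samples agree on those variables but carry different labels.

```python
from itertools import combinations, product

def consistent_subsets(samples, n, k):
    """Yield each k-subset S of range(n) on which the labels are a
    function of the coordinates in S (no conflicting pair of samples)."""
    for S in combinations(range(n), k):
        seen = {}  # restriction of x to S -> label observed for it
        ok = True
        for x, y in samples:
            key = tuple(x[i] for i in S)
            if seen.setdefault(key, y) != y:
                ok = False  # two samples agree on S but differ in label
                break
        if ok:
            yield S

def brute_force(samples, n, k):
    """Return some consistent k-subset, or None if there is none.

    Assumed usage: for feature selection, output the returned set;
    for junta testing, accept if and only if a set is returned.
    """
    return next(consistent_subsets(samples, n, k), None)

# Toy example: f(x) = x[0] XOR x[2] on n = 4 bits, with every input sampled.
samples = [(x, x[0] ^ x[2]) for x in product((0, 1), repeat=4)]
print(brute_force(samples, n=4, k=2))  # (0, 2)
print(brute_force(samples, n=4, k=1))  # None
```

The search enumerates all \binom{n}{k} candidate sets, so it is statistically efficient but not computationally efficient; the toy example uses the full truth table only to keep the output deterministic.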
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Statistical Methods and Inference
Methods: Feature Selection · Sparse Evolutionary Training
