Variable selection via knockoffs for clustered data
Silvia Bacci, Leonardo Grilli, and Carla Rampichini

TL;DR
This paper extends the knockoffs method for variable selection to clustered data by decomposing predictors into level-specific components and performing separate selection at each level, improving false discovery control and power.
Contribution
It introduces a two-step approach for applying knockoffs to clustered data, addressing the challenge of mixed measurement levels and demonstrating improved performance over existing methods.
Findings
Separate analysis at two levels improves variable selection accuracy.
Sequential knockoffs outperform Lasso and derandomized knockoffs in simulations.
Applying the methods to a single data matrix that combines both levels fails, highlighting the importance of level-specific analysis.
Abstract
We extend the knockoffs method for selecting predictors to clustered data (cross-sectional or repeated measures). In the setting of clustered data, variable selection is complex because some predictors are measured at the observation level (level 1), whereas others are measured at the cluster level (level 2), so their values are constant within clusters. The solution we propose is to conduct variable selection separately at the two levels. To this end, we suggest a two-step approach: (i) decompose each level 1 predictor into level 2 and level 1 components by replacing it with the cluster mean and the deviation from the cluster mean; (ii) perform variable selection separately at the two levels, where the level 1 data matrix includes the deviations from the cluster means and the level 2 data matrix includes the cluster means of level 1 predictors and the level 2 predictors. To evaluate…
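Step (i) of the two-step approach can be illustrated with a minimal sketch in plain Python. It shows only the decomposition of a level 1 predictor into its cluster mean and the deviation from that mean, plus how the two data matrices would be assembled; it does not implement knockoffs or any selection procedure. The function name `decompose_by_cluster`, the toy data, and the level 2 predictor `z` are hypothetical and chosen for illustration, not taken from the paper.

```python
from collections import defaultdict


def decompose_by_cluster(cluster_ids, x):
    """Split a level 1 predictor into cluster means and within-cluster deviations.

    Hypothetical helper: returns (means, deviations), where `means` maps each
    cluster id to the mean of x in that cluster and `deviations` holds
    x minus its cluster mean, one value per observation.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(cluster_ids, x):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    deviations = [v - means[g] for g, v in zip(cluster_ids, x)]
    return means, deviations


# Toy data (assumed): five observations nested in two clusters.
cluster_ids = ["a", "a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 10.0, 14.0]   # a level 1 predictor
z = {"a": 0.5, "b": 1.5}          # a level 2 predictor, constant within cluster

means, deviations = decompose_by_cluster(cluster_ids, x)

# Level 1 data matrix: one row per observation, holding the deviations only.
level1_matrix = [[d] for d in deviations]

# Level 2 data matrix: one row per cluster, holding the cluster mean of the
# level 1 predictor alongside the level 2 predictor.
level2_matrix = {g: [means[g], z[g]] for g in means}
```

By construction the deviations sum to zero within each cluster, so the level 1 matrix carries only within-cluster variation while the level 2 matrix carries only between-cluster variation. Variable selection in step (ii) would then be run separately on each of the two matrices.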
Taxonomy
Topics: Data-Driven Disease Surveillance · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
