StarTrek: Combinatorial Variable Selection with False Discovery Rate Control

Lu Zhang; Junwei Lu

arXiv:2108.09904·stat.ME·September 18, 2023

TL;DR

This paper introduces the StarTrek filter, a novel method for selecting hub nodes in high-dimensional networks while controlling the false discovery rate, using Gaussian multiplier bootstrap and new theoretical bounds.

Contribution

The paper develops a new inferential method for identifying network hubs with FDR control, addressing combinatorial and dependence challenges with novel Gaussian comparison bounds.

Findings

01. StarTrek filter effectively controls FDR in simulations.

02. Method successfully identifies key hub nodes in real data.

03. Theoretical bounds ensure accurate FDR control under dependence.

Abstract

Variable selection on large-scale networks has been extensively studied in the literature. While most existing methods are limited to local functionals, especially graph edges, this paper focuses on selecting the discrete hub structures of networks. Specifically, we propose an inferential method, called the StarTrek filter, to select hub nodes with degrees larger than a certain threshold in high-dimensional graphical models while controlling the false discovery rate (FDR). Discovering hub nodes in networks is challenging for two reasons: the combinatorial structure means there is no straightforward statistic for testing the degree of a node, and the complicated dependence in the multiple testing problem is hard to characterize and control. Methodologically, the StarTrek filter overcomes these challenges by constructing p-values based on the maximum test statistics via the Gaussian…
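To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of (a) a Gaussian multiplier bootstrap p-value for a maximum-type statistic and (b) a step-up rejection rule for FDR control. It is a generic illustration, not the StarTrek filter itself: the excerpt above does not give the paper's test statistic or rejection rule, so the input `scores`, the parameters `n_boot` and `q`, both function names, and the use of the Benjamini-Hochberg rule are all assumptions made here for illustration.

```python
import numpy as np


def multiplier_bootstrap_pvalue(scores, n_boot=2000, seed=0):
    """Bootstrap p-value for H0: every column of `scores` has mean zero.

    `scores` is a hypothetical (n, m) array of per-sample contributions;
    the statistic is sqrt(n) * max_j |column mean j|.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n, _ = scores.shape
    col_means = scores.mean(axis=0)
    centered = scores - col_means

    # Observed maximum statistic.
    t_obs = np.sqrt(n) * np.abs(col_means).max()

    # Each bootstrap draw reweights the centered samples with i.i.d.
    # N(0, 1) multipliers and recomputes the maximum statistic.
    xi = rng.standard_normal((n_boot, n))
    t_boot = np.abs(xi @ centered / np.sqrt(n)).max(axis=1)

    # The +1 correction keeps the p-value strictly positive.
    return (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)


def benjamini_hochberg(pvals, q=0.1):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule.

    Used here only as a familiar stand-in for an FDR-controlling rule.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest index meeting its threshold
        rejected[order[: k + 1]] = True
    return rejected
```

In a hub-selection setting, one would presumably compute one p-value per node and pass the vector to the rejection rule; how the node-level statistic is built from the combinatorial degree hypothesis is the paper's contribution and is not reproduced here.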

Taxonomy

Topics: Bioinformatics and Genomic Networks · Statistical Methods and Inference · Gene expression and cancer classification