Optimal control of false discovery criteria in the two-group model

Ruth Heller; Saharon Rosset

arXiv:1902.00892·stat.ME·December 8, 2020

TL;DR

This paper develops optimal multiple testing procedures for controlling FDR and pFDR in the two-group model, even with dependent test statistics, by thresholding the local false discovery rate with data-dependent thresholds.

Contribution

It derives the first optimal policies for FDR and pFDR control in the two-group model allowing dependence, with an efficient algorithm for large-scale problems.

Findings

1. Optimal policies threshold locFDR with data-dependent thresholds.
2. Algorithms efficiently handle thousands of hypotheses.
3. Procedures demonstrated on gene expression data.

Abstract

The highly influential two-group model in testing a large number of statistical hypotheses assumes that the test statistics are drawn independently from a mixture of a high probability null distribution and a low probability alternative. Optimal control of the marginal false discovery rate (mFDR), in the sense that it provides maximal power (expected true discoveries) subject to mFDR control, is known to be achieved by thresholding the local false discovery rate (locFDR), i.e., the probability of the hypothesis being null given the set of test statistics, with a fixed threshold. We address the challenge of controlling optimally the popular false discovery rate (FDR) or positive FDR (pFDR) rather than mFDR in the general two-group model, which also allows for dependence between the test statistics. These criteria are less conservative than the mFDR criterion, so they make more rejections…

Tables

Table 1: Results for $K=5000$ $z$-scores generated independently from the two-group model $(1-\pi)\times N(0,1)+\pi\times N(\theta,1)$. For each $\theta\in\{-2.5,-2.0,-1.5\}$ and $\pi\in\{0.1,0.3\}$, we provide the expected number of true positives (TP $=\mathbb{E}(R-V)$), FDR, pFDR, mFDR, and probability of no rejection ($\textrm{Pr}(R=0)$), for the four procedures compared. Since $FDR=pFDR\times(1-\textrm{Pr}(R=0))$, column 5 can be determined from columns 6 and 8. When the OMT-FDR policy has $\textrm{Pr}(R>0)=1$, it coincides with the OMT-pFDR policy and therefore the OMT-pFDR line is omitted. TP is bold in the settings where the power advantage of OMT-FDR and OMT-pFDR over the alternatives is non-negligible.

$π$	$θ$	Procedure	TP	FDR	pFDR	mFDR	$Pr (R = 0)$
0.1	-1.5	OMT-FDR	29.763	0.050	0.841	0.843	0.940
		OMT-pFDR	12.488	0.045	0.051	0.824	0.118
		OMT-mFDR	4.062	0.049	0.050	0.050	0.013
		Oracle BH	6.123	0.050	0.056	0.066	0.113
0.1	-2	OMT-FDR	60.308	0.050	0.065	0.079	0.230
		OMT-pFDR	59.755	0.050	0.050	0.073	0.000
		OMT-mFDR	56.403	0.050	0.050	0.050	0.000
		Oracle BH	57.277	0.050	0.050	0.052	0.000
0.1	-2.5	OMT-FDR	179.468	0.050	0.050	0.051	0.000
		OMT-mFDR	178.992	0.050	0.050	0.050	0.000
		Oracle BH	179.346	0.050	0.050	0.050	0.000
0.3	-1.5	OMT-FDR	167.662	0.050	0.181	0.184	0.723
		OMT-pFDR	155.652	0.050	0.050	0.166	0.000
		OMT-mFDR	117.088	0.050	0.050	0.050	0.000
		Oracle BH	118.419	0.050	0.050	0.051	0.000
0.3	-2	OMT-FDR	500.0330	0.0500	0.0500	0.0504	0.0000
		OMT-mFDR	499.3813	0.0500	0.0500	0.0500	0.0000
		Oracle BH	499.7893	0.0500	0.0500	0.0501	0.0000
0.3	-2.5	OMT-FDR	927.8398	0.0500	0.0500	0.0501	0.0000
		OMT-mFDR	927.7303	0.0500	0.0500	0.0500	0.0000
		Oracle BH	927.8105	0.0500	0.0500	0.0501	0.0000

Table 2: Results for $K=5000$ $z$-scores generated from the following general two-group model: for each $i$, $h_{i}\sim Bernoulli(0.3)$ is sampled independently and the $z$-score mean is $-1.5\times h_{i}$. The covariance matrix $\Sigma_{h}$ is a block diagonal matrix with blocks of size five, diagonal entries $1+0.01\times h_{i}$ and off-diagonal entries with value $\rho_{b}\in\{0,0.1,0.5\}$ for block $b\in\{1,\ldots,1000\}$. We provide the FDR, pFDR, and mFDR, as well as the expected number of true positives (TP $=\mathbb{E}(R-V)$), for: the OMT procedure with Err control (OMT-Err when Err is, respectively, FDR, pFDR, and mFDR); the procedure based on the marginal local FDR (a sub-optimal test statistic) with Err control (marg-Err); the misspecified procedure that utilizes the OMT policy for Err control under the assumption that the $z$-scores are independent (ind-Err); est-mFDR; Adaptive-BH; and BH. TP is bold in the settings where the power of the OMT procedures that take dependence into account is substantially larger than their marginal counterparts, which base their decisions on the marginal locFDRs.

	$ρ_{b} = 0.1$				$ρ_{b} \in {0.1, 0.5}$				$ρ_{b} = 0.5$
	FDR	pFDR	mFDR	TP	FDR	pFDR	mFDR	TP	FDR	pFDR	mFDR	TP
OMT-FDR	.049	.159	.162	169	.050	.055	.059	263	.050	.050	.051	386
marg-FDR	.050	.176	.179	167	.051	.178	.181	169	.051	.181	.185	169
ind-FDR	.052	.177	.180	173	.056	.179	.183	185	.061	.183	.187	199
OMT-pFDR	.051	.051	.147	166	.050	.050	.058	263	.050	.050	.051	386
marg-pFDR	.050	.050	.163	158	.049	.049	.164	154	.051	.051	.168	159
ind-pFDR	.052	.052	.163	163	.053	.053	.166	168	.059	.059	.171	183
OMT-mFDR	.050	.050	.050	130	.050	.050	.050	261	.050	.050	.050	386
marg-mFDR	.050	.050	.050	121	.050	.050	.050	121	.050	.050	.050	121
ind-mFDR	.050	.050	.050	120	.050	.050	.050	121	.050	.050	.050	121
est-mFDR	.050	.050	.050	120	.050	.050	.050	120	.050	.050	.050	120
adaptive BH	.050	.050	.051	122	.050	.050	.052	122	.050	.050	.052	122
BH	.035	.035	.037	73	.035	.035	.037	72	.035	.035	.037	72

Table 3: Results for $K=5000$ $z$-scores generated independently from the two-group model $(1-\pi)\times N(0,1)+\pi\times N(-2,1)$. We provide the FDR, pFDR, mFDR, and expected number of true positives (TP $=\mathbb{E}(R-V)$), for the estimated OMT procedure with FDR control (est-OMT-FDR), with pFDR control (est-OMT-pFDR), with mFDR control (est-mFDR), and for adaptive BH. The conservative estimation method uses the default prior $Dirichlet(1,0)$ for $(1-\pi,\pi)$; the non-conservative estimation method uses the estimator of Jin and Cai (2007), which was recommended in Sun and Cai (2007) with supplementary R code. The standard error of the estimated FDR is at most 0.004. The est-OMT-FDR policy has $Pr(R>0)=1$ for every simulated dataset in the last two settings, so it coincides with the est-OMT-pFDR policy and therefore the OMT-pFDR line is omitted.

$π$	estimation method	Procedure	TP	FDR	pFDR	mFDR	$Pr (R = 0)$
0.1	non-conservative	est-OMT-FDR	113.144	0.122	0.141	0.281	0.133
		est-OMT-pFDR	103.826	0.108	0.108	0.253	0.000
		est-mFDR	49.769	0.045	0.045	0.048	0.001
		Adaptive BH	56.875	0.051	0.051	0.053	0.000
0.1	conservative	est-OMT-FDR	68.100	0.060	0.060	0.066	0.008
		est-OMT-pFDR	67.740	0.059	0.059	0.066	0.000
		est-mFDR	47.199	0.042	0.042	0.043	0.000
		Adaptive BH	53.833	0.048	0.048	0.050	0.000
0.3	non-conservative	est-OMT-FDR	499.887	0.049	0.049	0.050	0.000
		est-mFDR	491.689	0.048	0.048	0.048	0.000
		Adaptive BH	495.706	0.049	0.049	0.049	0.000
0.3	conservative	est-OMT-FDR	496.375	0.049	0.049	0.049	0.000
		est-mFDR	387.820	0.036	0.036	0.036	0.000
		Adaptive BH	452.535	0.043	0.043	0.043	0.000

Table 4: Results for the $K=15247$ genes for up-regulation (rows 1–2) and down-regulation (rows 3–4). We provide the number of rejections by the novel procedure (column 2) and the competitors, as well as the number of these rejections that are in the “set of confirmed discoveries”.

	est-OMT-FDR	est-mFDR	adaptive BH	BH
# up regulated	2409	2305	2264	2211
# up regulated in “set of confirmed discoveries”	2276	2219	2189	2145
# down regulated	2023	1897	1837	1775
# down regulated in “set of confirmed discoveries”	1815	1731	1699	1671

Equations

$$\textrm{FDR}:\ \mathbb{E}\left(\frac{V}{\max(R,1)}\right)=\mathbb{E}\left(\frac{V}{R}\mid R>0\right)\textrm{Pr}(R>0).$$

$$\textrm{pFDR}:\ \mathbb{E}\left(\frac{V}{R}\mid R>0\right);\qquad\textrm{mFDR}:\ \frac{\mathbb{E}V}{\mathbb{E}R}.$$

$$\Pi(\vec{D})=\mathbb{E}\left(R(\vec{D})-V(\vec{D})\right)=\mathbb{E}(\vec{h}^{t}\vec{D}),$$

$$t_{\alpha}=\max\left\{t:\frac{\sum_{k=1}^{K}\mathbb{E}\{(1-h_{k})\mathbb{I}(T_{marg}(z_{k})\leq t)\}}{\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(z_{k})\leq t)\}}\leq\alpha\right\}.$$

$$\frac{\sum_{k=1}^{K}\mathbb{E}\{(1-h_{k})\mathbb{I}(T_{marg}(z_{k})\leq t)\}}{\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(z_{k})\leq t)\}}=\textrm{Pr}(h=0\mid T_{marg}(z)\leq t).$$

$$\mathbb{E}_{h,Z}\{(1-h)\mathbb{I}(T_{marg}(Z)\leq t)\}=\mathbb{E}_{Z}[\mathbb{E}\{(1-h)\mid Z\}\mathbb{I}(T_{marg}(Z)\leq t)]=\mathbb{E}_{Z}\{T_{marg}(Z)\mathbb{I}(T_{marg}(Z)\leq t)\}.$$

$$T_{i}(\vec{z})\geq T_{j}(\vec{z})\Leftrightarrow D_{i}^{Err}(\vec{z})\leq D_{j}^{Err}(\vec{z}).$$

$$\max_{\vec{D}:\mathbb{R}^{K}\rightarrow\{0,1\}^{K}}\ \mathbb{E}(\vec{h}^{t}\vec{D})\quad\textrm{s.t.}\quad Err(\vec{D})\leq\alpha.$$

$$FDR(\vec{D})=\int_{\mathbb{R}^{K}}\sum_{\vec{h}}\frac{(\vec{1}^{t}-\vec{h}^{t})\vec{D}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}$$

$$=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}\frac{D_{i}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}\sum_{\vec{h}}(1-h_{i})g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}$$

$$=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}\frac{D_{i}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}T_{i}(\vec{z})\mathbb{P}(\vec{z})d\vec{z}\leq\alpha,$$

$$pFDR(\vec{D})=\frac{FDR(\vec{D})}{\int_{\mathbb{R}^{K}}\mathbb{I}\{\vec{1}^{t}\vec{D}(\vec{z})>0\}\mathbb{P}(\vec{z})d\vec{z}}\leq\alpha,$$

$$FDR(\tilde{D})=\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\left[\tilde{D}_{1}(\vec{z})T_{(1)}(\vec{z})+\sum_{k=2}^{K}\tilde{D}_{k}(\vec{z})\frac{1}{k}\left(T_{(k)}(\vec{z})-\bar{T}_{k-1}(\vec{z})\right)\right]d\vec{z},$$

$$FDR(\tilde{D})-\textrm{Pr}(R>0)\alpha=FDR(\tilde{D})-\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\tilde{D}_{1}(\vec{z})\alpha\,d\vec{z}\leq 0$$

$$\max_{\tilde{D}:\mathbb{R}^{K}\rightarrow\{0,1\}^{K}}$$

$$\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\sum_{k=1}^{K}\tilde{D}_{k}(\vec{z})b_{k}(\vec{z})d\vec{z}\leq c_{Err},$$

$$\max_{D:\mathbb{R}^{K}\rightarrow[0,1]^{K}}$$

$$\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\sum_{k=1}^{K}\tilde{D}_{k}(\vec{z})b_{k}(\vec{z})d\vec{z}\leq c_{Err}$$

$$a_{k}(\vec{z})-\mu b_{k}(\vec{z})-\lambda_{k}(\vec{z})+\lambda_{k+1}(\vec{z})=0,\quad\forall\vec{z}\in\mathbb{R}^{K},\ k=1,\ldots,K.$$

$$\mu\left\{\int_{\mathbb{R}^{K}}\left(\sum_{k=1}^{K}b_{k}(\vec{z})\tilde{D}_{k}(\vec{z})\right)\mathbb{P}(\vec{z})d\vec{z}-\alpha\right\}=0,$$

$$\lambda_{K+1}(\vec{z})\tilde{D}_{K}(\vec{z})=0\quad\forall\vec{z}\in\mathbb{R}^{K}$$

$$\lambda_{j}(\vec{z})\left(\tilde{D}_{j-1}(\vec{z})-\tilde{D}_{j}(\vec{z})\right)=0,\quad\forall\vec{z}\in\mathbb{R}^{K},\ j=2,\ldots,K$$

$$\lambda_{1}(\vec{z})\left(\tilde{D}_{1}(\vec{z})-1\right)=0,\quad\forall\vec{z}\in\mathbb{R}^{K},$$

$$\forall\vec{v}\in\mathbb{R}^{K},\ \vec{v}\neq 0,\quad\mathbb{P}\left(\sum_{k}v_{k}T_{k}(\vec{Z})=0\right)=0.$$

$$R_{1}(\vec{z})=\begin{cases}1-T_{(1)}(\vec{z})-\mu T_{(1)}(\vec{z}) & \textrm{if } Err=FDR,\\ 1-T_{(1)}(\vec{z})-\mu\left(T_{(1)}(\vec{z})-\alpha\right) & \textrm{if } Err=pFDR.\end{cases}$$

$$R_{k}(\vec{z})=a_{k}(\vec{z})-\mu b_{k}(\vec{z})=1-T_{(k)}(\vec{z})-\frac{\mu}{k}\left(T_{(k)}(\vec{z})-\bar{T}_{k-1}(\vec{z})\right)\quad\textrm{for } k=2,\ldots,K.$$

$$\tilde{D}_{1}^{\mu}(\vec{z})$$

$$\tilde{D}_{i}^{\mu}(\vec{z})$$

$$\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\sum_{k=1}^{K}\tilde{D}_{k}(\vec{z})b_{k}(\vec{z})d\vec{z}=c_{Err}.$$

$$m_{K}(\vec{z})=\max(0,R_{K}(\vec{z})),\quad m_{k}(\vec{z})=\max(0,m_{k+1}(\vec{z})+R_{k}(\vec{z})),\quad k=K-1,\ldots,1,$$

$$\tilde{D}_{1}^{\mu}=\mathbb{I}\{m_{1}>0\},\quad\tilde{D}_{k}^{\mu}=\mathbb{I}\{(\tilde{D}_{k-1}^{\mu}=1)\cap m_{k}>0\},\quad k=2,\ldots,K.$$

$$g(\vec{z}\mid h_{i}=0)=\sum_{\vec{h}\in\{0,1\}^{K}:h_{i}=0}\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}-1}g(\vec{z}\mid\vec{h}),$$

$$g(\vec{z}\mid h_{i}=0)=\sum_{\vec{h}_{1}\in\{0,1\}^{s_{1}}:h_{i}=0}\pi^{\vec{1}_{1}^{t}\vec{h}_{1}}(1-\pi)^{s_{1}-\vec{1}_{1}^{t}\vec{h}_{1}-1}g(\vec{z}_{1}\mid\vec{h}_{1})\cdot\prod_{l=2}^{L}\sum_{\vec{h}_{l}\in\{0,1\}^{s_{l}}}\pi^{\vec{1}_{l}^{t}\vec{h}_{l}}(1-\pi)^{s_{l}-\vec{1}_{l}^{t}\vec{h}_{l}}g(\vec{z}_{l}\mid\vec{h}_{l}).$$

$$T_{i}(\vec{z})=\frac{\sum_{\vec{h}_{1}\in\{0,1\}^{s_{1}}:h_{i}=0}\pi^{\vec{1}_{1}^{t}\vec{h}_{1}}(1-\pi)^{s_{1}-\vec{1}_{1}^{t}\vec{h}_{1}}g(\vec{z}_{1}\mid\vec{h}_{1})}{\sum_{\vec{h}_{1}\in\{0,1\}^{s_{1}}}\pi^{\vec{1}_{1}^{t}\vec{h}_{1}}(1-\pi)^{s_{1}-\vec{1}_{1}^{t}\vec{h}_{1}}g(\vec{z}_{1}\mid\vec{h}_{1})}.$$
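The recursions for $R_{k}$, $m_{k}$ and $\tilde{D}_{k}^{\mu}$ listed above can be sketched in a few lines for a fixed multiplier $\mu$. This is a minimal illustration rather than the authors' implementation: the function name `step_down_decisions` and the toy locFDR values are hypothetical, and the sketch assumes that $\bar{T}_{k-1}$ denotes the average of the $k-1$ smallest locFDR values. Choosing $\mu$ so that the error constraint holds requires an outer search that is not shown here.

```python
def step_down_decisions(sorted_locfdr, mu, alpha=None):
    """Apply the recursions for R_k, m_k and D~_k^mu to increasingly sorted locFDRs.

    Pass alpha for the pFDR form of R_1; leave it as None for the FDR form.
    Assumption: T-bar_{k-1} is the mean of the k - 1 smallest locFDR values.
    """
    K = len(sorted_locfdr)
    if K == 0:
        return []
    gains = []          # gains[k - 1] holds R_k
    running_sum = 0.0   # sum of T_(1), ..., T_(k-1)
    for k in range(1, K + 1):
        t_k = sorted_locfdr[k - 1]
        if k == 1:
            penalty = t_k if alpha is None else t_k - alpha
            gains.append(1.0 - t_k - mu * penalty)
        else:
            t_bar = running_sum / (k - 1)
            gains.append(1.0 - t_k - (mu / k) * (t_k - t_bar))
        running_sum += t_k
    # Backward pass: m_K = max(0, R_K), m_k = max(0, m_{k+1} + R_k).
    m = [0.0] * K
    m[K - 1] = max(0.0, gains[K - 1])
    for k in range(K - 2, -1, -1):
        m[k] = max(0.0, m[k + 1] + gains[k])
    # Forward pass: reject the k-th smallest only if the previous one was
    # rejected and m_k > 0.
    decisions = []
    for k in range(K):
        previous = decisions[k - 1] if k > 0 else 1
        decisions.append(1 if previous == 1 and m[k] > 0 else 0)
    return decisions


sorted_t = [0.01, 0.05, 0.30, 0.80]   # hypothetical sorted locFDR values
no_penalty = step_down_decisions(sorted_t, mu=0.0)
heavy_penalty = step_down_decisions(sorted_t, mu=50.0)
```

With $\mu=0$ every $R_{k}$ equals $1-T_{(k)}>0$, so all hypotheses are rejected; a large $\mu$ penalizes the error term and shrinks the rejection set.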

Full text

**Optimal control of false discovery criteria in the two-group model**

Ruth Heller, Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv 6997801, Israel, E-mail: [email protected]

Saharon Rosset, Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv 6997801, Israel, E-mail: [email protected]

Abstract. The highly influential two-group model in testing a large number of statistical hypotheses assumes that the test statistics are drawn independently from a mixture of a high probability null distribution and a low probability alternative. Optimal control of the marginal false discovery rate (mFDR), in the sense that it provides maximal power (expected true discoveries) subject to mFDR control, is known to be achieved by thresholding the local false discovery rate (locFDR), i.e., the probability of the hypothesis being null given the set of test statistics, with a fixed threshold. We address the challenge of controlling optimally the popular false discovery rate (FDR) or positive FDR (pFDR) rather than mFDR in the general two-group model, which also allows for dependence between the test statistics. These criteria are less conservative than the mFDR criterion, so they make more rejections in expectation. We derive their optimal multiple testing (OMT) policies, which turn out to be thresholding the locFDR with a threshold that is a function of the entire set of statistics. We develop an efficient algorithm for finding these policies, and use it for problems with thousands of hypotheses. We illustrate these procedures on gene expression studies.

Keywords: Multiple testing; False discovery rate; Positive FDR; Infinite linear programming; Large scale inference.

1 Introduction

In large scale inference problems, hundreds or thousands of hypotheses are tested in order to discover the set of non-null hypotheses. Such problems are ubiquitous in modern applications in medicine, genetics, particle physics, ecology, and psychology. Multiple testing procedures applied to these large scale problems should control for false discoveries, but they should not be over-conservative, since this limits the ability of scientists to make true discoveries. Thus it is natural to seek multiple testing procedures that control for false discoveries, while assuring as many discoveries as possible.

In order to guarantee that not too many false positives are among the discoveries, Benjamini and Hochberg (1995) introduced the false discovery rate (FDR). This error measure gained tremendous popularity in large scale testing, as it was less stringent than traditional measures like the familywise error rate. Given a rejection policy, denote the (random) number of rejected null hypotheses by $R$, and the number of falsely rejected hypotheses (true nulls) by $V$. The FDR is

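$$\textrm{FDR}:\ \mathbb{E}\left(\frac{V}{\max(R,1)}\right)=\mathbb{E}\left(\frac{V}{R}\mid R>0\right)\textrm{Pr}(R>0).$$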

The “two-group model”, first introduced by Efron et al. (2001), has been widely used in large scale inference problems (Efron et al., 2001; Genovese and Wasserman, 2002; Storey, 2003; Sun and Cai, 2007; Efron, 2008; Cai and Sun, 2017). The model assumes that the observed test statistics are generated independently from the mixture model $(1-\pi)g(z\mid h=0)+\pi g(z\mid h=1)$, where $h$ follows the Bernoulli($\pi$) distribution and indicates whether the null is true ($h=0$) or the alternative holds ($h=1$). Accordingly, $g(z\mid h=0)$ and $g(z\mid h=1)$ represent the distribution of the test statistic under the null and alternative, respectively.

In this paper, we consider a more general setting, in which the test statistics can be dependent and differ in their marginal distributions. The more general setting has been used in Xie et al. (2011) for short range dependence. We denote this setting, for which the two-group model is a special case, as the “general two-group model”. As in the two-group model, the hypotheses states vector, $\vec{h}=(h_{1},\ldots,h_{K})$, has entries sampled independently from the Bernoulli($\pi$) distribution. But in our more general setting, the observed test statistics, $\vec{Z}=(Z_{1},\ldots,Z_{K})$, are sampled from the joint distribution given $\vec{h}$, so $\vec{Z}\mid\vec{h}\sim g(\vec{z}\mid\vec{h}).$
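As an illustration of sampling from the general two-group model, the sketch below draws $\vec{h}$ and then $\vec{Z}\mid\vec{h}$ under one hypothetical dependence structure: Gaussian $z$-scores with unit variance and a common correlation `rho` inside blocks. The function name, the block structure and all parameter values are assumptions made for illustration, not the authors' code; setting `rho=0` recovers the independent two-group model.

```python
import random


def sample_general_two_group(K, pi, theta, rho, block_size, rng):
    """Draw (h, z): h_i ~ Bernoulli(pi) i.i.d.; z | h is Gaussian with mean theta * h_i,
    unit variances, and correlation rho within blocks (an assumed structure)."""
    h = [1 if rng.random() < pi else 0 for _ in range(K)]
    z = []
    for start in range(0, K, block_size):
        # A shared factor induces correlation rho between members of a block.
        shared = rng.gauss(0.0, 1.0)
        for i in range(start, min(start + block_size, K)):
            noise = rng.gauss(0.0, 1.0)
            z.append(theta * h[i] + (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * noise)
    return h, z


rng = random.Random(0)  # fixed seed so the draw is reproducible
h, z = sample_general_two_group(K=5000, pi=0.3, theta=-1.5, rho=0.5, block_size=5, rng=rng)
```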

Two measures that are similar to the FDR became popular within the framework of the two-group model. The pFDR was introduced in Storey (2003). The marginal FDR (mFDR) was introduced in Genovese and Wasserman (2002) and Sun and Cai (2007). Their formulas are:

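$$\textrm{pFDR}:\ \mathbb{E}\left(\frac{V}{R}\mid R>0\right);\qquad\textrm{mFDR}:\ \frac{\mathbb{E}V}{\mathbb{E}R}.$$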

For the two-group model, if the rejection policy is a fixed subset of the real line, then the pFDR and mFDR have been shown to be equivalent (Storey, 2003). Moreover, as $K\rightarrow\infty$, all three measures are equivalent (Benjamini, 2008). Cai and Sun (2017) claim that there is essentially no difference between the three measures in large-scale testing problems. They say the use of mFDR is mainly for technical considerations, since the ratio of two expectations is easier to handle. In this paper we show that for large values of $K$ there can still be important differences when aiming at FDR control, pFDR control, or mFDR control.
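To make the three criteria concrete, the following minimal sketch estimates them from Monte Carlo replicates of $(V,R)$. The function name `error_rates` and the toy replicates are illustrative assumptions; the only facts used are the definitions above and the identity $FDR=pFDR\times Pr(R>0)$.

```python
def error_rates(v_counts, r_counts):
    """Estimate FDR, pFDR and mFDR from paired replicates of (V, R)."""
    n = len(r_counts)
    # False discovery proportion per replicate, computed as V / max(R, 1).
    fdp = [v / max(r, 1) for v, r in zip(v_counts, r_counts)]
    fdr = sum(fdp) / n
    # pFDR averages the proportion only over replicates with a rejection.
    positive = [f for f, r in zip(fdp, r_counts) if r > 0]
    pfdr = sum(positive) / len(positive) if positive else 0.0
    # mFDR is a ratio of expectations, not the expectation of a ratio.
    total_r = sum(r_counts)
    mfdr = sum(v_counts) / total_r if total_r > 0 else 0.0
    return {"FDR": fdr, "pFDR": pfdr, "mFDR": mfdr, "Pr(R>0)": len(positive) / n}


# Toy replicates (hypothetical numbers, for illustration only).
v = [0, 1, 0, 1]
r = [0, 2, 0, 4]
rates = error_rates(v, r)
```

In this toy example the three estimates differ, which is the finite-sample behaviour the paragraph above refers to.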

The test statistic that plays a central role for inference on which hypotheses are false is the locFDR, defined for the $i$th hypothesis as $T_{i}(\vec{z})=\textrm{Pr}(h_{i}=0\mid\vec{z})=\frac{(1-\pi)g(\vec{z}\mid h_{i}=0)}{(1-\pi)g(\vec{z}\mid h_{i}=0)+\pi g(\vec{z}\mid h_{i}=1)}$, where $g(\vec{z}\mid h_{i})$ is the joint density of $\vec{z}$ given hypothesis state $h_{i}$ only, rather than the entire vector $\vec{h}$. The locFDR was originally introduced by Efron et al. (2001) for the standard (i.i.d.) two-group model, where it simplifies since $\textrm{Pr}(h_{i}=0\mid\vec{z})=\textrm{Pr}(h_{i}=0\mid z_{i})$. We denote it by $T_{marg}(z)=Pr(h=0\mid z)$ and call it the marginal locFDR. Xie et al. (2011) showed that for the general two-group model, the policy which minimizes the marginal false non-discovery rate (mFNR), i.e., the expected number of non-null non-rejections divided by the expected number of non-rejections, with mFDR control, is to threshold the locFDR statistics with a fixed threshold under the general dependence setting, following the work of Sun and Cai (2007) that showed this for the two-group model.
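For the Gaussian mixture used in the simulations, $(1-\pi)N(0,1)+\pi N(\theta,1)$, the marginal locFDR has a closed form. The sketch below is a minimal illustration under that assumed mixture; the function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def marginal_locfdr(z, pi, theta):
    """T_marg(z) = Pr(h = 0 | z) under (1 - pi) N(0, 1) + pi N(theta, 1)."""
    null_part = (1.0 - pi) * normal_pdf(z)
    alt_part = pi * normal_pdf(z, theta)
    return null_part / (null_part + alt_part)


# pi = 0.3 and theta = -2 match one of the simulation settings.
t_far = marginal_locfdr(-4.0, pi=0.3, theta=-2.0)    # strong evidence against the null
t_near = marginal_locfdr(1.0, pi=0.3, theta=-2.0)    # consistent with the null
t_middle = marginal_locfdr(-1.0, pi=0.3, theta=-2.0) # both densities agree at theta / 2
```

At $z=\theta/2$ the null and alternative densities coincide, so the locFDR equals the prior null probability $1-\pi$.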

In this paper, we consider OMT with FDR or pFDR control in addition to mFDR control for the general two-group model. As Cai and Sun (2017) have noted, since mFDR is the ratio of two expectations, it is easier to handle when seeking an optimal policy. However, $V/R$ is arguably the more fundamental quantity the investigator would like control over for finite $K$. Therefore a rejection policy that guarantees control over $V/R$ in expectation, while maximizing the expected number of true rejections, can be very useful. We can write the problem of finding the OMT policy as an optimization problem. Briefly, for a family of $K$ hypotheses, let $\vec{D}:\mathbb{R}^{K}\rightarrow\{0,1\}^{K}$ be the decision function when the vector of observed statistics is $\vec{z}$, so the $i$th coordinate $D_{i}(\vec{z})$ receives the value of one if the $i$th null hypothesis is rejected, and zero otherwise. Let $\vec{1}$ be the vector of ones. Then the numbers of rejected and falsely rejected hypotheses, respectively, are $R(\vec{D}(\vec{z}))=\vec{1}^{t}\vec{D}(\vec{z})=\sum_{k=1}^{K}D_{k}(\vec{z})$ and $V(\vec{D}(\vec{z}))=(\vec{1}^{t}-\vec{h}^{t})\vec{D}(\vec{z})=\sum_{k=1}^{K}(1-h_{k})D_{k}(\vec{z}).$ We denote by $Err(\vec{D})\in\{FDR(\vec{D}),pFDR(\vec{D}),mFDR(\vec{D})\}$ the error rate for policy $\vec{D}$. We seek to maximize the expected number of true discoveries,

$$\Pi(\vec{D})=\mathbb{E}\left(R(\vec{D})-V(\vec{D})\right)=\mathbb{E}(\vec{h}^{t}\vec{D}),$$

subject to $Err(\vec{D})\leq\alpha$. The solutions to these problems that we present in this paper can be extended to optimize other notions of power, e.g., to minimize the expectation of the loss functions $L_{\lambda}(\vec{h},\vec{D})=\lambda(\vec{1}^{t}-\vec{h}^{t})\vec{D}+\vec{h}^{t}(\vec{1}-\vec{D})$ considered in Sun and Cai (2007) or the mFNR. Moreover, the mathematical and algorithmic developments can be easily adapted for developing the OMT policy for other error measures considered in the literature, such as $\mathbb{E}(V)$ (Storey, 2007) and false discovery exceedance (FDX, where for $\gamma\in(0,1)$, $FDX_{\gamma}=Pr(FDP>\gamma)$; Lehmann and Romano, 2005). The chosen definition should capture the true “scientific” goal for inference and the type of discoveries we wish to make.

OMT problems in the general two-group model can be viewed as an infinite-dimensional optimization problem, seeking to maximize one integral (the power), subject to an integral constraint expressing the measure we want to control — FDR, mFDR, pFDR, or any other measure. In this paper we adopt this view and demonstrate that for the two-group model we can solve the resulting optimization problem and practically compute the optimal rejection policy for dimension $K$ in the thousands. Our main contributions are as follows.

1. We show (Theorem 2.1) that the OMT policy for FDR or pFDR control turns out to be thresholding the locFDR with a threshold that is a function of the entire set of statistics. This is true for any dependence structure, and is in contrast to the previously shown OMT policy for mFDR control (Sun and Cai, 2007; Xie et al., 2011), where the threshold is fixed.

2. We provide efficient algorithms for finding the OMT policies (§ 3 and § 4), which are based on our formulation of the problem as an infinite integer optimization problem with a single constraint. For dependent test statistics, we address the additional computational challenge of computing the locFDR values. The infinite-dimensional formulation for finding OMT policies under frequentist strong control of measures like family-wise error rate (FWER) and FDR was applied recently in Rosset et al. (2018). In that setting, it was possible to solve only relatively small problems, and only under exchangeability assumptions, due to the complex constraint structure of strong control. In contrast, the structure of the two-group model, with a single constraint, and the computational shortcuts we introduce below, allow us to find OMT policies for practically any $K$.

3. Via numerical investigations (§ 5), we show that: there is a huge potential power gain from incorporating known dependence into the OMT procedures; the power increase from controlling FDR and pFDR over mFDR can be non-negligible even for thousands of hypotheses; when the signal is weak, the OMT policy with pFDR control has a significantly lower probability of zero rejections than the OMT policy with FDR control, and the pFDR is (arguably) preferred over FDR to control optimally for the two-group model. We also demonstrate the potential usefulness of the methods to gene expression studies (§ 6).

2 Properties of OMT policies

For independent test statistics from the two-group model, Sun and Cai (2007) showed that the OMT policy for mFDR control is to reject the hypotheses with $T_{marg}(z)\leq t_{\alpha}$, where

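$$t_{\alpha}=\max\left\{t:\frac{\sum_{k=1}^{K}\mathbb{E}\{(1-h_{k})\mathbb{I}(T_{marg}(z_{k})\leq t)\}}{\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(z_{k})\leq t)\}}\leq\alpha\right\}.\tag{2.1}$$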

Since $\sum_{k=1}^{K}\mathbb{E}\{(1-h_{k})\mathbb{I}(T_{marg}(z_{k})\leq t)\}=K\times\textrm{Pr}(h=0,T_{marg}(z)\leq t)$ and $\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(z_{k})\leq t)\}=K\times\textrm{Pr}(T_{marg}(z)\leq t)$ , it follows that

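$$\frac{\sum_{k=1}^{K}\mathbb{E}\{(1-h_{k})\mathbb{I}(T_{marg}(z_{k})\leq t)\}}{\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(z_{k})\leq t)\}}=\textrm{Pr}(h=0\mid T_{marg}(z)\leq t).$$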

Storey (2003) used this observation to show that when the rejection policy is a fixed region of the real line, $pFDR=mFDR$. Therefore, the optimal rule has a nice Bayesian interpretation: if a hypothesis is reported as non-null when $T_{marg}(z)\leq t_{\alpha}$, then the mFDR is the chance that a false discovery was made, since $mFDR=pFDR=\textrm{Pr}(h=0\mid T_{marg}(z)\leq t_{\alpha}).$ Since $FDR=pFDR\times Pr(R>0)\leq pFDR$, and since the OMT policy with mFDR control is a fixed region of the real line, it follows that the OMT policy with mFDR control at level $\alpha$ also controls the FDR and the pFDR at level $\alpha$. Therefore, the OMT policy with FDR or pFDR control will necessarily be at least as powerful as the OMT policy with mFDR control. Interestingly, for any number of hypotheses $K$ from the two-group model, the OMT policies with pFDR or FDR control necessarily differ from the OMT policy with mFDR control. This difference exists even if the probability of zero rejections is one, and will become clear from the algorithm for constructing the OMT policies with pFDR and FDR control in § 4. We formalize these interesting properties and a few others in the following proposition.

Proposition 2.1.

For $K$ test statistics independently drawn from the two-group model, if the null and non-null distributions have positive densities with respect to the Lebesgue measure on their region of support, then:

1. $\Pi_{OMT-FDR}\geq\Pi_{OMT-pFDR}\geq\Pi_{OMT-mFDR}$, where $\Pi_{OMT-Err}$ is the expected number of true discoveries, for the OMT policy with level $\alpha$ control of $Err\in\{FDR,pFDR,mFDR\}$.

2. The OMT policy with mFDR control differs from the OMT policy with pFDR control.

3. $mFDR>pFDR$ for the OMT policy with pFDR control.

4. If the OMT policy with FDR control has probability zero of no rejections, then it coincides with the OMT policy with pFDR control.

See Appendix A for the proof.
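The fixed-threshold mFDR rule described at the start of this section can be computed numerically in a simple special case. The sketch below assumes the Gaussian mixture $(1-\pi)N(0,1)+\pi N(\theta,1)$ with $\theta<0$, for which $T_{marg}(z)$ is increasing in $z$, so that the rule $T_{marg}(z)\leq t_{\alpha}$ is equivalent to $z\leq c$ for a cutoff $c$; the mFDR of that rule is $\textrm{Pr}(h=0\mid Z\leq c)$. The function names and the bisection bracket are hypothetical choices, not taken from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mfdr_of_cutoff(c, pi, theta):
    """mFDR = Pr(h = 0 | Z <= c) for the rule that rejects when z <= c."""
    null_mass = (1.0 - pi) * normal_cdf(c)
    alt_mass = pi * normal_cdf(c - theta)
    return null_mass / (null_mass + alt_mass)


def mfdr_cutoff(alpha, pi, theta, lo=-10.0, hi=10.0, iters=200):
    """Largest cutoff c with mFDR <= alpha, by bisection.

    Assumes theta < 0 and alpha < 1 - pi, so the mFDR increases in c
    from near 0 at lo to 1 - pi at hi.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mfdr_of_cutoff(mid, pi, theta) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


cutoff = mfdr_cutoff(alpha=0.05, pi=0.3, theta=-2.0)
```

Under these assumptions the expected number of true discoveries of the rule is $K\pi\Phi(c-\theta)$, where $\Phi$ is the standard normal CDF.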

Interestingly, the mFDR controlling procedure described above controls the mFDR at the nominal level even if the test statistics are dependent or differ in their marginal distributions, as expressed in the following simple result.

Proposition 2.2.

For $K$ test statistics drawn from the general two-group model, the procedure that rejects the hypotheses with $T_{marg}(z_{i})\leq t_{\alpha}$ , $i=1,\ldots,K$ , satisfies $mFDR\leq\alpha$ .

Proof.

For a single coordinate in the vector of test statistics from the general two-group model,

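$$\mathbb{E}_{h,Z}\{(1-h)\mathbb{I}(T_{marg}(Z)\leq t)\}=\mathbb{E}_{Z}[\mathbb{E}\{(1-h)\mid Z\}\mathbb{I}(T_{marg}(Z)\leq t)]=\mathbb{E}_{Z}\{T_{marg}(Z)\mathbb{I}(T_{marg}(Z)\leq t)\}.$$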

So all the expectations in (2.1) depend only on the marginal distributions of the test statistics, and therefore $t_{\alpha}$ depends only on the marginal distributions of $Z_{1},\ldots,Z_{K}$. The $mFDR$ of the suggested procedure for test statistics drawn from the general two-group model is $\frac{\sum_{k=1}^{K}\mathbb{E}\{T_{marg}(Z_{k})\mathbb{I}(T_{marg}(Z_{k})\leq t_{\alpha})\}}{\sum_{k=1}^{K}\mathbb{E}\{\mathbb{I}(T_{marg}(Z_{k})\leq t_{\alpha})\}}$, so it follows directly from the definition of $t_{\alpha}$ that this mFDR is bounded above by $\alpha$. ∎

If the test statistics from the general two-group model are dependent, thresholding the marginal locFDRs is not optimal for mFDR control. However, Xie et al. (2011) showed that the OMT policy for mFDR control is of the form $T_{i}(\vec{z})\leq t$ for $T_{i}(\vec{z})=Pr(h_{i}=0|\vec{z})$ the true locFDR. We provide an alternative proof in § C. Our first key theoretical result is that the optimal FDR and pFDR controlling procedures also reject the hypotheses by thresholding the locFDRs.

Theorem 2.1.

Let $\vec{z}$ be a vector of $K$ test statistics coming from the general two-group model. Then for $Err\in\{FDR,pFDR\}$ , the OMT decision policy which satisfies $Err(\vec{D}^{Err})\leq\alpha$ and $\mathbb{E}(\vec{h}^{t}\vec{D}^{Err})\geq\mathbb{E}(\vec{h}^{t}\vec{D})\quad\forall\vec{D}\ \textrm{s.t.}\ Err(\vec{D})\leq\alpha$ , is almost surely weakly monotone in the locFDR values:

$$T_{i}(\vec{z})\geq T_{j}(\vec{z})\Leftrightarrow D_{i}^{Err}(\vec{z})\leq D_{j}^{Err}(\vec{z}).$$

The proof of this theorem is also given in Appendix A.

In the next section we shall show that the threshold for rejection with optimal FDR or pFDR control depends on all the realized statistics. This is in contrast to the threshold for optimal mFDR control, which is fixed. More specifically, we shall show that the OMT policies with FDR or pFDR control are step-down procedures, in contrast with the single step procedure for controlling the mFDR optimally (see, e.g., Lehmann and Romano, 2005, for the distinction between step-down, step-up, and single step procedures).

3 Optimal procedures for FDR or pFDR control in the general two-group model

Given the selected power measure, the expected number of true positive findings, and false discovery measure to control $Err\in\{FDR,pFDR\}$ , we can write the OMT problem as an infinite dimensional integer program,

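$$\max_{\vec{D}:\mathbb{R}^{K}\rightarrow\{0,1\}^{K}}\ \mathbb{E}(\vec{h}^{t}\vec{D})\tag{3.1}$$

$$\textrm{s.t.}\quad Err(\vec{D})\leq\alpha.\tag{3.2}$$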

Let $\mathbb{P}(\vec{z})=\sum_{\vec{h}}g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}$ denote the joint distribution of the test statistics.

The objective is linear in $\vec{D}$ :

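$$\mathbb{E}(\vec{h}^{t}\vec{D})=\int_{\mathbb{R}^{K}}\sum_{\vec{h}}\vec{h}^{t}\vec{D}(\vec{z})g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}D_{i}(\vec{z})\sum_{\vec{h}}h_{i}g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}D_{i}(\vec{z})\left(1-T_{i}(\vec{z})\right)\mathbb{P}(\vec{z})d\vec{z},$$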

where the last equality follows since $\sum_{\vec{h}}h_{i}g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}=\textrm{Pr}(h_{i}=1\mid\vec{z})\mathbb{P}(\vec{z})$ and $\textrm{Pr}(h_{i}=1\mid\vec{z})=1-T_{i}(\vec{z})$ from the locFDR definition.

The constraint can also be expressed in terms of the locFDR values and $\mathbb{P}(\vec{z})$ :

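$$FDR(\vec{D})=\int_{\mathbb{R}^{K}}\sum_{\vec{h}}\frac{(\vec{1}^{t}-\vec{h}^{t})\vec{D}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}$$

$$=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}\frac{D_{i}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}\sum_{\vec{h}}(1-h_{i})g(\vec{z}\mid\vec{h})\pi^{\vec{1}^{t}\vec{h}}(1-\pi)^{K-\vec{1}^{t}\vec{h}}d\vec{z}$$

$$=\int_{\mathbb{R}^{K}}\sum_{i=1}^{K}\frac{D_{i}(\vec{z})}{\vec{1}^{t}\vec{D}(\vec{z})}T_{i}(\vec{z})\mathbb{P}(\vec{z})d\vec{z}\leq\alpha,$$

$$pFDR(\vec{D})=\frac{FDR(\vec{D})}{\int_{\mathbb{R}^{K}}\mathbb{I}\{\vec{1}^{t}\vec{D}(\vec{z})>0\}\mathbb{P}(\vec{z})d\vec{z}}\leq\alpha.$$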

To simplify the notation, we employ in our FDR calculations the convention $0/0=0$ .
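The integrand of the FDR constraint is the expected false discovery proportion given $\vec{z}$: the average locFDR over the rejected hypotheses. A minimal sketch of that quantity, with the $0/0=0$ convention, is given below; the function name and the locFDR values are hypothetical.

```python
def expected_fdp(decisions, locfdr):
    """E(V / max(R, 1) | z) = sum_i D_i T_i / sum_i D_i, with the convention 0/0 = 0."""
    rejected = sum(decisions)
    if rejected == 0:
        return 0.0
    false_mass = sum(d * t for d, t in zip(decisions, locfdr))
    return false_mass / rejected


# Hypothetical locFDR values for K = 4 hypotheses.
locfdr = [0.02, 0.60, 0.10, 0.90]
fdp_two = expected_fdp([1, 0, 1, 0], locfdr)   # reject the two smallest locFDRs
fdp_none = expected_fdp([0, 0, 0, 0], locfdr)  # no rejections
```

Integrating this quantity against $\mathbb{P}(\vec{z})$ gives $FDR(\vec{D})$; dividing the result by $\textrm{Pr}(R>0)$ gives $pFDR(\vec{D})$.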

Denote by $\vec{D}^{*}$ an optimal solution of this problem. As written, this is an infinite integer program, with an objective that is linear in $\vec{D}$ but a constraint which is a non-linear function of $\vec{D}$ . In this section, we prove that:

1. The optimal solution has a structure which allows us to write the constraint as a linear functional of $\vec{D}$ (using Thm. 2.1 above).

2. Once the problem is written in this linear fashion, the infinite linear program relaxation of the infinite integer problem is guaranteed to have a solution that is integer almost everywhere (Lemma 3.1).

3. This infinite linear program is guaranteed to have zero duality gap, and hence its solution can be found by solving the Euler-Lagrange conditions, and a solution to these can be found via one-dimensional search (Lemma 3.2).

Taken together, these results establish a practical methodology to solve the general two-group FDR or pFDR control problem. In the next section, we discuss the algorithmic and computational aspects, establishing that this problem can be practically solved for high-dimensional settings for the i.i.d. two-group model, and in some important cases also for general two-group settings with dependence, yielding the optimal FDR or pFDR controlling policy.

We note that Lemmas 3.1 and 3.2 are similar in nature, and employ similar techniques, to results in our previous work on multiple testing under strong control (Rosset et al., 2018), although some of the important details differ. A major difference is that our decision rule is not necessarily symmetric in $\vec{z}$ when the data are dependent. Another important distinction is that the infinite linear program in this work has only a single error constraint and can be solved practically for large $K$, whereas in Rosset et al. (2018) it has $K$ error constraints and can be solved only for a very low dimension $K$.

Theorem 2.1 demonstrates that for every $\vec{z}$ the optimal policy rejects the set of hypotheses with the smallest locFDR, up to a threshold: $D^{*}_{i}(\vec{z})=1\Leftrightarrow T_{i}(\vec{z})\leq t(\vec{z}).$

With this characterization of the optimal solution, we can rewrite the constraint in Problem (3.1,3.2) so it is linear in $\vec{D}$ . To simplify notation, we replace $\vec{D}$ with a version $\tilde{D}$ which operates on the sorted locFDR in increasing order. Explicitly, given $\vec{z},$ let $i_{1},\ldots,i_{K}$ be the sorting permutation, so $T_{i_{1}}(\vec{z})\leq T_{i_{2}}(\vec{z})\leq\ldots\leq T_{i_{K}}(\vec{z}),$ then we define $\tilde{D}_{k}(\vec{z})=D_{i_{k}}(\vec{z}).$ Given the characterization of $D^{*}$ above, then for every $\vec{z}$ we can find $k^{*}(\vec{z})$ such that $\tilde{D}^{*}_{k}(\vec{z})=1\Leftrightarrow k\leq k^{*}(\vec{z}).$

We can therefore write:

[TABLE]

where $\bar{T}_{k-1}(\vec{z})=\frac{\sum_{l=1}^{k-1}T_{(l)}(\vec{z})}{k-1}$, and $T_{(k)}(\vec{z})$ denotes the $k$th order statistic, i.e., the $k$th smallest locFDR value. See Appendix B for details of the derivation of the formulation (3.5). Using this representation, the pFDR constraint in (3.4) also has a linear representation:

[TABLE]

To emphasize the linearity of the objective and constraints, and simplify the followup, we rewrite our formulation in a generic form:

[TABLE]

where $a_{k},b_{k},k=1,\ldots,K$ are functions of the locFDR order statistics, and $c_{Err}$ is a fixed constant. Specifically, for $Err(\tilde{D})=FDR(\tilde{D})$ : $a_{k}(\vec{z})=1-T_{(k)}(\vec{z}),k=1,\ldots,K$ ; $b_{k}(\vec{z})=\left(T_{(k)}(\vec{z})-\bar{T}_{k-1}(\vec{z})\right)/k,k=2,\ldots,K$ ; $b_{1}(\vec{z})=T_{(1)}(\vec{z})$ ; $c_{Err}=c_{FDR}=\alpha$ . For $Err(\tilde{D})=pFDR(\tilde{D})$ , the only differences are that $b_{1}(\vec{z})=T_{(1)}(\vec{z})-\alpha$ and $c_{Err}=c_{pFDR}=0$ .
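The coefficients above can be computed in a single pass over the sorted locFDR values. The following is a minimal sketch, not the authors' implementation; the function name `fdr_coefficients` and its signature are hypothetical. It relies on the identity $\bar{T}_{k}-\bar{T}_{k-1}=(T_{(k)}-\bar{T}_{k-1})/k$, so that for the FDR constraint the partial sums $\sum_{l\leq k}b_{l}$ telescope to $\bar{T}_{k}$, the average of the $k$ smallest locFDR values.

```python
def fdr_coefficients(locfdr, alpha, err="FDR"):
    """Return (a, b, c) of the generic linear form for one realization.

    locfdr : unsorted locFDR values T_1..T_K (assumed to lie in [0, 1]).
    err    : "FDR" or "pFDR" (hypothetical switch for the two constraints).
    """
    t = sorted(locfdr)                 # T_(1) <= ... <= T_(K)
    a = [1.0 - tk for tk in t]         # a_k = 1 - T_(k)
    b = []
    running_sum = 0.0                  # sum of T_(1), ..., T_(k-1)
    for k, tk in enumerate(t, start=1):
        if k == 1:
            b.append(tk)               # b_1 = T_(1)
        else:
            mean_prev = running_sum / (k - 1)   # \bar{T}_{k-1}
            b.append((tk - mean_prev) / k)      # b_k = (T_(k) - \bar{T}_{k-1}) / k
        running_sum += tk
    c = alpha                          # c_FDR = alpha
    if err == "pFDR":
        b[0] -= alpha                  # b_1 = T_(1) - alpha
        c = 0.0                        # c_pFDR = 0
    return a, b, c
```

For example, with locFDR values 0.4, 0.1, 0.7 the sorted values are 0.1, 0.4, 0.7, and the FDR coefficients $b_{1}+b_{2}$ sum to 0.25, the mean of the two smallest values.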

We now consider the relaxed linear program without the integer requirement on $\tilde{D}$ , by writing the same problem, except optimizing over $\tilde{D}(\vec{z})\in[0,1]^{K}$ :

[TABLE]

To analyze this problem, we consider its Euler-Lagrange (EL) necessary optimality conditions (Korn and Korn, 2000). We derive the EL conditions for this problem in Appendix A, and also show there that they can be rephrased as requiring the following to hold almost everywhere for optimality, in addition to the (primal feasibility) constraints of Problem (3.8):

[TABLE]

where $\mu$ and $\lambda_{j}(\vec{z}),\;j=1,\ldots,K+1,\;\vec{z}\in\mathbb{R}^{K}$ are non-negative Lagrange multipliers. In analogy to the Karush-Kuhn-Tucker (KKT) conditions in finite convex optimization, we can term condition (3.9) the stationarity condition, and conditions (3.10–3.13) the complementary slackness conditions.

The following result clarifies that for this problem, we can solve the linear program relaxation instead of the integer program, and get an integer solution:

Lemma 3.1.

For $K$ test statistics drawn from the general two-group model, assume the following non-redundancy condition:

[TABLE]

Then any solution to the EL conditions (3.9)–(3.13) is integer almost everywhere on $\mathbb{R}^{K}$ .

Note that the non-redundancy condition Eq. (3.14) is very mild, as it is satisfied whenever the distribution of $\vec{Z}$ is continuous and $T_{k}(\vec{Z})$ are non-linear functions, which is the case in all standard applications.

Our next result shows that for our problem, the EL conditions are in fact not only necessary, but also sufficient (like the KKT conditions in finite linear programs), and we can thus find the infinite linear program solution by finding any solution that complies with these conditions.

Lemma 3.2.

The infinite linear program (3.8) has zero duality gap, and therefore the conditions (3.9)–(3.13) together with primal feasibility are also sufficient, and a solution complying with these conditions is optimal.

For brevity, we defer explicit derivation of the dual together with the proof to Appendix A.

Putting our results together, we obtain an explicit characterization of the OMT solution to our problems of interest:

Theorem 3.1.

For $K$ test statistics drawn from the general two-group model, an optimal solution to Problem (3.1,3.2) can be found by solving the EL conditions (3.9)–(3.13) together with primal feasibility of the infinite linear program (3.8).

We next show how this can be used to efficiently solve high-dimensional multiple testing problems with FDR or pFDR control for the two-group model.

4 Algorithm

We first characterize a generic algorithm to solve the OMT problem with FDR or pFDR control. We then show how to efficiently implement this approach for high dimensional instances of the problem.

Given a candidate Lagrange multiplier $\mu\geq 0$ , and an efficient method for calculating locFDR values $T_{k}(\vec{z})$ , for $k=1,\ldots,K$ , define: $R_{k}(\vec{z})=a_{k}(\vec{z})-\mu b_{k}(\vec{z})$ . For $a_{k}(\vec{z})$ and $b_{k}(\vec{z})$ defined for the FDR and pFDR constraints, $R_{k}(\vec{z})$ is as follows:

[TABLE]

Denote by $\tilde{D}^{\mu}(\vec{z})$ a solution which complies with (3.9) and (3.11)–(3.13) for this value of $\mu$ . It is easy to confirm that this dictates that almost surely:

[TABLE]

Now we have to ensure that primal feasibility and complementary slackness for $\mu$ hold, in other words find $\mu^{*}\geq 0$ such that the following holds:

[TABLE]

It is easy to confirm that if we find such a solution, then it is feasible, it complies with conditions (3.9)–(3.13), and it is obviously binary. Thus, finding the optimal solution amounts to searching the one-dimensional space of $\mu$ values for a solution of Eq. (4.3), using the characterization in Eqs. (4.1), (4.2).
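The one-dimensional search over $\mu$ can be illustrated with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the authors' implementation: the expectation in the FDR constraint is replaced by an average over simulated vectors $\vec{z}$ from the i.i.d. normal two-group model, the per-realization rule is taken to be the number of rejections $k$ maximizing $\sum_{l\leq k}R_{l}(\vec{z})$, and $\mu$ is located by bisection. All function names, the sample sizes, and the seed are hypothetical choices.

```python
import math
import random

def locfdr(z, pi, theta):
    # Marginal locFDR under (1 - pi) N(0,1) + pi N(theta,1); constants cancel.
    f0 = math.exp(-0.5 * z * z)
    f1 = math.exp(-0.5 * (z - theta) ** 2)
    return (1 - pi) * f0 / ((1 - pi) * f0 + pi * f1)

def coefficients(t_sorted):
    # a_k and b_k for the FDR constraint, from sorted locFDR values.
    a = [1 - t for t in t_sorted]
    b, run = [], 0.0
    for k, t in enumerate(t_sorted, start=1):
        b.append(t if k == 1 else (t - run / (k - 1)) / k)
        run += t
    return a, b

def n_reject(a, b, mu):
    # Number of rejections maximizing the partial sums of R_k = a_k - mu * b_k.
    best_k, best, s = 0, 0.0, 0.0
    for k in range(len(a)):
        s += a[k] - mu * b[k]
        if s > best:
            best_k, best = k + 1, s
    return best_k

def estimated_fdr(samples, mu):
    # Monte Carlo average of sum_{k <= k*} b_k, i.e. the mean locFDR of rejections.
    total = 0.0
    for a, b in samples:
        total += sum(b[:n_reject(a, b, mu)])
    return total / len(samples)

def search_mu(samples, alpha, tol=1e-6):
    # Smallest mu (up to tol) whose estimated FDR does not exceed alpha.
    if estimated_fdr(samples, 0.0) <= alpha:
        return 0.0
    lo, hi = 0.0, 1.0
    while estimated_fdr(samples, hi) > alpha:   # expand until feasible
        lo, hi = hi, 2 * hi
    while hi - lo > tol:                        # bisection; hi stays feasible
        mid = 0.5 * (lo + hi)
        if estimated_fdr(samples, mid) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical small example: K = 30 hypotheses, 300 simulated data sets.
rng = random.Random(0)
pi, theta, K, M, alpha = 0.3, -2.0, 30, 300, 0.05
samples = []
for _ in range(M):
    z = [rng.gauss(theta, 1) if rng.random() < pi else rng.gauss(0, 1)
         for _ in range(K)]
    samples.append(coefficients(sorted(locfdr(zi, pi, theta) for zi in z)))
mu_star = search_mu(samples, alpha)
```

Bisection is appropriate here because, for each realization, the constraint value at the maximizer of the Lagrangian cannot increase as $\mu$ grows, so the estimated FDR is a non-increasing step function of $\mu$.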

When naively implemented, the calculation in Eqs. (4.1),(4.2) requires $O(K^{2})$ operations to calculate all partial sums. However we can rephrase it using a recursive representation to require only $O(K)$ calculations. We first calculate, in decreasing order:

[TABLE]

and then, in increasing order:

[TABLE]

We see from the algorithm that the OMT procedure with FDR control starts by determining whether the hypothesis with the smallest locFDR can be rejected, and proceeds to decide whether to reject the hypothesis with the second smallest locFDR only if the decision at the first step was to reject (i.e., $D_{1}^{\mu}=1$). Proceeding similarly, only if the hypothesis with the $l$th smallest locFDR is rejected, the hypothesis with the $(l+1)$th smallest locFDR is tested, for $l=1,\ldots,K-1$. Thus, it is a step-down procedure (Lehmann and Romano, 2005). In contrast, the OMT procedure with mFDR control is a single step procedure since each hypothesis is rejected if its locFDR is less than a common cut-off value.
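The two-pass structure described above (a backward pass followed by a forward pass) can be sketched as follows. This is one plausible reading rather than the authors' code: it assumes that, for fixed $\mu$, hypothesis $k$ in the sorted order is rejected when all earlier hypotheses are rejected and some continuation $k,\ldots,l$ has positive total score $\sum_{i=k}^{l}R_{i}$. The function name `step_down_rejections` is hypothetical.

```python
def step_down_rejections(R):
    """Number of rejections for scores R_1..R_K, ordered by increasing locFDR.

    Assumed rule: reject hypothesis k iff hypotheses 1..k-1 are rejected and
    max over l >= k of (R_k + ... + R_l) is positive.
    """
    K = len(R)
    # Backward pass (decreasing k): m[k] = max over l >= k of R_k + ... + R_l,
    # via the recursion m[k] = R_k + max(0, m[k+1]).
    m = [0.0] * K
    best_tail = 0.0
    for k in range(K - 1, -1, -1):
        m[k] = R[k] + max(0.0, best_tail)
        best_tail = m[k]
    # Forward pass (increasing k): stop at the first hypothesis with no
    # positive continuation; nothing after it is tested.
    k_star = 0
    for k in range(K):
        if m[k] > 0:
            k_star += 1
        else:
            break
    return k_star
```

Both passes visit each index once, so the cost is $O(K)$ per realization. For the scores 0.5, -0.2, 0.3, -1.0, 0.1 the partial sums peak after three terms, and the function returns three rejections.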

Implementing the algorithm allows us to find optimal solutions to two-group FDR problems with many thousands of hypotheses in minutes of CPU time, as illustrated in § 5 and § 6.

4.1 Calculating locFDR values under dependence

We first note that under the standard two-group model with i.i.d. assumptions, calculating $T_{i}(\vec{z}),\;i=1,\ldots,K$ requires $O(K)$ calculations, and thus does not increase the complexity of the algorithm above.
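As a minimal illustration of the $O(K)$ calculation, the sketch below evaluates the locFDR for the normal mixture $(1-\pi)N(0,1)+\pi N(\theta,1)$ used in § 5.1. The function names are hypothetical, and the normal form of the two components is an assumption of this example rather than of the general model.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    # Density of N(mean, sd^2) at x.
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2 * math.pi))

def locfdr_iid(z, pi, theta):
    """locFDR values T_i = Pr(h_i = 0 | z_i) for independent z-scores.

    pi    : probability that a hypothesis is non-null.
    theta : mean of the alternative component (assumed N(theta, 1)).
    """
    out = []
    for zi in z:
        null = (1 - pi) * normal_pdf(zi)
        alt = pi * normal_pdf(zi, mean=theta)
        out.append(null / (null + alt))   # one evaluation per hypothesis
    return out
```

At $z=\theta/2$ the two component densities are equal, so the locFDR equals the prior null probability $1-\pi$; for instance, `locfdr_iid([-1.0], 0.3, -2.0)` gives 0.7 up to rounding.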

Under general (known) dependence, the calculation of the locFDR involves calculating the terms $g(\vec{z}\mid h_{i}=0)$ and $g(\vec{z}\mid h_{i}=1)$ (or simply $g(\vec{z})$ ). A naive calculation requires integrating over all $O(2^{K})$ possible allocations of the vector $\vec{h},$ for example:

[TABLE]

where $g(\vec{z}\mid\vec{h})$ is the known joint distribution of the test statistics under the configuration $\vec{h}.$ Even assuming the calculation of $g(\vec{z}\mid\vec{h})$ itself is easy, the summation makes this impossible for large $K.$

We discuss two dependence structures where it is possible to design more efficient algorithms:

•

Block dependence: In this setting, the hypotheses $1,\ldots,K$ are partitioned into $1<L<K$ blocks, with $Z_{i}\perp Z_{j}$ if $i,j$ do not belong to the same block. As we show below, in this setting we can calculate the set of locFDRs in complexity that depends exponentially on the size of the biggest block, and only linearly on $K.$ This block structure is often assumed in various applications (REFs), and is the example we study in detail in § 5.2 below.

•

Equi-correlated setting: This refers to the specific setting where the distributions under both the null and alternative are normal with the same variance, and all hypotheses are dependent with an equal correlation for all pairs. In this setting we can design a highly efficient specialized algorithm that requires only $O(K^{3})$ operations to calculate all locFDRs. We present it in § E.

For the block dependence setting, assume we have a partition into blocks $B_{1},\ldots,B_{L}$ such that $\cup_{l=1}^{L}B_{l}=\left\{1,\ldots,K\right\}\;,\;\;B_{l}\cap B_{m}=\emptyset\;,\;l\neq m,$ and denote the size of block $l$ by $s_{l}=|B_{l}|.$ Then it is easy to see that the joint distributions we are interested in factorize due to independence. For example, assuming WLOG $i\in B_{1},$ we have

[TABLE]

The same product as in the above display also appears in $g(\vec{z}\mid h_{i}=1)$ and $g(\vec{z}),$ so we can combine them to show that the calculation of the locFDR depends only on its dependence block (still assuming $i\in B_{1}$ ):

[TABLE]

The denominator can clearly be calculated via $2^{s_{1}}$ evaluations of $g(\vec{z}_{1}\mid\vec{h}_{1}),$ and is fixed for all $i\in B_{1}.$ A naive calculation of the numerator for all $i\in B_{1}$ requires $s_{1}\times 2^{s_{1}}$ evaluations. We note that the evaluations can be done only once for each $\vec{h}$ and stored with $O(2^{s_{1}})$ memory.

Overall, assuming the evaluation of $g(\vec{z}_{1}\mid\vec{h}_{1})$ for a block of size $s_{1}$ is of complexity $O(s_{1}^{2})$ (as for a multivariate Gaussian with known covariance structure), the total complexity of calculating all locFDRs in a block design with $L$ blocks, each of size $s=K/L$ is $O(K\cdot s\cdot 2^{s}).$
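The enumeration over the $2^{s}$ configurations of a single block can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the block is taken to be multivariate normal with mean `effect` times $\vec{h}$, unit variances, and a common within-block correlation `rho` (the design of § 5.2), for which the inverse covariance has a closed form. Because the covariance does not depend on $\vec{h}$, the normalizing constant cancels in the ratio. The function name and argument names are hypothetical.

```python
import itertools
import math

def block_locfdr(z_block, pi, effect, rho):
    """locFDR values Pr(h_i = 0 | z_block) for one equi-correlated block.

    Requires -1/(s-1) < rho < 1 so that the covariance is positive definite.
    """
    s = len(z_block)

    def quad(x):
        # x' Sigma^{-1} x for Sigma = (1 - rho) I + rho 11'.
        sx = sum(x)
        sxx = sum(v * v for v in x)
        return (sxx - rho / (1 + (s - 1) * rho) * sx * sx) / (1 - rho)

    denom = 0.0            # proportional to g(z_block)
    numer = [0.0] * s      # proportional to Pr(h_i = 0) g(z_block | h_i = 0)
    for h in itertools.product((0, 1), repeat=s):   # 2^s configurations
        resid = [zi - effect * hi for zi, hi in zip(z_block, h)]
        n_alt = sum(h)
        w = math.exp(-0.5 * quad(resid)) * pi ** n_alt * (1 - pi) ** (s - n_alt)
        denom += w         # each configuration is evaluated once and reused
        for i in range(s):
            if h[i] == 0:
                numer[i] += w
    return [n / denom for n in numer]
```

With `rho = 0` the block factorizes and the result reduces to the marginal locFDR of each coordinate, which gives a simple sanity check.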

5 Numerical Examples

We compare the performance of the OMT procedure with FDR control (henceforth, OMT-FDR) and the OMT procedure with positive FDR control (henceforth, OMT-pFDR), against two natural competitors: the OMT procedure with mFDR control (henceforth, OMT-mFDR, Xie et al., 2011), and the oracle BH procedure, which applies the BH procedure assuming the probability of a null hypothesis is known (so the threshold for significance of the $i$th smallest $p$-value is $\frac{i\alpha}{K(1-\pi)}$ instead of the BH threshold $\frac{i\alpha}{K}$, Benjamini et al., 2006). In § 5.1 we examine the case that the test statistics are independent; in § 5.2 we examine dependence settings, in which case we also compare the OMT procedures with the misspecified procedures that find the OMT policies assuming the test statistics are independent (termed ind-Err when aimed at Err control, where Err is FDR, pFDR, or mFDR); in § 5.3 we examine the effect of estimating the mixture parameters from the data. In § 5.2 and § 5.3 we also compare with the BH procedure and with the adaptive procedure suggested in Sun and Cai (2007), which is computationally simpler and more intuitive than OMT-mFDR, and therefore quite popular for large scale inference. This procedure, termed here est-mFDR, first orders the estimated marginal locFDRs, $T_{marg,(1)}\leq\ldots\leq T_{marg,(K)}$, and then rejects the $k$ hypotheses with smallest estimated marginal locFDRs, where $k=\max\{i:\frac{1}{i}\sum_{j=1}^{i}T_{marg,(j)}\leq\alpha\}$. All simulations are carried out at the nominal level $\alpha=0.05$ for the chosen criterion (mFDR, FDR or pFDR).
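Two of the competitors are simple enough to state in a few lines. The sketch below, with hypothetical function names, implements the est-mFDR cut-off $k=\max\{i:\frac{1}{i}\sum_{j\leq i}T_{marg,(j)}\leq\alpha\}$ and the BH step-up rule, where the argument `null_fraction` plays the role of $1-\pi$ for the oracle version and defaults to one for plain BH.

```python
def est_mfdr_rejections(locfdr, alpha):
    """Largest k such that the mean of the k smallest locFDR values is <= alpha."""
    t = sorted(locfdr)
    k, running_sum = 0, 0.0
    for i, ti in enumerate(t, start=1):
        running_sum += ti
        if running_sum / i <= alpha:
            k = i
    return k

def bh_rejections(pvalues, alpha, null_fraction=1.0):
    """Number of BH rejections: largest i with p_(i) <= i * alpha / (K * null_fraction)."""
    p = sorted(pvalues)
    K = len(p)
    k = 0
    for i, p_i in enumerate(p, start=1):
        if p_i <= i * alpha / (K * null_fraction):
            k = i          # step-up: keep the last index that passes
    return k
```

For the $p$-values 0.02, 0.03, 0.04, 0.5 at level 0.05, plain BH rejects nothing, whereas the oracle version with a null fraction of 0.5 doubles every threshold and rejects three hypotheses.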

5.1 The independent setting

We generate test statistics from the following mixture model: with probability $1-\pi$ , $Z$ is $N(0,1)$ ; with probability $\pi$ , $Z$ is $N(\theta,1)$ with $\theta<0$ . We fix $K=5000$ hypotheses, and experiment with a range of values for $\pi,\theta$ .

Our results are summarized in Table 1. As expected, $FDR\leq pFDR\leq mFDR$ . The gain in power with the novel procedures (OMT-FDR, OMT-pFDR) is small when the mFDR of the novel procedures is only slightly above the nominal level. However, when the gain is large, the mFDR of the novel procedures can be large. The mFDR of the OMT-FDR and OMT-pFDR procedures is above 0.16 for $\theta=-1.5$ , and the power gain over OMT-mFDR is more than 30%. It is above 0.07 for $\theta=-2,\pi=0.1$ , and the gain in power is at least 4%. It is close to the nominal level in the three other settings and the power gain is negligible. The power gain is due to the tendency of FDR and pFDR controlling policies to make very few or very many rejections with nonnegligible probability when the signal is weak or rare, and this erratic behaviour results in high $\mathbb{E}(V)$ and mFDR. Interestingly, when the power gain is large, the FDR of the OMT-mFDR procedure is not much smaller than the nominal level. So the OMT-mFDR has lower power, but approximately the same FDR level, as OMT-FDR. The Oracle BH procedure has FDR level identical to the nominal level, as expected, and its mFDR is only slightly above the nominal level except in the weakest setting with $\pi=0.1$ , where it is inflated to be 0.066.

The last column in Table 1 demonstrates clearly where OMT-FDR and OMT-pFDR differ. In order to control the FDR, the OMT-FDR procedure either makes no rejections, or makes many rejections, when the signal is weak. As a consequence, the false discovery proportion (FDP) is either zero or much higher than the nominal level. This is perhaps an unattractive behavior of the OMT-FDR procedure. As the signal strengthens, the probability of no rejections decreases for OMT-FDR, and its policy approaches that of OMT-pFDR. Since (arguably) pFDR is a more appropriate error measure to control than FDR for the two-group model, the more attractive OMT-pFDR policy may be preferred over OMT-FDR.

5.2 The dependent setting

We generate the test statistics from the two-group model, with $g(\vec{z}\mid\vec{h})$ a multivariate normal distribution with mean $\mu\times\vec{h}$ and a block diagonal covariance matrix. Within each block we experiment with a range of values for $\rho$ , the symmetric correlation.

Our results are summarized in Table 2. As correlation increases, the advantage of incorporating dependence into the rule increases, and the power gain can be very large. In the two settings where the correlation is 0.5 in at least half of the blocks, the power increase of OMT-FDR, OMT-pFDR, and OMT-mFDR over ind-FDR, ind-pFDR, and ind-mFDR, respectively, is at least 40%. As expected from Proposition 2.2, ind-mFDR maintains the nominal mFDR level of 5%. The procedures in the last three rows are also robust to deviations from independence. However, the misspecified policies that ignore dependence for FDR and pFDR control (ind-FDR and ind-pFDR) have an inflated error, which is at most 6.1% in our settings. A comparison of FDR, pFDR and mFDR policies reveals that the power gain of FDR and pFDR policies over the respective mFDR policy is large when FDP is variable, which is manifested in the high mFDR levels (15%–19%). As in the set of simulations with independent test statistics, we find that the variation in FDP is greater with FDR control than with pFDR control policies.

5.3 The effect of estimation of the mixture components in the two-group model

In practice, the distributions $g(z|h=1)$, $g(z)$ and the mixture proportion $\pi$ are typically unknown. The estimation of the marginal density of the $z$-scores and of $\pi$ can be difficult, and there are many different approaches. We shall limit our investigation to fitting a bivariate mixture of normals using the R package mixfdr available from CRAN (Muralidharan, 2010). The estimation is done using the EM algorithm with a penalization via a Dirichlet prior on $(1-\pi,\pi)$. Estimation of the fraction of nulls is most conservative if the Dirichlet prior parameters are (1,0). In addition to this prior, we also examined the results with the Dirichlet prior parameters $(1-\hat{\pi},\hat{\pi})$, where $\hat{\pi}$ is estimated by the method of Jin and Cai (2007), as recommended in Sun and Cai (2007).

Our results are summarized in Table 3. As in the known distribution case, est-OMT-FDR has the most power, with est-OMT-pFDR a close second, even though this ordering is no longer guaranteed, since the rejection region is computed using parameters estimated from the data. For example, with $\pi=0.3$ the procedure est-OMT-FDR (which coincides with est-OMT-pFDR) has an FDR (which coincides with pFDR) below the nominal level, and compared with est-mFDR, it rejects a few more hypotheses on average if the non-conservative method is used for estimating the fraction of nulls, and many more hypotheses if the conservative method is used. However, the est-OMT-FDR can have an inflated FDR level when the fraction of nulls is fairly small (making the estimation problem more difficult). This problem is present to a lesser degree with est-OMT-pFDR. With $\pi=0.1$: the procedure est-OMT-FDR has an FDR level of 0.12 if the non-conservative method is used for estimating the fraction of nulls, and 0.06 if the conservative method is used; the procedure est-OMT-pFDR has a pFDR level of 0.11 if the non-conservative method is used for estimating the fraction of nulls, and 0.06 if the conservative method is used.

6 Gene expression data analysis

We illustrate the utility of the novel procedures in the context of an application to gene expression studies. The goal of gene expression studies is to identify the genes that are associated with a trait of interest. For this purpose, the gene expression and the traits of individuals are collected.

In this section, we provide our re-analysis of a meta-analysis of gene expression studies described in Shah et al. (2016), using our novel est-OMT-FDR procedure (which coincides with est-OMT-pFDR) and the competitors est-mFDR, adaptive BH, and BH. In § D we analyzed 20 additional gene-expression studies using the same procedures. Our analyses demonstrate clearly that the novel procedure tends to make the largest number of rejections. Of course, having a larger number of rejections does not guarantee having a larger number of true rejections. We chose the example from Shah et al. (2016) since it contained both a discovery and a validation meta-analysis study, so we could combine these to form a set of “confirmed discoveries”. A comparison of the rejections by each method with the confirmed discoveries suggests that est-OMT-FDR has more true discoveries while still maintaining a low false discovery proportion. All analyses can be reproduced using our code available from https://github.com/ruheller/OMT2GroupModel. Specifics follow.

Shah et al. (2016) carried out a primary meta-analysis of four studies of ulcerative colitis, and they reported the primary meta-analysis $p$-values for up-regulation, and separately, down-regulation, of genes. Using the BH procedure on the meta-analysis $p$-values, 2211 and 1775 genes with higher or lower expression, respectively, in ulcerative colitis compared with healthy controls, were detected. Shah et al. (2016) also carried out a follow-up meta-analysis of four additional studies of ulcerative colitis. Finally, they carried out a replication analysis that showed a high concordance of the average fold-change and a significant overlap in genes with increased or decreased expression.

We first transformed the primary meta-analysis $p$ -values (based on four studies) aimed at discovering up-regulation, and separately the down-regulation, to $z$ -scores. In order to avoid unbounded values, $z$ -scores with a one-sided $p$ -value of zero (or very nearly zero) were sampled from $N(-6,1)$ , and $z$ -scores with a one-sided $p$ -value of one (or very nearly one) were sampled from $N(6,1)$ . We note that although up-regulation and down-regulation are opposite in terms of effect sizes, the primary meta-analysis $p$ -values are directional and therefore the discoveries of interest are those that correspond to the small primary meta-analysis $p$ -values for up-regulation, and separately for down-regulation. Similarly, after conversion to $z$ -scores the discoveries of interest are those that correspond to small (negative) $z$ -scores for up-regulation, and separately for down regulation. We term these $z$ -scores the discovery $z$ -scores, since they are based only on the primary meta-analyses (i.e., excluding the follow-up meta-analyses).
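The transformation described above can be sketched as follows, assuming that a one-sided $p$-value $p$ is mapped to the standard normal quantile $\Phi^{-1}(p)$, so that small $p$-values give negative $z$-scores. The cut-off `eps` for "very nearly zero" or "very nearly one" is a hypothetical value, as are the function name and the use of a seeded generator.

```python
import random
from statistics import NormalDist

def p_to_z(p, rng, eps=1e-15):
    """Map a one-sided p-value to a z-score, avoiding unbounded values.

    eps is a hypothetical cut-off; the source does not state the value used.
    """
    if p < eps:
        return rng.gauss(-6.0, 1.0)      # p-value (nearly) zero: draw from N(-6, 1)
    if p > 1.0 - eps:
        return rng.gauss(6.0, 1.0)       # p-value (nearly) one: draw from N(6, 1)
    return NormalDist().inv_cdf(p)       # standard normal quantile

rng = random.Random(1)
z_scores = [p_to_z(p, rng) for p in (0.0, 0.025, 0.5, 1.0)]
```

Here the second entry of `z_scores` is approximately -1.96 and the third is zero, while the first and last are random draws centered at -6 and 6.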

We assume the discovery $z$-scores are independently generated from a mixture of five normal densities, including one standard normal density (corresponding to genes with no up- or down-regulation). We estimated the mixture components using the R package mixfdr by Muralidharan (2010). The estimation uses the EM algorithm with a penalization via a Dirichlet prior with a parameter value of one for the standard normal component and of zero for the remaining four mixture components. We denote by $(\hat{\mu}_{i},\hat{\sigma}_{i})$ the estimated mean and standard deviation of the $i$th remaining normal component, and by $\hat{\pi}_{\mu_{i}}$ the estimated probability that the $z$-score is sampled from $N(\hat{\mu}_{i},\hat{\sigma}_{i})$, for $i\in\{1,2,3,4\}$. For each of the two analyses we carried out (for discovering up-regulation, and separately, down-regulation), the estimated mixture density had two negative means, $(\hat{\mu}_{3},\hat{\mu}_{4})$, and two positive means, $(\hat{\mu}_{1},\hat{\mu}_{2})$.

In this example, the null hypothesis is a compound null, that the $z$ -score has expectation at least zero. Since $\sum_{i=1}^{4}\hat{\pi}_{\mu_{i}}+\hat{\pi}_{0}=1$ , the probability of the discovery $z$ -scores coming from a nonnull hypothesis is estimated to be $\hat{\pi}=\hat{\pi}_{\mu_{3}}+\hat{\pi}_{\mu_{4}}$ . The estimated null and alternative distributions are therefore $\hat{g}(z\mid h=0)=\frac{\hat{\pi}_{0}}{1-\hat{\pi}}\phi(z)+\frac{\hat{\pi}_{\mu_{1}}}{1-\hat{\pi}}\frac{1}{\hat{\sigma}_{1}}\phi\left(\frac{z-\hat{\mu}_{1}}{\hat{\sigma}_{1}}\right)+\frac{\hat{\pi}_{\mu_{2}}}{1-\hat{\pi}}\frac{1}{\hat{\sigma}_{2}}\phi\left(\frac{z-\hat{\mu}_{2}}{\hat{\sigma}_{2}}\right)$ and $\hat{g}(z\mid h=1)=\frac{\hat{\pi}_{\mu_{3}}}{\hat{\pi}}\frac{1}{\hat{\sigma}_{3}}\phi\left(\frac{z-\hat{\mu}_{3}}{\hat{\sigma}_{3}}\right)+\frac{\hat{\pi}_{\mu_{4}}}{\hat{\pi}}\frac{1}{\hat{\sigma}_{4}}\phi\left(\frac{z-\hat{\mu}_{4}}{\hat{\sigma}_{4}}\right),$ respectively, where $\phi(\cdot)$ is the density of the standard normal distribution. The estimated marginal locFDR value for a $z$ -score $z$ is therefore $T_{marg}(z)=\frac{(1-\hat{\pi})\hat{g}(z\mid h=0)}{(1-\hat{\pi})\hat{g}(z\mid h=0)+\hat{\pi}\hat{g}(z\mid h=1)}$ .
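The estimated marginal locFDR above is a ratio of weighted normal densities, which the following sketch evaluates directly. The function names and the numerical parameter values in the example are hypothetical placeholders and are not estimates from the data; components with a negative mean are treated as non-null, as in the text.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    # Density of N(mean, sd^2) at x.
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2 * math.pi))

def marginal_locfdr(z, pi0, components):
    """Estimated marginal locFDR T_marg(z) for a compound null.

    pi0        : weight of the standard normal component.
    components : list of (weight, mean, sd) for the remaining components;
                 positive means count as null, negative means as non-null.
    """
    null = pi0 * normal_pdf(z)           # accumulates (1 - pi_hat) * g_hat(z | h=0)
    alt = 0.0                            # accumulates pi_hat * g_hat(z | h=1)
    for weight, mean, sd in components:
        dens = weight * normal_pdf(z, mean, sd)
        if mean < 0:
            alt += dens
        else:
            null += dens
    return null / (null + alt)

# Hypothetical parameter values, for illustration only.
example_components = [(0.05, 1.5, 1.2), (0.05, 3.0, 1.5),
                      (0.10, -1.5, 1.2), (0.05, -3.5, 1.5)]
t_example = marginal_locfdr(-2.0, 0.75, example_components)
```

The weights in the example sum to one together with `pi0`, matching the constraint $\sum_{i=1}^{4}\hat{\pi}_{\mu_{i}}+\hat{\pi}_{0}=1$.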

The confirmed discoveries were identified using the discovery $z$-vector and the validation $z$-vector, combined by Fisher’s combining method. So the $p$-value for a gene with values $(zd,zv)$ for the $z$-scores in the discovery study and the validation study, respectively, is the probability that a chi-square random variable with four degrees of freedom is larger than $-2\log(\Phi(zd))-2\log(\Phi(zv))$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function. We expect the power to be greater when basing the inference on the pooled evidence from both the primary and follow-up meta-analyses, if the genes differentially expressed in the primary meta-analyses are also differentially expressed in the follow-up meta-analyses. Applying a multiple testing procedure, which is supposed to yield a negligible number of false positives, to these Fisher combined $p$-values, we obtain trustworthy discoveries which we label as the confirmed discoveries.
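Fisher's combination for two studies has a closed form, since the survival function of a chi-square distribution with four degrees of freedom is $e^{-x/2}(1+x/2)$. The sketch below takes the one-sided $p$-value of a $z$-score to be the standard normal cumulative distribution function evaluated at that score, an assumption consistent with small (negative) $z$-scores being the discoveries of interest. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_pvalue(zd, zv):
    """Fisher combined p-value from discovery and validation z-scores."""
    nd = NormalDist()
    p_d = nd.cdf(zd)                     # one-sided p-value, discovery study
    p_v = nd.cdf(zv)                     # one-sided p-value, validation study
    stat = -2.0 * math.log(p_d) - 2.0 * math.log(p_v)
    # Survival function of chi-square with 4 degrees of freedom.
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
```

With both $z$-scores equal to zero, each $p$-value is 0.5 and the combined $p$-value is $0.25\,(1+2\log 2)\approx 0.597$. For extremely negative $z$-scores the normal CDF underflows to zero and the logarithm fails, so such inputs would need separate handling.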

In Table 4 we see that for both up-regulation, and separately down-regulation, est-OMT-FDR has the most rejections, followed in decreasing order by est-mFDR, adaptive BH, and lastly the BH procedure. Moreover, this order of the number of rejections is retained also when restricted to the genes in the “set of confirmed discoveries” using the BH procedure at the 0.05 level on the Fisher combined $p$ -values. A more conservative definition of “set of confirmed discoveries” for which no false positives are expected, by the discoveries using the Bonferroni-Holm procedure at the 0.05 level, also retains this order of the number of rejections (not shown).

7 Discussion

In this paper, we provide the first practical approach to the problem of maximizing an objective which is linear in the decision functions, subject to FDR or pFDR control in the general two-group model. With the generic form of our formulation for finding the OMT policies in (3.7), it is also possible to consider other error criteria, e.g., FWER ($Pr(V>0)$), or false discovery exceedance ($Pr(FDP>\gamma)$). As with FDR control, the optimal solution will be to threshold the locFDR at a value that depends on the $K$ realized locFDR statistics. The error measures $\mathbb{E}(V)$ and mFDR result in a much simpler solution (see derivation in § C for mFDR control), where the threshold for rejection depends only on the mixture distribution. It is also possible to consider novel criteria, such as the probability of false discovery exceedance given that at least one rejection occurred, $\textrm{Pr}\left(\frac{V}{R}>\gamma\mid R>0\right)$. Moreover, the formulation can be extended in a straightforward manner to control more than one error rate. For example, one may seek the OMT policy which controls the FDR as well as $\mathbb{E}(V)$, thus potentially creating a powerful policy with meaningful control over the false discovery proportion in expectation without allowing for an unattractive policy which tends to reject many or very few hypotheses.

We provide efficient algorithms for computing the optimal policy for independent test statistics, as well as for test statistics that are equi-correlated or have a block dependence covariance structure. We demonstrate the large potential power gain from incorporating the dependence into the OMT policies. We expect the OMT policies to be useful in genomic applications where the dependence is known. For example, in genome-wide association studies (GWAS), the covariance matrix is a known banded matrix, due to linkage disequilibrium. We plan to provide efficient computational tools for the general two-group model with this type of local dependence.

Our general two-group model assumes that the hypothesis states are independently sampled. Sun and Cai (2009) considered the setting where the underlying latent indicator variable of being null follows a homogeneous irreducible Markov chain, as in a hidden Markov model (HMM). In their setting, the test statistics conditional on the hypothesis states are independent. Deriving solutions with an HMM structure in our framework is also an interesting direction for future work.

In the two-group model, the potential gain in power from applying optimal policies with FDR or pFDR control rather than mFDR control is maintained when the parameters are estimated in our numerical experiments, but care has to be taken in proper estimation of the mixture parameters in order to avoid an unacceptable inflation of the FDR or pFDR level. In particular, it appears that the estimation of the fraction of nulls has to be conservative when the actual fraction is fairly small. Further research into estimation methods tailored towards estimated optimal policies for FDR and pFDR control is needed. In our real data examples, the most discoveries (as well as the most validated discoveries, confirmed using a validation set) were made with the estimated optimal marginal policy for FDR control. In these examples, the test statistics are not independent but the dependence is unknown and expected to be limited in range and magnitude. Further research is needed to understand when local dependence can be ignored, as well as towards the development of robust estimation methods for the mixture components.

Appendix A Proofs and additional mathematical details

Proof of Proposition 2.1

Item 1 follows straightforwardly from the explanation in the paragraph leading to the proposition.

Item 2 follows from the fact that OMT-mFDR is a single step procedure, yet OMT-pFDR is by construction the step-down procedure described in § 4. Put another way, the necessary conditions for the OMT-mFDR policy lead to the single step procedure, and the OMT-pFDR policy does not satisfy these necessary conditions. For example, for $K=2$ , let $\vec{E}(\vec{z})$ and $\vec{D}(\vec{z})$ be the OMT-mFDR and OMT-pFDR policies, respectively. Then $\{(T(z_{1}),T(z_{2})):E_{1}(\vec{z})=1\}=\{(T(z_{1}),T(z_{2})):T(z_{1})\leq c\}$ for a constant $c$ which guarantees $mFDR(\vec{E})=\alpha$ , but $\{(T(z_{1}),T(z_{2})):D_{1}(\vec{z})=1\}=\left\{(T(z_{1}),T(z_{2})):T(z_{1})\leq\frac{1+\alpha}{1+\mu}\ \textrm{or}\ T(z_{1})+T(z_{2})\leq\frac{2}{1+\mu/2}\right\}$ for a constant $\mu$ which guarantees $pFDR(\vec{D})=\alpha$ . Clearly, the symmetric difference between the sets $\{(T(z_{1}),T(z_{2})):E_{1}(\vec{z})=1\}$ and $\{(T(z_{1}),T(z_{2})):D_{1}(\vec{z})=1\}$ has positive Lebesgue measure.

For item 3, suppose by contradiction that $mFDR\leq pFDR$ for the OMT-pFDR policy. The OMT-pFDR policy is necessarily at least as powerful as the OMT-mFDR policy since the OMT-mFDR policy controls the pFDR. So the OMT-pFDR policy is optimal for mFDR control if it satisfies $mFDR\leq pFDR$ . But to achieve optimal mFDR control, a policy has to satisfy necessary conditions which lead to the single step procedure. This contradicts the fact that the OMT-pFDR policy is necessarily not a single step procedure, as shown for item 2 above.

Item 4 follows by the same reasoning as that of item 1. The OMT-FDR policy is necessarily at least as powerful as the OMT-pFDR policy since the OMT-pFDR policy controls the FDR (which is bounded above by the pFDR). Therefore, if the OMT-FDR policy controls the pFDR (since the probability of no rejections is zero), this must be the OMT-pFDR policy as well. Indeed, it is easy to see that the step-down procedures for optimal pFDR and optimal FDR control in § 4 coincide when the hypothesis with minimal locFDR is rejected with probability one.

Proof of Theorem 2.1

Given a candidate solution $\vec{D}$, we prove the theorem by constructing an alternative solution $\vec{E}$ that complies with the condition and has no lower objective and no higher constraint than $\vec{D}$.

For every pair of indexes $1\leq i<j\leq K$ , define:

[TABLE]

We will now examine the solution $\vec{E}$ which is equal to $\vec{D}$ everywhere, except on the set $A_{ij}$ , where it switches the value of coordinates $i,j$ :

[TABLE]

We now show the following:

1. For the integrated power in Eq. (3.1), $\Pi(\vec{E})\geq\Pi(\vec{D})$.

2. For the error constraint in Eq. (3.2), $Err(\vec{E})\leq Err(\vec{D})$.

Therefore $\vec{E}$ is an improved solution compared to $\vec{D}$. This can be done for all $i,j$ pairs repeatedly until $\mathbb{P}(A_{ij})=0\ \forall i,j$, and we end up with $\vec{E}$ which has the desired monotonicity property and is superior to $\vec{D}$. Since the optimal solution $\vec{D}^{*}$ cannot be improved, it must have this monotonicity property.

It remains to prove properties 1 and 2 above. For the power, we write the expression in Eq. (3.1) for $\vec{E}$ and $\vec{D}$ and subtract them:

[TABLE]

where the second equality uses the definition of $\vec{E}$ , and the inequality follows since $T_{j}(\vec{z})>T_{i}(\vec{z})$ for $\vec{z}\in A_{ij}$ .
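As a sketch of this step, assume that Eq. (3.1) has the form $\Pi(\vec{D})=\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\sum_{k=1}^{K}D_{k}(\vec{z})(1-T_{k}(\vec{z}))d\vec{z}$ (consistent with the objective coefficient $a_{i}=1-T_{i}$ used in Appendix C) and that on $A_{ij}$ we have $T_{i}(\vec{z})<T_{j}(\vec{z})$ together with $D_{i}(\vec{z})<D_{j}(\vec{z})$. Both are assumptions made for illustration. Since $\vec{E}$ differs from $\vec{D}$ only on $A_{ij}$, where $E_{i}=D_{j}$ and $E_{j}=D_{i}$, the subtraction can then be written as:

```latex
\begin{align*}
\Pi(\vec{E})-\Pi(\vec{D})
&=\int_{A_{ij}}\mathbb{P}(\vec{z})\Big[\big(E_{i}(\vec{z})-D_{i}(\vec{z})\big)\big(1-T_{i}(\vec{z})\big)
  +\big(E_{j}(\vec{z})-D_{j}(\vec{z})\big)\big(1-T_{j}(\vec{z})\big)\Big]d\vec{z}\\
&=\int_{A_{ij}}\mathbb{P}(\vec{z})\big(D_{j}(\vec{z})-D_{i}(\vec{z})\big)\big(T_{j}(\vec{z})-T_{i}(\vec{z})\big)d\vec{z}
\;\geq\;0 .
\end{align*}
```

Both factors in the last integrand are positive on $A_{ij}$ under the stated assumptions, which gives the inequality.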

The same idea applies to the FDR constraint:

[TABLE]

where the second and third equalities follow since the difference between the two ratios is nonzero only in the numerator and only on $A_{ij}$ .

It remains to show that $pFDR(\vec{D})-pFDR(\vec{E})\geq 0$ . This clearly follows since for $\vec{z}\in A_{ij}$ , $\vec{1}^{t}\vec{D}(\vec{z})=\vec{1}^{t}\vec{E}(\vec{z})$ , so

[TABLE]

So the denominators in $pFDR(\vec{D})$ and $pFDR(\vec{E})$ are the same, and hence

[TABLE]

Derivation of Euler-Lagrange conditions for Problem (3.8)

Our optimization problem is:

[TABLE]

We eliminate the inequality constraints, by introducing non-negative auxiliary variables, and then square those variables to also eliminate non-negativity constraints:

[TABLE]

The Euler-Lagrange (EL) necessary conditions for a solution to this optimization problem may be obtained through calculus of variations (Korn and Korn, 2000). Let $y_{1}(x),y_{2}(x),\dots,y_{n}(x):\mathbb{R}\rightarrow\mathbb{R}$ be a set of $n$ functions and

[TABLE]

be a definite integral over fixed boundaries $x_{0},x_{F}$ . Every set of $y_{1}(x),y_{2}(x),\dots,y_{n}(x)$ which maximize or minimize (A.2) must satisfy a set of $n$ equations

[TABLE]

In addition, let

[TABLE]

be a set of $m_{1}<n$ point-wise equality constraints on $y_{1}(x),y_{2}(x),\dots,y_{n}(x)$ and

[TABLE]

be a set of $m_{2}$ integral equality constraints on $y_{1}(x),y_{2}(x),\dots,y_{n}(x)$ . Then, every set of $n$ functions $y_{1}(x),y_{2}(x),\dots,y_{n}(x)$ which maximize (A.2), subject to the constraints (A.4, A.5) must satisfy the EL equations,

[TABLE]

where

[TABLE]

The unknown functions $\lambda_{j_{1}}(x)$ and constants $\mu_{j_{2}}$ are called the Lagrange multipliers. The differential equations in (A.6) are necessary conditions for a maximum, provided that all the quantities on the left hand side of (A.6) exist and are continuous.

Hence, the set of $y_{1}(x),y_{2}(x),\dots,y_{n}(x)$ which maximize (A.2) subject to the constraints (A.4,A.5), is to be determined, together with unknown Lagrange multipliers, from (A.4,A.5,A.6).

This derivation may also be extended to a higher dimensional case, $x,y_{1}(x),y_{2}(x),\dots,y_{n}(x)\in\mathbb{R}^{d}$, as appears in Korn and Korn (2000). In this case the EL equations are

[TABLE]

where $y_{i,k}\triangleq\frac{\partial y_{i}}{\partial x_{k}}$ and $\Phi$ follows the same definition as in (A.7), with

[TABLE]

Therefore, the Lagrangian $\Phi$ for our optimization problem (A.1) is

[TABLE]

The necessary conditions for the minimizers of (A.1) are that the original constraints are met with equality, and additionally

1. $\frac{\partial\Phi}{\partial\tilde{D}_{k}(\vec{z})}=\mathbb{P}(\vec{z})\left(a_{k}(\vec{z})-\mu b_{k}(\vec{z})\right)+\lambda_{k+1}(\vec{z})-\lambda_{k}(\vec{z})=0\quad\quad\forall 1\leq k\leq K,\;\vec{z}\in\mathbb{R}^{K}$

2. $\frac{\partial\Phi}{\partial e_{k}(\vec{z})}=2e_{k}(\vec{z})\lambda_{k}(\vec{z})=0\quad\quad\forall 1\leq k\leq K+1,\;\vec{z}\in\mathbb{R}^{K}$

3. $\frac{\partial\Phi}{\partial E}=2\mu E=0$

It is interesting to notice that these conditions are exactly the KKT conditions for the discrete optimization case, where $\vec{z}$ is over a finite grid. Specifically, the first condition corresponds to the derivatives of the Lagrangian, while conditions (2) and (3) are equivalent to the complementary slackness property.

Note also the multiplication by $\mathbb{P}(\vec{z})$ in the first condition, which plays no role (since the scale of $\lambda_{k}(\vec{z})$ is arbitrary) and is eliminated in the main text for simplicity.

Proof of Lemma 3.1

Assume that for some $\vec{z}\in\mathbb{R}^{K}$ and index $j$ we have that $0<\tilde{D}_{j}(\vec{z})<1$ . Then it is easy to see that out of the $K+1$ constraints implied by conditions (3.11)–(3.13), at least two will require $\lambda_{i}=0$ to hold: for example, if $0<\tilde{D}_{1}(\vec{z})<1$ and $\tilde{D}_{2}(\vec{z})=\ldots=\tilde{D}_{K}(\vec{z})=0$ , we will have that $\lambda_{1}(\vec{z})=\lambda_{2}(\vec{z})=0$ to maintain complementary slackness.

Assume wlog that $\lambda_{l}(\vec{z})=\lambda_{j}(\vec{z})=0$ for some $l<j$ . Now we can sum the equations between $l$ and $j-1$ in the stationarity condition (3.9):

[TABLE]

where all the $\lambda$ terms have cancelled out due to the telescopic nature of the sum, and $\lambda_{l}=\lambda_{j}=0$ .

Hence we have concluded that having any non-binary value in the optimal solution $\tilde{D}^{*}(\vec{z})$ implies

[TABLE]

which has probability zero since, by our assumption for the two-group model, $\sum_{i=l}^{j-1}\left\{a_{i}(\vec{Z})-\mu b_{i}(\vec{Z})\right\}$ is a continuous random variable.

Derivation of dual to Problem (3.8) and proof of Lemma 3.2

The result in Lemma 3.2 relies on explicit derivation of the dual to the infinite linear program (3.8) (see Anderson and Nash (1987) for details on the derivation of duals to infinite linear programs):

[TABLE]

Proof of Lemma 3.2: Feasibility of the dual solution holds by construction: $\mu,\lambda$ are non-negative Lagrange multipliers by definition, and the EL conditions require that

[TABLE]

To calculate the dual objective, we explicitly derive the value of $\lambda^{*}_{1}(\vec{z})$ as a function of the other variables. If $\tilde{D}^{*}_{K}(\vec{z})=1$ , then $\lambda^{*}_{K+1}(\vec{z})=0$ and it is easy to see from (3.9)–(3.13) that $\lambda^{*}_{1}(\vec{z})$ is equal to

[TABLE]

Similarly, if $\tilde{D}^{*}_{j-1}(\vec{z})-\tilde{D}^{*}_{j}(\vec{z})=1$ for $j\in\{2,\ldots,K-1\}$ , then $\lambda^{*}_{j}(\vec{z})=0$ and $\lambda^{*}_{1}(\vec{z})$ is equal to

[TABLE]

It thus follows that

[TABLE]

Therefore,

[TABLE]

Therefore the dual objective is equal to the primal objective:

[TABLE]

where we have used the complementary slackness condition for the $\mu^{*}$ in the last equality.

Appendix B Derivation of the expression for $FDR(\vec{D})$ in (3.5)

By Theorem 2.1, on Q, ${\vec{1}}^{t}\vec{D}(\vec{z})=k$ if and only if $\tilde{D}_{1}=\ldots=\tilde{D}_{k}=1$ and $\tilde{D}_{k+1}=\ldots=\tilde{D}_{K}=0$, i.e., if and only if $\tilde{D}_{k}-\tilde{D}_{k+1}=1$. Therefore:

[TABLE]

For the general two-group model, from (3.3) it thus follows that

[TABLE]

where the second equality follows from expression (B.1) derived for $\tilde{D}$ that is weakly monotone in the locFDRs.
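To make the counting identity concrete, the following Python sketch evaluates, for a single $\vec{z}$, the conditional expected false discovery proportion of a monotone policy: the sum over $k$ of $(\tilde{D}_{k}-\tilde{D}_{k+1})$ times the average of the $k$ smallest locFDRs. It assumes the locFDRs are already sorted in increasing order and uses the convention $\tilde{D}_{K+1}=0$. The function name `conditional_fdp` and the example numbers are illustrative and do not come from the paper.

```python
def conditional_fdp(sorted_locfdr, d_tilde):
    """Conditional expected FDP of a monotone 0/1 policy for one realization.

    sorted_locfdr: locFDR values T_(1) <= ... <= T_(K).
    d_tilde: 0/1 decisions aligned with sorted_locfdr; must be non-increasing,
             i.e. the hypotheses with the smallest locFDRs are rejected first.
    """
    K = len(sorted_locfdr)
    if len(d_tilde) != K:
        raise ValueError("inputs must have the same length")
    if any(d_tilde[k] < d_tilde[k + 1] for k in range(K - 1)):
        raise ValueError("decisions must be monotone in the sorted locFDRs")

    padded = list(d_tilde) + [0]  # convention: D_{K+1} = 0
    total = 0.0
    running_sum = 0.0  # sum of the k smallest locFDRs
    for k in range(1, K + 1):
        running_sum += sorted_locfdr[k - 1]
        # (D_k - D_{k+1}) equals 1 only when exactly k hypotheses are rejected
        total += (padded[k - 1] - padded[k]) * running_sum / k
    return total
```

With locFDRs (0.01, 0.05, 0.2, 0.6) and decisions (1, 1, 1, 0), only the term $k=3$ is nonzero, so the result is the average of the three smallest locFDRs, about 0.087. With no rejections every term vanishes and the result is 0.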

Appendix C An alternative proof of the rejection policy for OMT with mFDR control

We shall show that the solution to the optimization problem of finding the optimal decision rule with the expected number of true rejections as the objective and the mFDR at most level $\alpha$ as the constraint, coincides with the rule of Xie et al. (2011) for the two-group model.

The constraint $mFDR\leq\alpha$ is equivalent to $\mathbb{E}(V(\vec{D}))-\mathbb{E}(R(\vec{D}))\alpha\leq 0$ , where

[TABLE]

Therefore, the linear program for maximizing the objective subject to mFDR control is (3.7) where $b_{i}(\vec{z})=(1-\alpha)-(1-T_{i}(\vec{z}))=T_{i}(\vec{z})-\alpha$ and $a_{i}(\vec{z})=1-T_{i}(\vec{z})$ .

As in the FDR proof, the EL necessary optimality conditions are:

[TABLE]

where $\mu,\lambda_{ij}(\vec{z}),\;i=1,\ldots,K,j=1,2,\;\vec{z}\in\mathbb{R}^{K}$ are non-negative Lagrange multipliers. The solution that satisfies (C.1), (C.3), and (C.4) is guaranteed to be an integer solution, since if $0<D_{i}(\vec{z})<1$ it follows that $\lambda_{i1}(\vec{z})=\lambda_{i2}(\vec{z})=0$ and therefore that $a_{i}(\vec{z})-\mu b_{i}(\vec{z})=0$. Moreover, following steps similar to the ones in the FDR proof of Lemma 3.2, it can be shown that conditions (C.1)-(C.4) together with primal feasibility are sufficient.

Clearly, given $\mu>0$ , almost surely the rejection policy that satisfies (C.1),(C.3), and (C.4) is

[TABLE]

Therefore, all that remains is to find $\mu$ that satisfies $\int_{\mathbb{R}^{K}}\mathbb{P}(\vec{z})\sum_{i=1}^{K}D^{\mu}_{i}(\vec{z})b_{i}(\vec{z})d\vec{z}=0$ , i.e., $\mathbb{E}(V(\vec{D}^{\mu}))-\mathbb{E}(R(\vec{D}^{\mu}))\alpha=0.$
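The search for $\mu$ can be sketched numerically in a simple special case. The Python sketch below assumes independent $z$-scores from the one-sided mixture $(1-\pi)N(0,1)+\pi N(\theta,1)$ with $\theta<0$, so that the locFDR is increasing in $z$ and thresholding $T_{i}$ is the same as rejecting when $z_{i}\leq z_{c}$. With $a_{i}=1-T_{i}$ and $b_{i}=T_{i}-\alpha$, the condition $a_{i}-\mu b_{i}>0$ reads $T_{i}<(1+\mu\alpha)/(1+\mu)$, so a locFDR threshold $t$ corresponds to $\mu=(1-t)/(t-\alpha)$. The function names and the bisection bounds are illustrative choices, not taken from the paper.

```python
import math


def norm_cdf(x):
    # Standard normal CDF; erfc keeps accuracy in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def locfdr(z, pi_alt, theta):
    """P(null | z) under (1 - pi_alt) N(0,1) + pi_alt N(theta,1)."""
    null = (1.0 - pi_alt) * norm_pdf(z)
    alt = pi_alt * norm_pdf(z - theta)
    return null / (null + alt)


def mfdr_at_cutoff(z_c, pi_alt, theta):
    """mFDR = E(V) / E(R) when rejecting every z_i <= z_c."""
    ev = (1.0 - pi_alt) * norm_cdf(z_c)       # E(V) / K
    er = ev + pi_alt * norm_cdf(z_c - theta)  # E(R) / K
    return ev / er


def solve_omt_mfdr(alpha, pi_alt, theta, lo=-8.0, hi=8.0, iters=200):
    """Bisection for the cutoff with mFDR = alpha; returns (z_c, t, mu)."""
    if theta >= 0 or not (0.0 < alpha < 1.0 - pi_alt):
        raise ValueError("sketch assumes theta < 0 and 0 < alpha < 1 - pi_alt")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # mFDR increases with the cutoff in this one-sided setting
        if mfdr_at_cutoff(mid, pi_alt, theta) > alpha:
            hi = mid
        else:
            lo = mid
    z_c = 0.5 * (lo + hi)
    t = locfdr(z_c, pi_alt, theta)   # fixed locFDR threshold
    mu = (1.0 - t) / (t - alpha)     # multiplier implied by the threshold
    return z_c, t, mu
```

For example, `solve_omt_mfdr(0.05, 0.3, -2.5)` returns a cutoff, the matching locFDR threshold (which is necessarily larger than $\alpha$, since the mFDR is an average of locFDRs over the rejection region), and the implied multiplier $\mu>0$.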

Appendix D Gene expression data analysis in the ADEPTUS database

A recently compiled high-quality curated database spanning more than 38000 gene expression profiles and more than 100 phenotypes (mainly a disease or a control label) is the ADEPTUS database (Amar et al., 2018). We shall use a subset of the gene expression studies in the database to illustrate our suggested methods.

Our starting point is the one-sided $p$-values as computed in the ADEPTUS database for the 149 gene expression studies (so down-regulation will tend to have a $p$-value close to zero, and up-regulation will tend to have a $p$-value close to one). We estimated the fraction of differentially expressed genes using Storey’s plug-in method with parameter $\lambda=0.05$, as suggested in Blanchard and Roquain (2009). We selected the studies which had an estimated proportion of signal of at least 15%. This crude selection of studies is convenient, since we expect that for the vast majority of the selected studies the procedures est-OMT-FDR and est-OMT-pFDR coincide, and therefore we only used est-OMT-FDR in our re-analysis of these datasets. Moreover, we wanted to avoid having a fraction of nulls that is too small for proper evaluation of the mixture parameters. In addition, we restricted our attention to studies that had well-behaved $p$-values, i.e., studies in which the $z$-scores corresponding to the $p$-values were all between $-10$ and $10$. We were left with 20 studies to analyze, and each study had $K=10081$ genes.
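For readers unfamiliar with the plug-in estimator, a minimal Python sketch follows. It uses the basic form $\hat{\pi}_{0}=\#\{p_{i}>\lambda\}/(K(1-\lambda))$, capped at one, and reports $1-\hat{\pi}_{0}$ as the estimated proportion of signal. This is an assumption about the variant: some versions add one to the count, and this section does not state which variant was used or how the one-sided $p$-values were fed into it. The function name is hypothetical.

```python
def storey_signal_fraction(pvalues, lam=0.05):
    """Plug-in estimate of the non-null fraction, 1 - pi0_hat.

    pi0_hat = #{p > lam} / (K * (1 - lam)), capped at 1: p-values above lam
    are treated as coming (almost) only from nulls, which are uniform on [0, 1].
    """
    K = len(pvalues)
    if K == 0 or not (0.0 <= lam < 1.0):
        raise ValueError("need at least one p-value and 0 <= lam < 1")
    n_above = sum(1 for p in pvalues if p > lam)
    pi0_hat = min(1.0, n_above / (K * (1.0 - lam)))
    return 1.0 - pi0_hat
```

Under this sketch, a study would pass the selection rule described above when `storey_signal_fraction(pvalues) >= 0.15`.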

We assume the $z$-scores are independently generated from a mixture of three normal densities: one for up-regulation (positive expected value for the $z$-score), one for down-regulation (negative expected value for the $z$-score), and one for the null component, which is assumed to have a standard normal distribution. As in § 6, we estimated the mixture components using the R package mixfdr by Muralidharan (2010).

Figure 1 shows that est-OMT-FDR tends to make the most rejections, followed by adaptive BH, then est-mFDR, and finally BH. Thus, at least in terms of number of discoveries, the novel procedure with mixture components estimated by mixfdr has an advantage over the others.

Appendix E Calculating locFDR under equi-correlation

As we have seen, under any type of dependence, the problem of controlling FDR, pFDR, or mFDR in the two-group model boils down to calculating the locFDRs $T_{i}$ in a manner which takes into account the dependence structure:

[TABLE]

The sum in the denominator contains $2^{K}$ terms, and the one in the numerator a subset of $2^{K-1}$ of them. For large $K$ this is not practical in the general case.

As shown in the main text (§#), when the test statistics are independent, we have a simple calculation that uses only $Z_{i}$ to calculate $T_{i}$ . In the main text we also discuss the case where the dependence is in relatively small blocks, in which case the complexity becomes exponential in block size rather than in $K,$ and linear in the number of blocks.

Here we discuss a case where all entries in $\vec{Z}$ are dependent, but the calculations can still be done efficiently with careful analysis.

Consider the equi-correlated case where

[TABLE]

where $\Sigma$ and its inverse are both equi-correlation matrices:

[TABLE]

and $\delta<0$ is the mean for the alternative hypothesis. It is easy to see that the inverse $\Sigma^{-1}$ also has a similarly simple structure.

We show how to calculate the denominator of Eq. (E.1) for this case in $O(K^{2})$ complexity. Writing $\mathbb{P}(\vec{Z}|\vec{h})$ explicitly, we get:

[TABLE]

Examining the three terms in the exponents we observe that the first one is independent of $\vec{h},$ and the second one depends only on the number of non-zeroes in $\vec{h}$ . Assume this number is $k=\vec{1}^{T}\vec{h}$ , then we get:

[TABLE]

where as defined above:

[TABLE]

Using these facts we can rewrite the denominator of Eq. (E.1) as:

[TABLE]

Thus, the key is being able to calculate the $K+1$ sums in the last term efficiently.

Using our $a,b$ notation we can write (assuming $k=\vec{1}^{T}\vec{h}$ ):

[TABLE]

and we can take advantage of this representation to design a simple recursive algorithm to calculate the needed sums. Denote:

[TABLE]

where $\Sigma_{L}^{-1}$ is the $L\times K$ matrix formed by taking the first $L$ rows of $\Sigma^{-1}$ , and $\vec{Z}_{L}$ denotes the first $L$ elements of $\vec{Z}$ . Thus, $S(K,k)$ is the $k^{th}$ sum in the needed calculation. Using this representation, it is easy to observe:

[TABLE]

Each calculation of $S(L,k)$ given $S(L-1,\cdot)$ and precalculated $S_{Z}$ requires $O(1)$ operations, hence calculating $S(K,k)\;,\;k=0,\ldots K$ requires $O(K^{2})$ calculations.

Taking all of this together, we conclude that the complete calculation of:

[TABLE]

can be done in $O(K^{2}),$ as all other calculations and summations except computing $S(K,k)$ are $O(K).$

For the calculation of the numerator of Eq. (E.1), we can employ similar methodology, noting:

[TABLE]

Hence we can follow the same steps as above to define $S^{(i)}(L,k),$ which evolves as $S(L,k),$ except when $L=i$ :

[TABLE]

since we do not have to handle the case $h_{i}=1.$

Consequently, with a naive calculation of each $T_{i}$ numerator separately, we can calculate all locFDR values $T_{1},\ldots,T_{K}$ in $O(K^{3})$ complexity for this case. It seems plausible that an approach for simultaneously generating $S^{(i)},i=1,\ldots,K$ could reduce the complexity to $O(K^{2}),$ but we do not currently have such an algorithm.

In our experiments we were easily able to use this approach to calculate OMT policies for this setting with $K=1000.$ With the naive exponential calculation, any $K>50$ already becomes untenable.
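The recursion can be sketched in Python under explicit assumptions: $\Sigma$ has unit diagonal and common off-diagonal correlation $\rho$, $\vec{Z}\mid\vec{h}\sim N(\delta\vec{h},\Sigma)$, and each $h_{i}$ is an independent Bernoulli($\pi$) indicator of the alternative. Writing $\Sigma^{-1}=a I+b\vec{1}\vec{1}^{T}$, the term linear in $\vec{h}$ factorizes over coordinates, so the sum over all $\vec{h}$ with exactly $k$ ones is an elementary symmetric polynomial in $x_{i}=\exp(\delta(\Sigma^{-1}\vec{Z})_{i})$, computed by $S(L,k)=S(L-1,k)+x_{L}S(L-1,k-1)$. The names `elem_sym`, `locfdr_equicorr`, and `locfdr_bruteforce` are illustrative. The brute-force version enumerates all $2^{K}$ configurations and is included only as a check for small $K$.

```python
import math
from itertools import product


def inverse_coeffs(K, rho):
    """Sigma^{-1} = a*I + b*11^T for Sigma = (1-rho)*I + rho*11^T."""
    a = 1.0 / (1.0 - rho)
    b = -rho / ((1.0 - rho) * (1.0 + (K - 1) * rho))
    return a, b


def elem_sym(x):
    """e_0..e_n of x via S(L,k) = S(L-1,k) + x_L * S(L-1,k-1)."""
    e = [1.0] + [0.0] * len(x)
    for L, x_L in enumerate(x, start=1):
        for k in range(L, 0, -1):  # descending k updates in place safely
            e[k] += x_L * e[k - 1]
    return e


def locfdr_equicorr(z, rho, delta, pi_alt):
    """All locFDRs T_i = P(h_i = 0 | z) in O(K^3) via leave-one-out sums."""
    K = len(z)
    a, b = inverse_coeffs(K, rho)
    s = sum(z)
    x = [math.exp(delta * (a * z_i + b * s)) for z_i in z]
    # Factor that depends on h only through k = number of ones in h.
    w = [pi_alt ** k * (1.0 - pi_alt) ** (K - k)
         * math.exp(-0.5 * delta ** 2 * (a * k + b * k * k))
         for k in range(K + 1)]
    e_all = elem_sym(x)
    denom = sum(w[k] * e_all[k] for k in range(K + 1))
    result = []
    for i in range(K):
        e_wo = elem_sym(x[:i] + x[i + 1:])  # configurations with h_i = 0
        num = sum(w[k] * e_wo[k] for k in range(K))
        result.append(num / denom)
    return result


def locfdr_bruteforce(z, rho, delta, pi_alt):
    """Same quantity by enumerating all 2^K configurations (small K only)."""
    K = len(z)
    a, b = inverse_coeffs(K, rho)
    total, null_mass = 0.0, [0.0] * K
    for h in product((0, 1), repeat=K):
        r = [z_i - delta * h_i for z_i, h_i in zip(z, h)]
        quad = a * sum(r_i * r_i for r_i in r) + b * sum(r) ** 2
        k = sum(h)
        p = pi_alt ** k * (1.0 - pi_alt) ** (K - k) * math.exp(-0.5 * quad)
        total += p
        for i in range(K):
            if h[i] == 0:
                null_mass[i] += p
    return [m / total for m in null_mass]
```

The sketch works with raw exponentials for readability. For $K$ in the thousands the sums would likely need to be carried in log space to avoid overflow or underflow, which is a numerical detail not addressed here.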

Bibliography


Amar, D., Vizel, A., Levy, C., and Shamir, R. (2018). Adeptus: a discovery tool for disease prediction, enrichment and network analysis based on profiles from many diseases. Bioinformatics, 34(11):1959–1961.
Anderson, E. and Nash, P. (1987). Linear programming in infinite-dimensional spaces: theory and applications. Wiley-Interscience series in discrete mathematics and optimization. Wiley.
Benjamini, Y. (2008). Comment: Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23(1):23–28.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate - a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B - Statistical Methodology, 57(1).
Benjamini, Y., Krieger, A., and Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3):491–507.
Blanchard, G. and Roquain, E. (2009). Adaptive false discovery rate control under independence and dependence. Journal of Machine Learning Research, 10(8):2837–2871.
Cai, T. T. and Sun, W. (2017). Optimal screening and discovery of sparse signals with applications to multistage high throughput studies. Journal of the Royal Statistical Society, Series B, 79(1):197–223.
Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23(1):1–22.
