Estimation of distributional effects of treatment and control under selection on observables: consistency, weak convergence, and applications

Pier Luigi Conti; Livia De Giovanni

arXiv:1904.12159·stat.ME·April 30, 2019


TL;DR

This paper develops methods for estimating the distribution of potential outcomes under treatment and control, using propensity score weighting, and establishes their theoretical properties and practical applications.

Contribution

It introduces a weighted empirical process approach for distributional estimation and proves its weak convergence, enabling new nonparametric tests for treatment effects.

Findings

01 Weak convergence of the weighted empirical process to a Gaussian process

02 Consistent estimation of ATE and QTE

03 Finite sample properties demonstrated via simulations

Abstract

In this paper the estimation of the distribution function for potential outcomes to receiving or not receiving a treatment is studied. The approach is based on weighting observed data on the basis of the estimated propensity score. A weighted version of the empirical process is constructed and its weak convergence to a bivariate Gaussian process is established. Results for the estimation of the Average Treatment Effect (ATE) and Quantile Treatment Effect (QTE) are obtained as by-products. Applications to the construction of nonparametric tests for the treatment effect and for the stochastic dominance of the treatment over control are considered, and their finite sample properties and merits are studied via simulation.

Tables (16)

Table 1: Scenario (i) - $N=1000$ , $n=1000$ , $\theta_{01}=0.50$ - sampling simulation results

Estimator	Parameter value	Average	Median	Standard Deviation
${\hat{θ}}_{01, m}$	0.50	0.50	0.50	0.015
${\hat{Q}}_{1, n} (0.25)$	70	70.11	70.20	0.578
${\hat{Q}}_{1, n} (0.50)$	75	75.37	75.42	0.318
${\hat{Q}}_{1, n} (0.75)$	80	80.11	80.15	0.158
${\hat{Q}}_{0, n} (0.25)$	70	70.17	70.16	0.130
${\hat{Q}}_{0, n} (0.50)$	75	75.28	75.31	0.307
${\hat{Q}}_{0, n} (0.75)$	80	80.15	80.03	0.514
$\sum_{i = 1}^{n} y_{i} w_{i, n}^{(1)} - \sum_{i = 1}^{n} y_{i} w_{i, n}^{(0)}$	0	0.07		0.411
$n^{- 1} \sum_{i = 1}^{n} y_{i} - n^{- 1} \sum_{i = 1}^{n} y_{i}$	0	5.04		0.460

Table 2: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.500$

Parameter	Coverage probability	Average length
$θ_{01, n}$	0.95	0.063
$θ_{01, m}$	0.94	0.062

Table 3: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.50$

Parameter	Coverage probability	Average length
$\sup_{y} | \widehat{F}_{1, n} (y) - F_{1} (y) |$	0.98	0.137
$\sup_{y} | \widehat{F}_{0, n} (y) - F_{0} (y) |$	0.98	0.138

Table 4: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.50$

Test statistic	Rejection probability
$Δ (y)$	0.06

Table 5: Scenario (i) - $N=1000$ , $n=5000$ , $\theta_{01}=0.50$ - sampling simulation results

Estimator	Parameter value	Average	Median	Standard Deviation
${\hat{θ}}_{01, m}$	0.50	0.50	0.50	0.007
${\hat{Q}}_{1, n} (0.25)$	70	69.92	69.96	0.319
${\hat{Q}}_{1, n} (0.50)$	75	74.98	74.98	0.168
${\hat{Q}}_{1, n} (0.75)$	80	79.74	79.75	0.069
${\hat{Q}}_{0, n} (0.25)$	70	69.95	69.97	0.111
${\hat{Q}}_{0, n} (0.50)$	75	74.99	74.98	0.146
${\hat{Q}}_{0, n} (0.75)$	80	79.75	79.77	0.209
$\sum_{i = 1}^{n} y_{i} w_{i, n}^{(1)} - \sum_{i = 1}^{n} y_{i} w_{i, n}^{(0)}$	0	0.00		0.184
$n^{- 1} \sum_{i = 1}^{n} y_{i} - n^{- 1} \sum_{i = 1}^{n} y_{i}$	0	4.97		0.192

Table 6: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.500$

Parameter	Coverage probability	Average length
$θ_{01, n}$	0.96	0.028
$θ_{01, m}$	0.95	0.027

Table 7: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.50$

Parameter	Coverage probability	Average length
$\sup_{y} | \widehat{F}_{1, n} (y) - F_{1} (y) |$	0.96	0.060
$\sup_{y} | \widehat{F}_{0, n} (y) - F_{0} (y) |$	0.96	0.061

Table 8: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.50$

Test statistic	Rejection probability
$Δ (y)$	0.05

Table 9: Scenario (i) - $N=1000$ , $n=1000$ , $\theta_{01}=0.672$ - sampling simulation results

Estimator	Parameter value	Average	Median	Standard Deviation
${\hat{θ}}_{01, m}$	0.67	0.67	0.67	0.012
${\hat{Q}}_{1, n} (0.25)$	75	74.70	74.69	0.289
${\hat{Q}}_{1, n} (0.50)$	80	79.73	79.75	0.350
${\hat{Q}}_{1, n} (0.75)$	85	85.17	84.92	0.669
${\hat{Q}}_{0, n} (0.25)$	70	69.64	70.03	0.698
${\hat{Q}}_{0, n} (0.50)$	75	74.81	74.90	0.355
${\hat{Q}}_{0, n} (0.75)$	80	79.90	79.77	0.352
$\sum_{i = 1}^{n} y_{i} w_{i, n}^{(1)} - \sum_{i = 1}^{n} y_{i} w_{i, n}^{(0)}$	5	4.98		0.415
$n^{- 1} \sum_{i = 1}^{n} y_{i} - n^{- 1} \sum_{i = 1}^{n} y_{i}$	5	-0.03		0.033

Table 10: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.672$

Parameter	Coverage probability	Average length
$θ_{01, n}$	0.97	0.051
$θ_{01, m}$	0.96	0.049

Table 11: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.672$

Parameter	Coverage probability	Average length
$\sup_{y} | \widehat{F}_{1, n} (y) - F_{1} (y) |$	0.97	0.138
$\sup_{y} | \widehat{F}_{0, n} (y) - F_{0} (y) |$	0.96	0.137

Table 12: Scenario (i) - $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ , $\theta_{01}=0.672$

Test statistic	Rejection probability
$Δ (y)$	0.00

Table 13: Scenario (i) - $N=1000$ , $n=5000$ , $\theta_{01}=0.67$ - sampling simulation results

Estimator	Parameter value	Average	Median	Standard Deviation
${\hat{θ}}_{01, m}$	0.67	0.67	0.67	0.005
${\hat{Q}}_{1, n} (0.25)$	75	74.85	74.88	0.145
${\hat{Q}}_{1, n} (0.50)$	80	79.72	79.76	0.137
${\hat{Q}}_{1, n} (0.75)$	85	84.89	84.84	0.243
${\hat{Q}}_{0, n} (0.25)$	70	69.91	70.00	0.257
${\hat{Q}}_{0, n} (0.50)$	75	74.94	74.97	0.160
${\hat{Q}}_{0, n} (0.75)$	80	79.76	79.76	0.090
$\sum_{i = 1}^{n} y_{i} w_{i, n}^{(1)} - \sum_{i = 1}^{n} y_{i} w_{i, n}^{(0)}$	5	4.98		0.174
$n^{- 1} \sum_{i = 1}^{n} y_{i} - n^{- 1} \sum_{i = 1}^{n} y_{i}$	5	-0.04		0.430

Table 14: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.67$

Parameter	Coverage probability	Average length
$θ_{01, n}$	0.96	0.023
$θ_{01, m}$	0.95	0.022

Table 15: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.50$

Parameter	Coverage probability	Average length
$\sup_{y} | \widehat{F}_{1, n} (y) - F_{1} (y) |$	0.97	0.060
$\sup_{y} | \widehat{F}_{0, n} (y) - F_{0} (y) |$	0.97	0.061

Table 16: Scenario (i) - $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ , $\theta_{01}=0.50$

Test statistic	Rejection probability
$Δ (y)$	0.00

Equations

p_{1}(x) = p(x), \quad p_{0}(x) = 1 - p(x).

E\left[ \frac{1}{p_{j}(x)} I_{(T = j)} I_{(Y \leq y)} \right] = F_{j}(y), \quad j = 1, 0.

\widehat{\boldsymbol{\pi}}_{K} = \underset{\boldsymbol{\pi}_{K}}{\mathrm{argmax}} \; \frac{1}{n} \sum_{i=1}^{n} \left\{ T_{i} \log L(\boldsymbol{H}_{K}(x_{i})^{T} \boldsymbol{\pi}_{K}) + (1 - T_{i}) \log\left( 1 - L(\boldsymbol{H}_{K}(x_{i})^{T} \boldsymbol{\pi}_{K}) \right) \right\}.

\sup_{x} \left| \widehat{p}_{n}(x) - p(x) \right| \overset{p}{\rightarrow} 0 \quad \mathrm{as} \; n \rightarrow \infty.

\widehat{p}_{1,n}(x) = \widehat{p}_{n}(x), \quad \widehat{p}_{0,n}(x) = 1 - \widehat{p}_{n}(x).

\widehat{F}_{1,n}(y) = \sum_{i=1}^{n} w_{i,n}^{(1)} I_{(Y_{i} \leq y)}, \quad \widehat{F}_{0,n}(y) = \sum_{i=1}^{n} w_{i,n}^{(0)} I_{(Y_{i} \leq y)}

w_{i,n}^{(j)} = \frac{I_{(T_{i} = j)} / \widehat{p}_{j,n}(x_{i})}{\sum_{k=1}^{n} I_{(T_{k} = j)} / \widehat{p}_{j,n}(x_{k})}, \quad j = 1, 0; \; i = 1, \dots, n.

\widehat{F}_{1,n}^{HT}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{I_{(T_{i} = 1)}}{\widehat{p}_{1,n}(x_{i})} I_{(Y_{i} \leq y)}, \quad \widehat{F}_{0,n}^{HT}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{I_{(T_{i} = 0)}}{\widehat{p}_{0,n}(x_{i})} I_{(Y_{i} \leq y)}.

\sup_{y} \left| \widehat{F}_{1,n}(y) - F_{1}(y) \right| \overset{p}{\rightarrow} 0, \quad \sup_{y} \left| \widehat{F}_{0,n}(y) - F_{0}(y) \right| \overset{p}{\rightarrow} 0 \quad \mathrm{as} \; n \rightarrow \infty.

W_{n}(y) = \left[ \begin{array}{c} W_{1,n}(y) \\ W_{0,n}(y) \end{array} \right] = \left[ \begin{array}{c} \sqrt{n} (\widehat{F}_{1,n}(y) - F_{1}(y)) \\ \sqrt{n} (\widehat{F}_{0,n}(y) - F_{0}(y)) \end{array} \right], \quad y \in \mathbb{R}

\sqrt{n} (\widehat{F}_{j,n}(y) - F_{j}(y)) = \left( \frac{1}{n} \sum_{i=1}^{n} \frac{I_{(T_{i} = j)}}{\widehat{p}_{j,n}(x_{i})} \right)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{I_{(T_{i} = j)}}{\widehat{p}_{j,n}(x_{i})} (I_{(Y_{i} \leq y)} - F_{j}(y)), \quad j = 1, 0

\left[ \begin{array}{c} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{I_{(T_{i}=1)}}{\widehat{p}_{1,n}(x_{i})} (I_{(Y_{i} \leq y)} - F_{1}(y)) \\ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{I_{(T_{i}=0)}}{\widehat{p}_{0,n}(x_{i})} (I_{(Y_{i} \leq y)} - F_{0}(y)) \end{array} \right], \quad y \in \mathbb{R}.

\left[ \begin{array}{c} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{I_{(T_{i}=1)}}{\widehat{p}_{1,n}(x_{i})} (I_{(Y_{i} \leq y)} - F_{1}(y)) \\ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{I_{(T_{i}=0)}}{\widehat{p}_{0,n}(x_{i})} (I_{(Y_{i} \leq y)} - F_{0}(y)) \end{array} \right] = \left[ \begin{array}{c} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_{1,i}(y) \\ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_{0,i}(y) \end{array} \right] + o_{p}(1), \quad y \in \mathbb{R}.

Z_{j,i}(y) = \left( \frac{I_{(T_{i} = j)}}{p_{j}(x_{i})} I_{(Y_{i} \leq y)} - F_{j}(y) \right) - \frac{F_{j}(y | x_{i})}{p_{j}(x_{i})} (I_{(T_{i} = j)} - p_{j}(x_{i})), \quad j = 1, 0; \; i = 1, \dots, n.

C(y, t) = E[W(y) \otimes W(t)] = \left[ \begin{array}{cc} C_{11}(y, t) & C_{01}(y, t) \\ C_{10}(y, t) & C_{00}(y, t) \end{array} \right]

C_{jj}(y, t)

C_{10}(y, t)

C_{01}(y, t)

W_{jn}^{HT}(y) = \sqrt{n} (\widehat{F}_{j,n}^{HT}(y) - F_{j}(y)), \quad j = 1, 0.

\theta = \theta(F_{1}, F_{0}) : \; l^{\infty}(\mathbb{R})^{2} \rightarrow \mathbb{E}

\theta^{\prime}_{(F_{1}, F_{0})} : \; C(\overline{\mathbb{R}}) \times C(\overline{\mathbb{R}}) \rightarrow \mathbb{E}

\left\| \frac{\theta((F_{1}, F_{0}) + t h_{t}) - \theta(F_{1}, F_{0})}{t} - \theta^{\prime}_{(F_{1}, F_{0})}(h) \right\|_{\mathbb{E}} \rightarrow 0 \quad \mathrm{as} \; t \downarrow 0, \; \forall \, h_{t} \rightarrow h.

\sqrt{n} (\theta(\widehat{F}_{1,n}, \widehat{F}_{0,n}) - \theta(F_{1}, F_{0})) \overset{d}{\rightarrow} \theta^{\prime}_{(F_{1}, F_{0})}(W).

\sigma_{\theta}^{2} = E\left[ \theta^{\prime}_{(F_{1}, F_{0})}(W)^{2} \right].

\sqrt{n} (\widehat{\theta}_{n} - \theta) \overset{d}{\rightarrow} N(0, \sigma_{\theta}^{2}) \quad \mathrm{as} \; n \rightarrow \infty

R_{n,m}(u) = \binom{n}{m}^{-1} \sum_{l=1}^{\binom{n}{m}} I_{(\sqrt{m} (\widehat{\theta}_{m,l} - \widehat{\theta}_{n}) \leq u)}.

R_{n,m}(u) \overset{p}{\rightarrow} \Phi\left( \frac{u}{\sigma_{\theta}} \right) \quad \mathrm{as} \; n, m \rightarrow \infty

\widehat{R}_{n,m}(u) = \frac{1}{M} \sum_{l=1}^{M} I_{(\sqrt{m} (\widehat{\theta}_{m,l} - \widehat{\theta}_{n}) \leq u)}.

\widehat{R}_{n,m}^{-1}(p) = \inf \{ u : \; \widehat{R}_{n,m}(u) \geq p \}


Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods in Clinical Trials

Full text

Estimation of distributional effects of treatment and control

under selection on observables:

consistency, weak convergence, and applications

Pier Luigi Conti (Dipartimento di Scienze Statistiche; Sapienza Università di Roma; P.le A. Moro, 5; 00185 Roma; Italy. E-mail [email protected])

Livia De Giovanni (Dipartimento di Scienze Politiche; LUISS Guido Carli; Viale Romania, 32; 00197 Roma; Italy. E-mail [email protected])

Abstract

In this paper the estimation of the distribution function for potential outcomes to receiving or not receiving a treatment is studied. The approach is based on weighting observed data on the basis of the estimated propensity score. A weighted version of the empirical process is constructed and its weak convergence to a bivariate Gaussian process is established. Results for the estimation of the Average Treatment Effect (ATE) and Quantile Treatment Effect (QTE) are obtained as by-products. Applications to the construction of nonparametric tests for the treatment effect and for the stochastic dominance of the treatment over control are considered, and their finite sample properties and merits are studied via simulation.

Keywords. Potential outcomes, Propensity score, causality, Empirical processes, weak convergence, nonparametric tests, stochastic dominance.

1 Introduction

The evaluation of the possible effects of a treatment on an outcome plays a central role in the theoretical as well as the applied statistical and econometric literature; cf. the excellent review papers by [3] and [12]. The main quantity of interest, traditionally, is the average effect of the treatment on the outcome, or better, the difference between the expected values of outcomes for treated and control (untreated) subjects, i.e. $ATE$ (Average Treatment Effect). Another quantity of interest is the effect of treatment on outcome quantiles, which is summarized by $QTE$ (Quantile Treatment Effect). The main source of difficulty is that data are usually observational, so that estimating the treatment effect by simply comparing outcomes for treated vs. control subjects is prone to a relevant source of bias: receiving a treatment is not a “purely random” event, and there could be relevant differences between treated and control subjects. This motivates the need to account for confounding covariates.

In the literature, several different techniques have been proposed to estimate $ATE$ , under various assumptions (see [3], [12] and references therein). As far as $QTE$ is concerned, cf. the paper by [9]. The problem of evaluating possible differences in the distribution function of potential outcomes with binary instrumental variables is studied in [1] via a Kolmogorov-Smirnov type test.

In the present paper we essentially focus on evaluating the possible effects of the treatment on the whole outcome probability distribution. The starting point is to use outcome weights similar to those introduced in [11] and [9]. Using this approach, estimates of the distribution function (d.f.) for treated and control subjects will be obtained. Such estimators essentially play a role similar to the empirical d.f. in nonparametric statistics. It will be shown that the resulting “empirical processes” weakly converge to an appropriate Gaussian process. Although the limiting process is not a Brownian bridge, it possesses several properties similar to those of the Brownian bridge (continuity of trajectories, etc.). These theoretical results are applied to the construction of confidence bands for the outcome distribution under treatment and under control, as well as to the construction of a new statistical test to compare treated and untreated subjects. In a sense, such a test is a version of the classical Wilcoxon-Mann-Whitney test for two-group comparison. Its main merit is to capture the possible difference between treated and untreated subjects even when $ATE$ is equal to zero. Another application of interest will be the construction of a test for stochastic dominance of treatment w.r.t. control, which is of interest, for instance, in programme evaluation exercises ([15]), welfare outcomes, etc.

The paper is organized as follows. In Section 2 the problem is described. In Section 3.2 the main asymptotic large sample results are provided, and in Section 4 approximations based on subsampling are considered. Particularizations to $ATE$ and $QTE$ are given in Section 5. Section 6 is devoted to the construction of confidence bands for the d.f. of outcomes, for both treated and untreated subjects. In Section 7 a Wilcoxon-type statistic to test for a treatment effect on the d.f. of outcomes is introduced, and in Section 8 an elementary test for first-order stochastic dominance of treated vs. untreated is studied. The finite sample performance of the proposed methodologies is studied via Monte Carlo simulation in Section 9.

2 The problem

Let $Y$ be an outcome of interest, observed on a sample of subjects. Some of the sample units are treated with an appropriate treatment (treated group); the other sample units are untreated (control group). If $T$ denotes the treatment indicator variable, then whenever $T=1$ , $Y_{(1)}$ is observed; otherwise, if $T=0$ , $Y_{(0)}$ is observed. Here $Y_{(1)}$ and $Y_{(0)}$ are the potential outcomes due to receiving and not receiving the treatment, respectively. The observed outcome is then equal to $Y=TY_{(1)}+(1-T)Y_{(0)}$ . In the sequel, $F_{1}(y)=P(Y_{(1)}\leq y)$ will denote the distribution function (d.f.) of $Y_{(1)}$ , and $F_{0}(y)=P(Y_{(0)}\leq y)$ the d.f. of $Y_{(0)}$ .

As already said in the introduction, receiving a treatment is not a “purely random” event, as it would be in an experimental framework. On the contrary, there could be relevant differences between treated and untreated subjects, due to the presence of confounding covariates. In the sequel, we will denote by $X$ the (random) vector of relevant covariates, which is assumed to be observed.

In order to get consistent estimates, identification restrictions are necessary. The relevant restriction assumed in the sequel is that selection into treatment is based on observable variables: given a set of observed covariates, assignment either to the treatment group or to the control group is random. Formally speaking, let $p(x)=P(T=1|X=x)$ be the conditional probability of receiving the treatment given the covariates $X=x$ ; it is termed the propensity score. The marginal probability of being treated, $P(T=1)$ , is equal to $E[p(X)]$ .

In the sequel, our main assumption is that the strong ignorability conditions (cf. [18]) are fulfilled. In more detail, consider the joint distribution of ( $Y_{(1)},\,Y_{(0)},\,T,\,X$ ), and denote by $\mathcal{X}$ the support of $X$ . The following conditions are assumed to hold.

(i) Unconfoundedness (cf. [19]): given $X$ , $(Y_{(1)},\,Y_{(0)})$ are jointly independent of $T$ : $(Y_{(1)},\,Y_{(0)})\perp\!\!\!\perp T|X$ .

(ii) The support $\mathcal{X}$ of $X$ is a compact subset of $\mathbb{R}^{l}$ .

(iii) Common support: there exists $\delta>0$ for which $\delta\leq p(x)\leq 1-\delta\;\forall\,x\in\mathcal{X}$ , so that $\underset{x}{\inf}\,p(x)\geq\delta$ , $\underset{x}{\sup}\,p(x)\leq 1-\delta$ .

Assumption $(i)$ is also known as the Conditional Independence Assumption ( $CIA$ ).

For the sake of simplicity, we will use in the sequel the notation

$$p_{1}(x)=p(x),\qquad p_{0}(x)=1-p(x).$$

From the above assumptions, the basic relationships

$$E\left[\frac{1}{p_{j}(x)}\,I_{(T=j)}\,I_{(Y\leq y)}\right]=F_{j}(y),\qquad j=1,\,0,$$

are obtained.
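The relationship $E[I_{(T=j)}I_{(Y\leq y)}/p_{j}(X)]=F_{j}(y)$ follows from iterated expectations under assumptions (i) and (iii). The intermediate steps below are a reconstruction offered as a sketch, not a transcription of the authors' display; they use the notation $F_{j}(y|x)=P(Y_{(j)}\leq y|X=x)$ , and the division by $p_{j}(X)$ is legitimate because of the common support condition.

```latex
\begin{align*}
E\left[\frac{I_{(T=j)}\,I_{(Y\leq y)}}{p_{j}(X)}\right]
 &= E\left[\frac{1}{p_{j}(X)}\,
      E\left[I_{(T=j)}\,I_{(Y_{(j)}\leq y)}\,\middle|\,X\right]\right]
   && \text{(on $\{T=j\}$, $Y=Y_{(j)}$)} \\
 &= E\left[\frac{1}{p_{j}(X)}\,P(T=j\,|\,X)\,P(Y_{(j)}\leq y\,|\,X)\right]
   && \text{(unconfoundedness)} \\
 &= E\left[F_{j}(y\,|\,X)\right] \;=\; F_{j}(y)
   && \text{(since $P(T=j\,|\,X)=p_{j}(X)$)}
\end{align*}
```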

The Average Treatment Effect (ATE) is defined as $\tau=E[Y_{(1)}]-E[Y_{(0)}]$ . The estimation of ATE is a problem of primary importance in the literature, and several different approaches have been proposed ([3] and references therein). Another parameter of interest is the Quantile Treatment Effect (QTE), which is the difference between quantiles of $F_{1}$ and $F_{0}$ : $F_{1}^{-1}(p)-F_{0}^{-1}(p)$ , with $0<p<1$ ; cf. [9]. In particular, when $p=1/2$ it reduces to the Median Treatment Effect.

As already said in the introductory section, in the present paper we concentrate on the estimation of the d.f.s $F_{1}(y)$ , $F_{0}(y)$ under treatment and control, respectively. As special cases, the results in [11] and [9] will be obtained.

3 Estimation of $F_{1},F_{0}$

3.1 Basics

The basic approach to the estimation of $F_{1}$ , $F_{0}$ follows, in principle, the ideas developed in [11] to estimate ATE. First of all, the propensity score $p(x)$ is estimated by a sieve estimator $\widehat{p}_{n}(x)$ , say; cf. [11], [9]. Let $\boldsymbol{H}_{K}(x)=\{H_{k,j}(x)\}$ , $j=1,\,\dots,\,K$ be a $K$ -dimensional vector of polynomials in $x\in\mathcal{X}$ , such that

S1. $\boldsymbol{H}_{K}:\mathcal{X}\rightarrow\mathbb{R}^{K}$ ;

S2. $H_{k,1}(x)=1$ ;

S3. $\boldsymbol{H}_{K}$ includes all polynomials up to order $n$ whenever $K>(n+1)^{r}$ , with $K=K(n)\rightarrow\infty$ as $n\rightarrow\infty$ .

The propensity score is approximated by a linear combination of the $H_{k,j}(x)$ on a logit scale, with coefficients estimated by maximizing a pseudo-likelihood. More formally, if $L(z)=1/(1+e^{-z})$ , then $\widehat{p}_{n}(x)=L(\boldsymbol{H}_{K}(x)^{T}\widehat{\boldsymbol{\pi}}_{K})$ , where the $K$ -dimensional vector $\widehat{\boldsymbol{\pi}}_{K}$ is estimated by the maximum likelihood method:

$$\widehat{\boldsymbol{\pi}}_{K}=\underset{\boldsymbol{\pi}_{K}}{\mathrm{argmax}}\;\frac{1}{n}\sum_{i=1}^{n}\left\{T_{i}\log L\left(\boldsymbol{H}_{K}(x_{i})^{T}\boldsymbol{\pi}_{K}\right)+(1-T_{i})\log\left(1-L\left(\boldsymbol{H}_{K}(x_{i})^{T}\boldsymbol{\pi}_{K}\right)\right)\right\}.$$
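The series logit fit described above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the covariate is one-dimensional, the basis is the plain power basis $1, x, \dots, x^{d}$ , and the likelihood is maximized by Newton-Raphson. The function names (`polynomial_basis`, `fit_sieve_logit`), the simulated data and the chosen degree are hypothetical and do not come from the paper.

```python
import numpy as np


def logistic(z):
    """Logistic function L(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def polynomial_basis(x, degree):
    """Power basis 1, x, ..., x**degree for a one-dimensional covariate.

    The first column is identically 1, matching condition S2.
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)


def fit_sieve_logit(H, t, n_iter=50, tol=1e-10):
    """Maximize the logistic log-likelihood in pi by Newton-Raphson."""
    pi = np.zeros(H.shape[1])
    for _ in range(n_iter):
        p = logistic(H @ pi)
        score = H.T @ (t - p)                           # gradient
        hessian = (H * (p * (1.0 - p))[:, None]).T @ H  # minus the Hessian
        step = np.linalg.solve(hessian, score)
        pi = pi + step
        if np.max(np.abs(step)) < tol:
            break
    return pi


# Illustrative simulated data (an assumption, not the paper's design).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
p_true = logistic(0.5 + x)
t = rng.binomial(1, p_true)

H = polynomial_basis(x, degree=3)
pi_hat = fit_sieve_logit(H, t)
p_hat = logistic(H @ pi_hat)  # estimated propensity score at each x_i
```

In the paper the number of basis terms $K$ grows with the sample size; here the degree is fixed only to keep the sketch short.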

In the sequel, the following result will be widely used.

Theorem 1.

Assume that S1 - S3 are fulfilled, and that $p(x)$ is continuously differentiable of order $s\geq 7l$ , with $l={\mathrm{dim}}(\mathcal{X})$ . If $K=n^{\nu}$ , with $1/(4(s/l-1))<\nu<1/9$ , then

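$$\sup_{x}\left|\widehat{p}_{n}(x)-p(x)\right|\overset{p}{\rightarrow}0\quad{\mathrm{as}}\;n\rightarrow\infty.$$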

Proof. See [11].

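Again, for notational simplicity, and similarly to the notation $p_{1}(x)$ , $p_{0}(x)$ introduced in Section 2, define:

$$\widehat{p}_{1,n}(x)=\widehat{p}_{n}(x),\qquad\widehat{p}_{0,n}(x)=1-\widehat{p}_{n}(x).$$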

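In order to estimate $F_{1}$ and $F_{0}$ , the following “Hájek-type” estimators are considered:

$$\widehat{F}_{1,n}(y)=\sum_{i=1}^{n}w_{i,n}^{(1)}I_{(Y_{i}\leq y)},\qquad\widehat{F}_{0,n}(y)=\sum_{i=1}^{n}w_{i,n}^{(0)}I_{(Y_{i}\leq y)}$$

where

$$w_{i,n}^{(j)}=\frac{I_{(T_{i}=j)}/\widehat{p}_{j,n}(x_{i})}{\sum_{k=1}^{n}I_{(T_{k}=j)}/\widehat{p}_{j,n}(x_{k})},\qquad j=1,\,0;\;\;i=1,\,\dots,\,n.$$

Since the weights $w_{i,n}^{(j)}$ are nonnegative and sum to one over $i$ , it is immediate to see that the Hájek-type estimators are proper d.f.s, i.e. they are bona fide estimators.

As alternative estimators of $F_{1}$ , $F_{0}$ , the following “Horvitz-Thompson-type” estimators could be considered:

$$\widehat{F}^{HT}_{1,n}(y)=\frac{1}{n}\sum_{i=1}^{n}\frac{I_{(T_{i}=1)}}{\widehat{p}_{1,n}(x_{i})}I_{(Y_{i}\leq y)},\qquad\widehat{F}^{HT}_{0,n}(y)=\frac{1}{n}\sum_{i=1}^{n}\frac{I_{(T_{i}=0)}}{\widehat{p}_{0,n}(x_{i})}I_{(Y_{i}\leq y)}.$$

We will mainly concentrate on the Hájek-type estimators for two reasons. First of all, the Horvitz-Thompson-type estimators are not proper d.f.s, because $\widehat{F}^{HT}_{1,n}(+\infty)\neq 1$ , $\widehat{F}^{HT}_{0,n}(+\infty)\neq 1$ with positive probability. In the second place, as will be seen in the sequel, the Horvitz-Thompson-type estimators are asymptotically equivalent to the Hájek-type ones.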
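A small numerical sketch may help to compare the two kinds of estimators. The code below computes the normalized (Hájek-type) weights and the resulting weighted d.f. on a grid, together with the Horvitz-Thompson-type version. The function names and the four-unit toy data set are hypothetical, and the propensity scores are simply taken as given rather than estimated.

```python
import numpy as np


def hajek_weights(t, p_hat, j):
    """Normalized weights w_i^(j): inverse propensity weights summing to one."""
    p_j = p_hat if j == 1 else 1.0 - p_hat
    raw = (t == j) / p_j
    return raw / raw.sum()


def weighted_cdf(y, w, grid):
    """Evaluate sum_i w_i * 1{y_i <= g} at every point g of the grid."""
    return ((y[:, None] <= grid[None, :]) * w[:, None]).sum(axis=0)


def horvitz_thompson_cdf(y, t, p_hat, j, grid):
    """Horvitz-Thompson-type estimator: same numerator, divided by n."""
    p_j = p_hat if j == 1 else 1.0 - p_hat
    raw = (t == j) / p_j
    return weighted_cdf(y, raw, grid) / len(y)


# Toy data (hypothetical): four units, two treated and two controls.
y = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([1, 0, 1, 0])
p_hat = np.array([0.5, 0.5, 0.25, 0.25])
grid = np.array([0.0, 1.0, 3.0, 10.0])

F1_hajek = weighted_cdf(y, hajek_weights(t, p_hat, 1), grid)
F0_hajek = weighted_cdf(y, hajek_weights(t, p_hat, 0), grid)
F1_ht = horvitz_thompson_cdf(y, t, p_hat, 1, grid)
F0_ht = horvitz_thompson_cdf(y, t, p_hat, 0, grid)
```

On this toy data set the Hájek-type estimates reach 1 at the right end of the grid, whereas the Horvitz-Thompson-type estimates end at 1.5 for the treated and at about 0.83 for the controls, which illustrates why the latter are not proper d.f.s.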

3.2 Basic asymptotic results

The goal of the present section is to study the asymptotic, large sample, properties of the Hájek-type estimators $\widehat{F}_{1,n}$ , $\widehat{F}_{0,n}$ . Our first result is a Glivenko-Cantelli type result, showing the uniform consistency (in probability) of $\widehat{F}_{1,n}(y)$ , $\widehat{F}_{0,n}(y)$ .

Proposition 1.

Assume that the conditions of Th. 1 are fulfilled. Then:

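$$\sup_{y}\left|\widehat{F}_{1,n}(y)-F_{1}(y)\right|\overset{p}{\rightarrow}0,\qquad\sup_{y}\left|\widehat{F}_{0,n}(y)-F_{0}(y)\right|\overset{p}{\rightarrow}0\quad{\mathrm{as}}\;n\rightarrow\infty.$$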

Proof. See Appendix.

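The next step consists in studying the limiting, large sample distribution of the above estimators. Define first the stochastic process

$$W_{n}(y)=\left[\begin{array}{c}W_{1,n}(y)\\ W_{0,n}(y)\end{array}\right]=\left[\begin{array}{c}\sqrt{n}(\widehat{F}_{1,n}(y)-F_{1}(y))\\ \sqrt{n}(\widehat{F}_{0,n}(y)-F_{0}(y))\end{array}\right],\;\;y\in\mathbb{R}.$$

The bivariate stochastic process $W_{n}(\cdot)$ essentially plays the same role as the empirical process in classical nonparametric statistics, with a complication due to the presence of $\widehat{F}_{1,n}(y)$ , $\widehat{F}_{0,n}(y)$ instead of the usual empirical distribution function.

The weak convergence of $W_{n}(\cdot)$ can be proved similarly to the classical empirical process, with modifications. In the first place, from

$$\sqrt{n}(\widehat{F}_{j,n}(y)-F_{j}(y))=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{I_{(T_{i}=j)}}{\widehat{p}_{j,n}(x_{i})}\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{I_{(T_{i}=j)}}{\widehat{p}_{j,n}(x_{i})}(I_{(Y_{i}\leq y)}-F_{j}(y)),\;\;j=1,\,0$$

and from Lemma 2 , it is seen that the limiting distribution of $W_{n}(y)$ , if it exists, coincides with the limiting distribution of

$$\left[\begin{array}{c}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{I_{(T_{i}=1)}}{\widehat{p}_{1,n}(x_{i})}(I_{(Y_{i}\leq y)}-F_{1}(y))\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{I_{(T_{i}=0)}}{\widehat{p}_{0,n}(x_{i})}(I_{(Y_{i}\leq y)}-F_{0}(y))\end{array}\right],\;\;y\in\mathbb{R}.$$

In the second place, by repeating verbatim the arguments in Th. 1 in [11], and [10], with $I_{(Y_{i}\leq y)}$ instead of $Y_{i}$ and $F_{j}(y|x)=P(Y_{(j)}\leq y|x)$ instead of $E[Y_{(j)}|x]$ , it is seen that, if $K=n^{\nu}$ , with $1/(4(s/l-1))<\nu<1/9$ , then the relationship

$$\left[\begin{array}{c}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{I_{(T_{i}=1)}}{\widehat{p}_{1,n}(x_{i})}(I_{(Y_{i}\leq y)}-F_{1}(y))\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{I_{(T_{i}=0)}}{\widehat{p}_{0,n}(x_{i})}(I_{(Y_{i}\leq y)}-F_{0}(y))\end{array}\right]=\left[\begin{array}{c}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_{1,i}(y)\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_{0,i}(y)\end{array}\right]+o_{p}(1),\;\;y\in\mathbb{R}$$

holds, where

$$Z_{j,i}(y)=\left(\frac{I_{(T_{i}=j)}}{p_{j}(x_{i})}I_{(Y_{i}\leq y)}-F_{j}(y)\right)-\frac{F_{j}(y|x_{i})}{p_{j}(x_{i})}(I_{(T_{i}=j)}-p_{j}(x_{i})),\;\;j=1,\,0;\;i=1,\,\dots,\,n.$$

The term $o_{p}(1)$ appearing in this relationship depends on $y$ , and, as it appears by using the bounds in [10], convergence in probability to zero (or better, to the vector $[0,\,0]^{T}$ ) holds uniformly over compact sets of values of $y$ . Hence, in order to prove that the sequence of stochastic processes $W_{n}(\cdot)$ converges weakly to a limit process, it is enough to prove that the leading term on the right-hand side of this relationship converges weakly to a limiting process.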

Proposition 2.

Assume that the conditions of Th. 1 are fulfilled, and that $F_{1}(y)$ , $F_{1}(y|x)$ , $F_{0}(y)$ , $F_{0}(y|x)$ are continuous. Then, the sequence of stochastic processes $W_{n}(\cdot)$ converges weakly, as $n$ goes to infinity, to a Gaussian process $W(y)=[W_{1}(y),\,W_{0}(y)]^{T}$ with null mean function ( $E[W_{j}(y)]=0$ , $j=1,\,0$ ) and covariance kernel:

$$C(y,\,t)=E[W(y)\otimes W(t)]=\left[\begin{array}{cc}C_{11}(y,\,t)&C_{01}(y,\,t)\\ C_{10}(y,\,t)&C_{00}(y,\,t)\end{array}\right]$$

where:

[TABLE]

Weak convergence takes place in the set $l_{2}^{\infty}(\mathbb{R})$ of bounded functions $\mathbb{R}\mapsto\mathbb{R}^{2}$ equipped with the sup-norm $\|f\|=\sup_{y}|f_{1}(y)|+\sup_{y}|f_{0}(y)|$ , where $f=(f_{1},\,f_{0})$ .

Proof. See Appendix.
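To give a concrete feeling for the covariance structure of the limit, one can compute the covariances of the summands $Z_{j,i}(y)=\left(\frac{I_{(T_{i}=j)}}{p_{j}(x_{i})}I_{(Y_{i}\leq y)}-F_{j}(y)\right)-\frac{F_{j}(y|x_{i})}{p_{j}(x_{i})}(I_{(T_{i}=j)}-p_{j}(x_{i}))$ that drive the process. The computation below is a sketch under the assumption that the limiting kernel coincides with these covariances; it is a direct calculation from unconfoundedness and is not a transcription of the authors' displayed formulas. It uses the decomposition $Z_{j,i}(y)=A_{j,i}(y)+B_{j,i}(y)$ , where the two terms are uncorrelated because $E[A_{j,i}(y)|X_{i}]=0$ .

```latex
\begin{align*}
A_{j,i}(y) &= \frac{I_{(T_{i}=j)}}{p_{j}(X_{i})}
              \left(I_{(Y_{i}\leq y)}-F_{j}(y\,|\,X_{i})\right),
\qquad
B_{j,i}(y) = F_{j}(y\,|\,X_{i})-F_{j}(y), \\[4pt]
E\left[Z_{j,i}(y)\,Z_{j,i}(t)\right]
 &= E\left[\frac{F_{j}(y\wedge t\,|\,X)-F_{j}(y\,|\,X)\,F_{j}(t\,|\,X)}{p_{j}(X)}\right]
    + E\left[F_{j}(y\,|\,X)\,F_{j}(t\,|\,X)\right]-F_{j}(y)\,F_{j}(t), \\[4pt]
E\left[Z_{1,i}(y)\,Z_{0,i}(t)\right]
 &= E\left[F_{1}(y\,|\,X)\,F_{0}(t\,|\,X)\right]-F_{1}(y)\,F_{0}(t).
\end{align*}
```

The cross term contains no inverse propensity factor because $I_{(T_{i}=1)}I_{(T_{i}=0)}=0$ . As a sanity check, when there are no covariates the first expression reduces to $(F_{j}(y\wedge t)-F_{j}(y)F_{j}(t))/p_{j}$ , the Brownian bridge kernel rescaled by the share of units in group $j$ .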

Due to the continuity of $F_{1}$ , $F_{0}$ , the weak convergence of Proposition 2 also holds in the space $D[-\infty,+\infty]^{2}$ of $\mathbb{R}^{2}$ -valued càdlàg functions equipped with the Skorokhod topology.

Consider now the Horvitz-Thompson estimators $\widehat{F}^{HT}_{1,n}$ , $\widehat{F}^{HT}_{0,n}$ , and define:

$$W^{HT}_{jn}(y)=\sqrt{n}(\widehat{F}^{HT}_{j,n}(y)-F_{j}(y)),\;\;j=1,\,0.$$

From the proof of Proposition 2, it appears that the sequence of stochastic processes $W^{HT}_{n}(\cdot)=[W^{HT}_{1n}(\cdot),\,W^{HT}_{0n}(\cdot)]^{T}$ converges weakly to the same Gaussian limiting process $W(\cdot)=[W_{1}(\cdot),\,W_{0}(\cdot)]^{T}$ that appears in Proposition 2. Hence, the Horvitz-Thompson estimators are asymptotically equivalent to the Hájek estimators $\widehat{F}_{1,n}$ , $\widehat{F}_{0,n}$ .

As is well known, in classical nonparametric statistics the empirical process converges weakly to a Brownian bridge, on the scale of the population distribution function. The limiting process $W(\cdot)$ in Proposition 2 is not a Brownian bridge, of course, although it is a Gaussian process. However, it shares with the Brownian bridge an important property: it possesses trajectories that are a.s. continuous.

Proposition 3.

If $F_{0}$ and $F_{1}$ are continuous, the limiting process $W(\cdot)=[W_{1}(\cdot),\,W_{0}(\cdot)]$ possesses trajectories that are continuous with probability 1.

Proof. See Appendix.

3.3 Differentiable functionals

The result of Proposition 2 can be immediately extended to general Hadamard differentiable functionals of $(F_{1},\,F_{0})$ , again assuming the continuity of $F_{0}$ , $F_{1}$ . Consider a general functional:

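$$\theta=\theta(F_{1},\,F_{0}):\;l^{\infty}(\mathbb{R})^{2}\rightarrow\mathbb{E}$$

where $l^{\infty}(\mathbb{R})^{2}$ is equipped with the $sup$ -norm metric and $\mathbb{E}$ is a normed space equipped with a norm $\|\cdot\|_{\mathbb{E}}$ . As seen in Proposition 3, the limiting process $W(\cdot)=(W_{1}(\cdot),\,W_{0}(\cdot))$ concentrates on $C(\mathbb{\overline{R}})^{2}$ , where $C(\mathbb{\overline{R}})$ is the set of continuous functions on the extended real line $\overline{\mathbb{R}}$ . Note that functions in $C(\mathbb{\overline{R}})$ are bounded.

The functional $\theta$ is Hadamard differentiable at $(F_{1},\,F_{0})$ tangentially to $C(\mathbb{\overline{R}})^{2}$ if there exists a linear map

$$\theta^{\prime}_{(F_{1},\,F_{0})}:\;C(\mathbb{\overline{R}})\times C(\mathbb{\overline{R}})\rightarrow\mathbb{E}$$

such that:

$$\left\|\frac{\theta((F_{1},\,F_{0})+th_{t})-\theta(F_{1},\,F_{0})}{t}-\theta^{\prime}_{(F_{1},\,F_{0})}(h)\right\|_{\mathbb{E}}\rightarrow 0\;\;{\mathrm{as}}\;t\downarrow 0,\;\;\forall\,h_{t}\rightarrow h.$$

Using Theorem 20.8 in [20], we then have:

$$\sqrt{n}\left(\theta(\widehat{F}_{1,n},\,\widehat{F}_{0,n})-\theta(F_{1},\,F_{0})\right)\overset{d}{\rightarrow}\theta^{\prime}_{(F_{1},\,F_{0})}(W).$$

In general, since $\theta^{\prime}_{(F_{1},\,F_{0})}(W)$ is a linear functional of a Gaussian process, it is a Gaussian process, as well. In particular, if $\theta$ is a real-valued functional, then $\theta^{\prime}_{(F_{1},\,F_{0})}(W)$ has a Gaussian distribution with zero expectation and variance

$$\sigma^{2}_{\theta}=E\left[\theta^{\prime}_{(F_{1},\,F_{0})}(W)^{2}\right].$$

For the sake of simplicity, let $\widehat{\theta}_{n}$ be equal to $\theta(\widehat{F}_{1,n},\,\widehat{F}_{0,n})$ . The above result can be rewritten as

$$\sqrt{n}(\widehat{\theta}_{n}-\theta)\overset{d}{\rightarrow}N(0,\,\sigma^{2}_{\theta})\;\;{\mathrm{as}}\;n\rightarrow\infty$$

where the asymptotic variance $\sigma^{2}_{\theta}$ is the variance of $\theta^{\prime}_{(F_{1},\,F_{0})}(W)$ defined above.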
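As a worked illustration, consider the quantile treatment effect $\theta(F_{1},\,F_{0})=F_{1}^{-1}(p)-F_{0}^{-1}(p)$ . The sketch below relies on the standard result on the Hadamard differentiability of the quantile map, and on an additional assumption that is not stated in this section: each $F_{j}$ has a positive derivative $f_{j}$ at the quantile $\xi_{j}=F_{j}^{-1}(p)$ .

```latex
\begin{align*}
\theta^{\prime}_{(F_{1},\,F_{0})}(h_{1},\,h_{0})
 &= -\frac{h_{1}(\xi_{1})}{f_{1}(\xi_{1})}+\frac{h_{0}(\xi_{0})}{f_{0}(\xi_{0})},
 \qquad \xi_{j}=F_{j}^{-1}(p), \\[4pt]
\sigma^{2}_{\theta}
 &= \frac{E\left[W_{1}(\xi_{1})^{2}\right]}{f_{1}(\xi_{1})^{2}}
   +\frac{E\left[W_{0}(\xi_{0})^{2}\right]}{f_{0}(\xi_{0})^{2}}
   -\frac{2\,E\left[W_{1}(\xi_{1})\,W_{0}(\xi_{0})\right]}{f_{1}(\xi_{1})\,f_{0}(\xi_{0})}.
\end{align*}
```

The asymptotic variance involves the unknown densities $f_{1}$ , $f_{0}$ evaluated at the unknown quantiles, which is one reason why a direct plug-in estimate of $\sigma^{2}_{\theta}$ can be awkward in practice.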

4 Subsampling approximation

Consider a functional $\theta=\theta(F_{1},\,F_{0})$ . In order to construct a confidence interval on the basis of $(\ref{eq:asympt_t})$ , a consistent estimate of the asymptotic variance $\sigma_{\theta}^{2}$ $(\ref{eq:asymp_var})$ is necessary. Unfortunately, apart a few cases, this is not simple, because $\sigma_{\theta}^{2}$ could depend on $F_{1}$ , $F_{0}$ in a complicate way, and a direct estimation could not be possible. This is the case, for instance, of quantiles, that will be dealt with in next section. Here we briefly present a simple approach based on subsampling.

Define $A_{i}=(X_{i},T_{i},Y_{i})$ , $i=1,\,\dots,\,n$ , and consider all the $\binom{n}{m}$ subsamples of size $m$ of $(A_{1},\,\dots,\,A_{n})$ . Let further $\widehat{\theta}_{m,l}$ be the statistic $\widehat{\theta}(\cdot)$ computed for the $l$ -th subsample of size $m$ . Next, consider the empirical distribution function of the $\binom{n}{m}$ quantities $\sqrt{m}(\widehat{\theta}_{m,l}-\widehat{\theta}_{n})$ . In symbols:

[TABLE]

If:

U1.

$\sqrt{n}(\widehat{\theta}_{n}-{\theta})\overset{d}{\rightarrow}N(0,\sigma^{2}_{\theta})$ ;

U2.

$m$ depends on $n$ in such a way that $m\rightarrow\infty$ , $\frac{m}{n}\rightarrow 0\;\;{\mathrm{as}}\;n\rightarrow\infty$ ;

then, using Th. 2.1 in [17], we have

[TABLE]

where $\Phi$ is the distribution function of the Gaussian $N(0,1)$ distribution. The convergence in (31) is uniform in $u$ .

Relationship $(\ref{eq:e61})$ tells us that $Pr(\sqrt{n}(\widehat{\theta}_{n}-{\theta})\leq u)$ can be (uniformly) approximated by $R_{n,m}(u)$ , as $n$ and $m$ get large. From the continuity and strict monotonicity of $\Phi$ , it follows that the empirical quantile $R^{-1}_{n,m}(p)=\inf\{u:\;R_{n,m}(u)\geq p\}$ converges in probability to the quantile of order $p$ of the distribution $N(0,\sigma^{2}_{\theta})$ , for every $p\in(0,1)$ .

The number of subsamples of size $m$ , $\binom{n}{m}$ in $(\ref{eq:e60})$ , can be very large, so that $R_{n,m}$ may be difficult to compute. In this case a “stochastic” version of $R_{n,m}$ can be considered, according to the following steps.

1. Select $M$ independent subsamples of size $m$ from $(A_{1},\dots,A_{n})$ .

2. Compute the corresponding values $\widehat{\theta}_{m,1},\,\dots,\,\widehat{\theta}_{m,M}$ of the statistic $\widehat{\theta}$ .

3. Compute the corresponding empirical distribution function:

[TABLE]

It can be easily verified that if $M\rightarrow\infty$ , $n,m\rightarrow\infty$ and $\frac{m}{n}\rightarrow 0$ , then $\widehat{R}_{n,m}(u)$ has the same limiting behaviour as $R_{n,m}(u)$ . These results can be used to obtain confidence intervals for $\theta$ and for testing statistical hypotheses via inversion of confidence intervals. In more detail, let

[TABLE]

be the $p$ th quantile of $\widehat{R}_{n,m}$ . It is easy to show that the interval:

[TABLE]

is a confidence interval for $\theta$ of asymptotic level $1-\alpha$ .
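The stochastic subsampling steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `empirical_quantile` and `subsampling_ci` are hypothetical, and the interval returned is the usual subsampling form $[\widehat{\theta}_{n}-n^{-1/2}\widehat{R}^{-1}_{n,m}(1-\alpha/2),\;\widehat{\theta}_{n}-n^{-1/2}\widehat{R}^{-1}_{n,m}(\alpha/2)]$ , which is assumed here to match the displayed interval.

```python
import numpy as np

def empirical_quantile(values, p):
    """inf{u : R(u) >= p} for the empirical d.f. R of `values`."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(np.ceil(p * len(v))) - 1
    return v[min(max(k, 0), len(v) - 1)]

def subsampling_ci(data, statistic, m, M=1000, alpha=0.05, seed=0):
    """Stochastic-subsampling interval for a scalar parameter (assumed form)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    theta_n = float(statistic(data))
    roots = np.empty(M)
    for l in range(M):
        # Steps 1-2: draw a subsample without replacement, recompute the statistic.
        idx = rng.choice(n, size=m, replace=False)
        roots[l] = np.sqrt(m) * (statistic(data[idx]) - theta_n)
    # Step 3: quantiles of the empirical d.f. of the M centred, scaled values.
    q_lo = empirical_quantile(roots, alpha / 2)
    q_hi = empirical_quantile(roots, 1 - alpha / 2)
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)
```

For instance, `subsampling_ci(sample, np.mean, m=100)` would give an interval for a mean; for the functionals considered here, `statistic` would have to re-estimate the propensity score inside each subsample.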

The confidence interval $(\ref{eq:e62})$ can also be used for testing the hypothesis:

[TABLE]

If $\theta_{0}$ lies in the confidence interval, then $H_{0}$ is accepted; otherwise it is rejected. Clearly, this is a test of asymptotic significance level $\alpha$ .

5 Average and Quantile Treatment Effect

The results obtained so far allow one to re-obtain, as special cases, results previously obtained by [11] and [9]. They are presented below.

5.1 Average Treatment Effect

The Average Treatment Effect (ATE, for short) is defined as:

[TABLE]

In the sequel, we will assume that $E[Y_{(1)}^{2}]$ and $E[Y_{(0)}^{2}]$ are both finite. As an estimator of $\tau$ , consider

[TABLE]

where the weights $w^{(j)}_{i,n}$ , $j=1,\,0$ are given by $(\ref{eq:e6})$ .
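As an illustration, the weighted estimator of $\tau$ can be sketched as follows. The definition of the weights is not reproduced in this section, so the sketch assumes the normalised inverse-propensity form $w^{(1)}_{i,n}\propto I_{(T_{i}=1)}/\widehat{p}_{n}(x_{i})$ and $w^{(0)}_{i,n}\propto I_{(T_{i}=0)}/(1-\widehat{p}_{n}(x_{i}))$ , each set summing to one; the function names are hypothetical.

```python
import numpy as np

def ipw_weights(t, p_hat):
    """Normalised inverse-propensity weights (assumed form of the weights)."""
    t = np.asarray(t, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    w1 = t / p_hat                    # non-zero only for treated units
    w0 = (1.0 - t) / (1.0 - p_hat)    # non-zero only for untreated units
    return w1 / w1.sum(), w0 / w0.sum()

def ate_hat(y, t, p_hat):
    """tau-hat = sum_i y_i w1_i - sum_i y_i w0_i."""
    w1, w0 = ipw_weights(t, p_hat)
    y = np.asarray(y, dtype=float)
    return float(np.sum(w1 * y) - np.sum(w0 * y))
```

With a constant propensity score the two weighted sums reduce to the treated and untreated sample means, so `ate_hat` coincides with the naive mean difference in that special case.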

As it appears from $(\ref{eq:tau})$ , $\tau$ is a linear functional of $(F_{1},\,F_{0})$ and hence Hadamard differentiable. An integration by parts shows that the asymptotic distribution of $\widehat{\tau}$ coincides with that of

[TABLE]

which turns out to be normal with zero mean and variance

[TABLE]

It is not difficult to see that the estimator $\widehat{\tau}$ $(\ref{eq:tauhat})$ is asymptotically equivalent to that introduced in [11].

5.2 Quantiles and Quantile Treatment Effect

Let $Q_{j}(p)=F_{j}^{-1}(p)=\inf\{y:\;F_{j}(y)\geq p\}$ , $0<p<1$ , be the quantile of order $p$ of $F_{j}$ , $j=1,\,0$ . In the sequel, we will assume that $Q_{1}(p)$ , $Q_{0}(p)$ are in the common support of $F_{1}$ , $F_{0}$ . Furthermore, we will denote by $supp(F_{j})$ the support of $F_{j}$ , $j=1,\,0$ .

Suppose that $F_{1}$ , $F_{0}$ are continuous with positive density functions $f_{1}$ , $f_{0}$ , respectively:

[TABLE]

As a consequence of the above assumption, $F_{j}$ is strictly monotonic (in its support).

Consider now $p_{1},p_{2}$ ( $0<p_{1}<p_{2}<1$ ) such that $Q_{1}(p_{1}),Q_{0}(p_{1}),Q_{1}(p_{2}),Q_{0}(p_{2})$ lie in the common support of $F_{1}$ , $F_{0}$ . It is intuitive to estimate the quantile $Q_{j}(p)$ by its “empirical counterpart”

[TABLE]

Let now $\mathbb{D}$ be the set of the restrictions of the distribution functions in $\mathbb{R}$ to $[a,b]$ , and let $D[a,b]$ be the set of càdlàg functions in $[a,b]$ . From [20], it is seen that the map $G\longmapsto G^{-1}$ (from $\mathbb{D}\subseteq D[Q(p_{1}),Q(p_{2})]$ onto $l^{\infty}(0,1))$ is Hadamard differentiable at $(F_{1},\,F_{0})$ tangentially to $C[a,b]$ with derivative:

[TABLE]

Using then Th. 20.8 in [20], (cfr. [7] for an equivalent approach), the process

[TABLE]

converges weakly as $n\rightarrow\infty$ (on $l^{\infty}(p_{1},p_{2})$ equipped with the $sup$ -norm) to a Gaussian process $Z(p)=[Z_{1}(p),Z_{2}(p)]^{\prime}$ defined as:

[TABLE]

The stochastic process $(\ref{eq:limit_quantile})$ is a Gaussian process with zero mean function and covariance kernel:

[TABLE]

Note that $Z(\cdot)\overset{d}{=}-Z(\cdot)$ due to the symmetry of the Gaussian distribution.

In [9] the difference between corresponding quantiles:

[TABLE]

is considered. It is known as the Quantile Treatment Effect (QTE, for short). From $(\ref{eq:e36})$ it is natural to estimate $\varphi(p)$ by

[TABLE]

The estimator $(\ref{eq:qte_estim})$ is asymptotically equivalent to the estimator of QTE defined in [9]. In fact, from $(\ref{eq:limit_quantile})$ it appears that

[TABLE]

tends in distribution, as $n$ goes to infinity, to a Gaussian distribution with zero mean and variance:

[TABLE]

which coincides with the asymptotic variance of the estimator of QTE used in [9].
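A minimal sketch of the quantile and QTE estimators may help fix ideas. It assumes that $\widehat{Q}_{j,n}(p)$ is the generalised inverse of the weighted empirical d.f., that the QTE is $Q_{1}(p)-Q_{0}(p)$ , and that the weights have the normalised inverse-propensity form; the function names are hypothetical.

```python
import numpy as np

def weighted_quantile(y, w, p):
    """inf{y : F_hat(y) >= p}, with F_hat(y) = sum_i w_i 1{y_i <= y}."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    cdf = np.cumsum(w[order])
    k = np.searchsorted(cdf, p, side="left")   # first index with cdf >= p
    return float(y[order][min(k, len(y) - 1)])

def qte_hat(y, t, p_hat, p):
    """Q1_hat(p) - Q0_hat(p) under assumed normalised IPW weights."""
    t = np.asarray(t, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    w1 = t / p_hat
    w0 = (1.0 - t) / (1.0 - p_hat)
    q1 = weighted_quantile(y, w1 / w1.sum(), p)
    q0 = weighted_quantile(y, w0 / w0.sum(), p)
    return q1 - q0
```

Units with zero weight never become the selected quantile, because the weighted d.f. can only cross the level $p$ at an observation carrying positive weight.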

6 Confidence bands for $F_{1}$ and $F_{0}$

The aim of the present section is to construct a confidence band for $F_{1}$ , $F_{0}$ , assuming again that they are continuous d.f.s. As seen in Proposition 3, under this assumption the process $W(\cdot)=[W_{1}(\cdot),\,W_{0}(\cdot)]^{\prime}$ has a.s. continuous trajectories. Furthermore:

[TABLE]

In other words, the trajectories of $W(\cdot)$ are continuous and bounded with probability 1. From now on, we will also assume that the cross-covariance matrix $C(y,\,t)=E\bigl{[}W(y)\otimes W(t)\bigr{]}$ is such that $C(y,\,y)$ is a positive-definite matrix, for every real $y$ . Under these conditions it is possible to show ([14]) that the functional: $\sup_{y}|W_{j}(y)|$ can only have an atom at the point

[TABLE]

and has absolutely continuous distribution on $(0,\,+\infty)$ . On the other hand, $V(W_{j}(y))=0$ only when $y\rightarrow\pm\infty$ , and, from Th. 8.1 in [8] it follows that $\sup_{|y|\leq M}|W_{j}(y)|$ has absolutely continuous distribution in $(0,\,+\infty)$ , for every positive $M$ . Hence

[TABLE]

which proves that the distribution of $\sup_{y}\left|W_{j}(y)\right|$ has no atom at [math]. In other terms, $\sup_{y}\left|W_{j}(y)\right|$ has absolutely continuous distribution on $(0,\,+\infty)$ .

The starting point to construct a confidence band of asymptotic level $1-\alpha$ for $F_{j}(\cdot)$ consists in considering the Kolmogorov statistic:

[TABLE]

From Propositions 2, 3, we obtain

[TABLE]

Let $d_{j,1-\alpha}$ be the ${1-\alpha}$ quantile of the distribution of $\sup|W_{j}(y)|$ . As a consequence of the absolute continuity of $\sup|W_{j}(y)|$ , there is a unique $d_{j,1-\alpha}$ satisfying:

[TABLE]

The quantile $d_{j,1-\alpha}$ depends on unknown quantities. It can be estimated by subsampling. Using the notation introduced in Section 4, define

[TABLE]

The subsampling procedure can be shortly described as follows.

1. Select $M$ independent subsamples of size $m$ from $\{A_{i}=(X_{i},T_{i},Y_{i}),\>i=1,\,\dots,\,n\}$ .

2. Compute the values:

[TABLE]

3. Compute the empirical distribution function:

[TABLE]

4. Compute the quantile:

[TABLE]

Now, it is easy to see that:

[TABLE]

From the absolute continuity of the distribution of $\sup_{y}\left|W_{j}(y)\right|$ , it also follows that:

[TABLE]

tends in probability to $d_{j,1-\alpha}$ . In symbols:

[TABLE]

Finally, from $(\ref{eq:conv_kolm})$ we may conclude that

[TABLE]

so that the region

[TABLE]

is a confidence band for $F_{j}(\cdot)$ of asymptotic level $1-\alpha$ .
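The band construction can be sketched as follows, under stated assumptions: the subsample statistic is taken to be $\sqrt{m}\,\sup_{y}|\widehat{F}_{j,m}(y)-\widehat{F}_{j,n}(y)|$ , the supremum is approximated on a finite grid, the weights have the normalised inverse-propensity form, and `fit_propensity` is a hypothetical user-supplied callable standing in for the propensity score estimator of Th. 1 (which is not reproduced here).

```python
import numpy as np

def weighted_ecdf(y, w, grid):
    """F_hat(g) = sum_i w_i 1{y_i <= g} at each grid point g."""
    return (w[None, :] * (y[None, :] <= grid[:, None])).sum(axis=1)

def f_hat(x, t, y, j, grid, fit_propensity):
    """Weighted estimate of F_j on the grid (assumed normalised IPW weights)."""
    p = fit_propensity(x, t)
    raw = t / p if j == 1 else (1 - t) / (1 - p)
    return weighted_ecdf(y, raw / raw.sum(), grid)

def confidence_band(x, t, y, j, grid, fit_propensity, m, M=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    f_n = f_hat(x, t, y, j, grid, fit_propensity)
    stats = np.empty(M)
    for l in range(M):
        idx = rng.choice(n, size=m, replace=False)
        f_m = f_hat(x[idx], t[idx], y[idx], j, grid, fit_propensity)
        # Kolmogorov-type distance between subsample and full-sample estimates.
        stats[l] = np.sqrt(m) * np.max(np.abs(f_m - f_n))
    # Empirical (1 - alpha) quantile of the subsample statistics.
    d_hat = np.sort(stats)[int(np.ceil((1 - alpha) * M)) - 1]
    half = d_hat / np.sqrt(n)
    return np.clip(f_n - half, 0, 1), np.clip(f_n + half, 0, 1)
```

Re-estimating the propensity score inside each subsample keeps the subsample statistic a faithful copy of the full-sample one, which is what the subsampling argument relies on.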

7 Testing for the presence of a treatment effect: two (sub)sample Wilcoxon test

7.1 Wilcoxon type statistic

In nonparametric statistics, a problem of considerable relevance consists in testing for a possible difference between two samples. Among several proposals, the two-sample Wilcoxon (or Wilcoxon-Mann-Whitney) test plays a central role in applications, mainly because of its properties. The goal of the present section is to propose a Wilcoxon type statistic to test for a possible difference between the (sub)sample of treated subjects and the (sub)sample of untreated subjects, i.e. for the possible presence of a treatment effect.

From now on, we will assume $F_{0}$ and $F_{1}$ are both continuous. As in the classical Wilcoxon two-sample test, in order to measure the difference between the distributions of $Y_{(1)}$ and $Y_{(0)}$ , we consider

[TABLE]

The parameter $\theta_{01}$ $(\ref{eq:e39})$ possesses a natural interpretation, because it is equal to the probability that a treated subject possesses a $y$ -value greater than the $y$ -value for an independent, untreated subject. A few properties of $\theta_{01}$ are listed below.

1) $\theta_{01}$ depends only on the marginal d.f.s $F_{0}$ , $F_{1}$ (not on the way $Y_{(0)}$ , $Y_{(1)}$ are associated in the same subject).

2) If $F_{0}=F_{1}$ then $\theta_{01}=\frac{1}{2}$ .

3) Using $\theta_{01}$ is equivalent to using $\theta_{10}=\int F_{1}(y)\,dF_{0}(y)$ , as is seen by an integration by parts, which gives $\theta_{10}=1-\theta_{01}$ for continuous $F_{0}$ , $F_{1}$ .

4) If $F_{1}(y)\leq F_{0}(y)$ $\forall\,y\in\mathbb{R}$ , i.e. if $Y_{(1)}$ is stochastically larger than $Y_{(0)}$ , then:

[TABLE]

The Wilcoxon type statistic we consider here is obtained in two steps, essentially by a plug-in approach.

Step 1. Estimation of the marginal d.f.s $F_{1}$ , $F_{0}$ :

[TABLE]

Step 2. Estimation of $\theta_{01}$ :

[TABLE]

Note that $w^{(1)}_{i,n}w^{(0)}_{k,n}\neq 0$ if and only if (iff) $(I_{(T_{i}=1)}=1)\land(I_{(T_{k}=0)}=1)$ , i.e. iff $i$ is treated and $k$ is untreated. This essentially shows that $\widehat{\theta}_{01}$ is based on the comparison treated/untreated.
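The plug-in construction can be sketched as a double weighted sum over treated/untreated pairs. The sketch assumes $\widehat{\theta}_{01}=\sum_{i}\sum_{k}w^{(1)}_{i,n}w^{(0)}_{k,n}I_{(y_{k}\leq y_{i})}$ with normalised inverse-propensity weights; the function name is hypothetical.

```python
import numpy as np

def theta01_hat(y, t, p_hat):
    """Weighted Wilcoxon-type estimate of P(Y_(0) <= Y_(1)) (assumed form)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    w1 = t / p_hat
    w1 = w1 / w1.sum()
    w0 = (1.0 - t) / (1.0 - p_hat)
    w0 = w0 / w0.sum()
    treated, control = w1 > 0, w0 > 0
    # indicator[i, k] = 1 when untreated outcome k does not exceed treated outcome i;
    # pairs with a zero weight product are dropped, as they contribute nothing.
    indicator = (y[control][None, :] <= y[treated][:, None]).astype(float)
    return float(w1[treated] @ indicator @ w0[control])
```

Restricting the sum to treated/untreated pairs is only a computational shortcut: all other pairs have $w^{(1)}_{i,n}w^{(0)}_{k,n}=0$ .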

The limiting distribution of the statistic $(\ref{eq:e41})$ is obtained as a consequence of Proposition 2.

Proposition 4.

Assume that the conditions of Proposition 2 are fulfilled. Then

[TABLE]

where

[TABLE]

and

[TABLE]

Proof. See Appendix.

7.2 Variance estimation

The asymptotic variance $V$ appearing in $(\ref{eq:e55})$ contains unknown terms, which can be consistently estimated on the basis of sample data. In particular, the estimation of $\gamma_{01}(x)=E[I_{(T=1)}p(x)^{-1}F_{0}(Y)|x]$ can be carried out by considering the regression of

[TABLE]

on $x_{i}$ , $i=1,\,\dots,\,n$ , and by estimating the regression function with a method ensuring consistency (e.g. local polynomials, Nadaraya-Watson kernel regression, splines). The resulting estimator $\widehat{\gamma}_{01,n}(x)$ is uniformly consistent on compact sets of $x$ s under mild regularity conditions. In the same way, $\gamma_{10}(x)$ can be consistently estimated by $\widehat{\gamma}_{10,n}(x)$ , say. As a consequence the term $V_{x}(\gamma_{10}(x)-\gamma_{01}(x))$ can be estimated by:

[TABLE]

Note that as an alternative estimator, one could consider:

[TABLE]

In the second place, we have to estimate

[TABLE]

The term $E_{x}[p(x)^{-1}\gamma_{01}(x)^{2}]$ can be estimated with

[TABLE]

The term:

[TABLE]

can be estimated by means of a nonparametric regression of:

[TABLE]

with respect to the $x_{i}$ s. The resulting estimator $\widehat{M}_{01,n}(x)$ , say, is consistent under mild conditions. In the same way, an estimator $\widehat{M}_{10,n}(x)$ of

[TABLE]

can be obtained.

The asymptotic variance of $\widehat{\theta}_{10,n}$ can be finally estimated by:

[TABLE]

7.3 Testing the equality of $F_{1}$ and $F_{0}$ via Wilcoxon type statistic

A test for the equality of $F_{1}$ and $F_{0}$ can be constructed via the statistic $\widehat{\theta}_{01,n}$ $(\ref{eq:e41})$ . As already seen, when $F_{1}$ and $F_{0}$ coincide, $\theta_{01}$ is equal to $1/2$ . Hence, the idea is to construct a test for the hypothesis testing problem

[TABLE]

On the basis of Proposition 4, and the variance estimator $(\ref{eq:e59})$ , the region

[TABLE]

(where $z_{\frac{\alpha}{2}}$ is the $(1-\frac{\alpha}{2})$ quantile of the standard Normal distribution) is an acceptance region of asymptotic significance level $\alpha$ .
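The normal-approximation test can be sketched as follows. The acceptance region is assumed to be $|\widehat{\theta}_{01,n}-1/2|\leq z_{\frac{\alpha}{2}}\sqrt{\widehat{V}_{n}/n}$ , where `v_hat` stands for the estimate of the asymptotic variance of Proposition 4; the function name and the returned dictionary are hypothetical conveniences.

```python
from math import sqrt
from statistics import NormalDist

def wilcoxon_type_test(theta_hat, v_hat, n, alpha=0.05):
    """Two-sided test of H0: theta_01 = 1/2 based on asymptotic normality."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) standard Normal quantile
    half_width = z * sqrt(v_hat / n)
    return {
        "half_width": half_width,
        "reject": abs(theta_hat - 0.5) > half_width,
        "interval": (theta_hat - half_width, theta_hat + half_width),
    }
```

The returned interval is the corresponding confidence interval for $\theta_{01}$ ; rejecting $H_{0}$ is the same as that interval not containing $1/2$ .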

Alternatively, one could approximate the quantiles of the distribution of $\widehat{\theta}_{01,n}$ by subsampling, as outlined in Section 4. Using the notation introduced for subsampling, it is seen that the interval

[TABLE]

is a confidence interval for $\theta_{01}$ of asymptotic level $1-\alpha$ . Hence, the test consisting in rejecting $H_{0}$ whenever the interval $(\ref{eq:conf_int_wilcox_subsamp})$ does not contain $1/2$ , possesses asymptotic significance level $\alpha$ .

8 Testing for stochastic dominance

In evaluating the effect of a treatment, it is sometimes of interest to test whether the treatment has an effect on the whole distribution function of $Y$ , i.e. whether the treatment improves the behaviour of the whole d.f. of $Y$ . Various forms of stochastic dominance are discussed in [16], [2]. In particular, in the present section we will focus on testing for first-order stochastic dominance. The d.f. $F_{1}$ first-order stochastically dominates $F_{0}$ if $F_{1}(y)\leq F_{0}(y)$ $\forall\,y\in\mathbb{R}$ . Our main goal is to construct a test for the (uni-directional) hypotheses

[TABLE]

where $\Delta(y)=F_{1}(y)-F_{0}(y)$ .

In econometrics and statistics, there is an extensive literature on testing for stochastic dominance, starting from the papers by [2], [5]. In [15] a Kolmogorov-Smirnov type test is proposed, together with a method to construct critical values based on subsampling. For further bibliographic references, and an in-depth analysis of contributions to testing for stochastic dominance, cfr. the recent paper by [6].

In the present paper, we confine ourselves to a simple, intuitive procedure to test for uni-directional dominance. A simple idea to construct a test for the above hypotheses problem is to invert a confidence region for $\Delta(\cdot)$ . The null hypothesis $H_{0}$ is rejected whenever the confidence region has empty intersection with $H_{0}$ . More formally, the test procedure we consider here is defined as follows.

(i) Compute a confidence region for $\Delta(\cdot)$ of (at least asymptotic) level $1-\alpha$ ;

(ii) Reject $H_{0}$ if the confidence region for $\Delta(\cdot)$ and $H_{0}$ are disjoint, that is if, for at least one real $y$ , the region has lower bound greater than zero.

From now on, we will assume that both $F_{0}$ , $F_{1}$ are continuous d.f.s. Using the arguments in Section 6, it is possible to see that the r.v.

[TABLE]

has absolutely continuous distribution, with $P\left(\sup_{y}\left(W_{1}(y)-W_{0}(y)\right)\geq 0\right)=1$ . Hence, there exists a single $d_{1-\alpha}$ such that

[TABLE]

The quantile $d_{1-\alpha}$ can be estimated by subsampling, as outlined in Section 4. Define

[TABLE]

A subsampling procedure to estimate $d_{1-\alpha}$ is described below.

1. Select $M$ independent subsamples of size $m$ from the sample of $(X_{i},T_{i},Y_{i})$ s, $i=1,\,\dots,\,n$ .

2. Compute the subsample statistics

[TABLE]

3. Compute the corresponding empirical d.f.

[TABLE]

4. Compute the corresponding quantile

[TABLE]

The arguments in Section 6 show that

[TABLE]

Hence, the asymptotically exact approximation

[TABLE]

holds. As a consequence, the region

[TABLE]

is a confidence region for $\Delta(\cdot)$ with asymptotic level $1-\alpha$ . The null hypothesis $H_{0}$ is rejected whenever:

[TABLE]

The performance of the testing procedure developed so far will be evaluated by simulation in Section 9.
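The whole procedure can be sketched as follows, under assumptions that should be checked against the displayed formulas: the subsample statistic is taken to be $\sqrt{m}\,\sup_{y}\bigl(\widehat{\Delta}_{m}(y)-\widehat{\Delta}_{n}(y)\bigr)$ , the confidence region is $\Delta(y)\geq\widehat{\Delta}_{n}(y)-\widehat{d}_{1-\alpha}/\sqrt{n}$ , suprema are approximated on a finite grid, and `fit_propensity` is a hypothetical callable replacing the propensity score estimator of Th. 1.

```python
import numpy as np

def delta_hat(x, t, y, grid, fit_propensity):
    """F1_hat - F0_hat on the grid (assumed normalised IPW weights)."""
    p = fit_propensity(x, t)
    w1 = t / p
    w1 = w1 / w1.sum()
    w0 = (1 - t) / (1 - p)
    w0 = w0 / w0.sum()
    below = (y[None, :] <= grid[:, None]).astype(float)
    return below @ w1 - below @ w0

def dominance_test(x, t, y, grid, fit_propensity, m, M=500, alpha=0.05, seed=0):
    """Returns (reject, lower bound of the one-sided region for Delta)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    d_n = delta_hat(x, t, y, grid, fit_propensity)
    stats = np.empty(M)
    for l in range(M):
        idx = rng.choice(n, size=m, replace=False)
        d_m = delta_hat(x[idx], t[idx], y[idx], grid, fit_propensity)
        stats[l] = np.sqrt(m) * np.max(d_m - d_n)   # one-sided supremum
    d_crit = np.sort(stats)[int(np.ceil((1 - alpha) * M)) - 1]
    lower = d_n - d_crit / np.sqrt(n)
    # Reject H0 when the lower bound is positive for at least one grid point.
    return bool(np.max(lower) > 0), lower
```

The grid should cover the range of the observed outcomes, so that the supremum over the real line is not cut short.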

9 A simulation study

The goal of the present section is to study by simulation the performance of the proposed methods for finite sample sizes. In particular, estimation of $F_{j}$ s and related hypotheses tests are studied under two scenarios: $(i)$ there is no treatment effect, i.e. $F_{1}$ coincides with $F_{0}$ ; $(ii)$ there is treatment effect, i.e. $F_{1}\neq F_{0}$ .

$N=1000$ replications with sample sizes $n=1000$ and $n=5000$ have been generated by Monte Carlo simulation. The propensity score has been estimated via the estimator considered in Th. 1; the term $K$ has been chosen through least squares cross-validation. As far as the subsampling approximation is concerned, $M=1000$ subsamples of size $m=100$ ( $m=500$ ) have been drawn by simple random sampling from each of the $N=1000$ original samples of size $n=1000$ ( $5000)$ .

In scenario $(i)$ (absence of treatment effect) the potential outcome $Y_{(j)}$ is specified as

[TABLE]

where $X$ has a Bernoulli distribution with success probability $1/2$ ( $X\sim Be(1/2))$ and $U_{j}$ has a uniform distribution in the interval $[-10,\,10]$ ( $U_{j}\sim U(-10,\,10)$ ). The r.v.s $U_{1}$ , $U_{0}$ are mutually independent. Clearly, $\theta_{01}=1/2$ , $E[Y_{(0)}]=E[Y_{(1)}]=75$ , and $ATE=0$ .

The exact distribution function of $Y_{(j)}$ is

[TABLE]

The d.f. $F_{j}$ $(\ref{eq:df_noeffect})$ , and the corresponding density function $f_{j}$ , are depicted in Fig. 1.

The propensity score, in this case, is

[TABLE]

Furthermore we have $E[Y|T=0]=72.5$ and $E[Y|T=1]=77.5$ , so that $E[Y|T=1]-E[Y|T=0]=5.0$ even if $ATE=0$ . This is clearly due to the confounding effect of $X$ .
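The confounding effect can be checked with exact arithmetic. The displayed specification is not reproduced here, so the sketch below uses one hypothetical specification that is consistent with the stated moments: $Y_{(j)}=70+10X+U_{j}$ , $p(0)=1/4$ , $p(1)=3/4$ . These numerical values are assumptions made only for illustration.

```python
from fractions import Fraction

# Hypothetical specification, chosen only because it reproduces the stated moments:
#   Y_(j) = 70 + 10 X + U_j,  X ~ Be(1/2),  E[U_j] = 0,  p(0) = 1/4,  p(1) = 3/4.
P_X1 = Fraction(1, 2)
PROPENSITY = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def prob_x1_given_t(t_value):
    """P(X = 1 | T = t_value) by Bayes' rule."""
    def lik(x):
        return PROPENSITY[x] if t_value == 1 else 1 - PROPENSITY[x]
    num = P_X1 * lik(1)
    return num / (num + (1 - P_X1) * lik(0))

def mean_observed(t_value):
    """E[Y | T = t_value] = 70 + 10 P(X = 1 | T = t_value), since E[U_j] = 0."""
    return 70 + 10 * prob_x1_given_t(t_value)

mean_potential = 70 + 10 * P_X1                    # E[Y_(0)] = E[Y_(1)]
naive_gap = mean_observed(1) - mean_observed(0)    # difference of observed means
```

Under these assumptions the potential outcomes have the same mean, while treated units are more likely to have $X=1$ , which by itself shifts their observed mean upwards.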

In Table 1 ( $N=1000$ , $n=1000$ ) and Table 5 ( $N=1000$ , $n=5000$ ) the average, median and standard deviation of the estimators of $\theta_{01}$ and of $\widehat{Q}_{j,n}(0.25)$ , $\widehat{Q}_{j,n}(0.50)$ , and $\widehat{Q}_{j,n}(0.75)$ , $j=0,1$ , are reported. The same quantities are also reported for the estimator $\widehat{\tau}=\sum_{i=1}^{n}y_{i}w^{(1)}_{i,n}-\sum_{i=1}^{n}y_{i}w^{(0)}_{i,n}$ of $ATE$ and for the “naive” mean difference between treated and untreated units, i.e. $n_{1}^{-1}\sum_{i:\,T_{i}=1}y_{i}-n_{0}^{-1}\sum_{i:\,T_{i}=0}y_{i}$ , where $n_{1}$ and $n_{0}$ denote the numbers of treated and untreated units, respectively.

In Tables 2-4 ( $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ ) and Tables 6-8 ( $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ ) the following quantities are reported: the coverage probability and average length of the 95% confidence intervals based on the Wilcoxon-type statistic $\hat{\theta}_{01,m}$ , obtained via sampling and subsampling; the same quantities for the confidence bands for $F_{1}(y),F_{0}(y)$ ; and the percentage of rejections of the null hypothesis for the test of stochastic dominance.

The results indicate that the Wilcoxon type statistic $\hat{\theta}_{01,m}$ and the estimated quantiles $\widehat{Q}_{j,n}(p)$ perform well in terms of both bias and dispersion. The sampling standard error of the Wilcoxon type statistic tends to be close to its theoretical one. The estimated ATE $\widehat{\tau}=\sum_{i=1}^{n}y_{i}w^{(1)}_{i,n}-\sum_{i=1}^{n}y_{i}w^{(0)}_{i,n}$ is, on average, equal to its “true value” (Tables 1 and 5). The coverage probabilities of the confidence intervals are close to the nominal level 95% (Tables 2-3 and 6-7). Finally, the percentage of rejections of the null hypothesis for the test of stochastic dominance is close to 0.05, since in scenario $(i)$ there is no treatment effect, i.e. $F_{1}=F_{0}$ , and the null hypothesis is true (Tables 4 and 8).

In scenario $(ii)$ (presence of treatment effect), the potential outcome $Y_{(0)}$ is specified as in $(\ref{eq:noeffect})$ with $j=0$ . The potential outcome $Y_{(1)}$ is specified as

[TABLE]

where $X$ has a Bernoulli distribution, $X\sim Be(0.5)$ , and $U_{0}$ , $U_{1}$ have a uniform distribution, $U_{j}\sim U(-10,\,10)$ , $j=0,\,1$ . The r.v.s $X$ , $U_{0}$ , $U_{1}$ are mutually independent.

The exact distribution function of $Y_{(1)}$ is reported below

[TABLE]

and depicted in Fig. 2.

In scenario $(ii)$ , we have $\theta_{01}=0.67$ , $E[Y_{(0)}]=75$ , $E[Y_{(1)}]=80$ , and then $ATE=5$ . Furthermore, $F_{1}$ stochastically dominates $F_{0}$ .

The propensity score is

[TABLE]

so that $E[Y|T=0]=77.5$ and $E[Y|T=1]=77.5$ , i.e. $E[Y|T=1]-E[Y|T=0]=0$ even though $ATE\neq 0$ . As in scenario $(i)$ , this is due to the confounding effect of $X$ .

In Table 9 ( $N=1000$ , $n=1000$ ) and Table 13 ( $N=1000$ , $n=5000$ ) the average, median and standard deviation of the estimators of $\theta_{01}$ and of $\widehat{Q}_{j,n}(0.25)$ , $\widehat{Q}_{j,n}(0.50)$ , and $\widehat{Q}_{j,n}(0.75)$ , $j=0,1$ , are reported. The same quantities are also reported for the estimator $\widehat{\tau}=\sum_{i=1}^{n}y_{i}w^{(1)}_{i,n}-\sum_{i=1}^{n}y_{i}w^{(0)}_{i,n}$ of $ATE$ and for the “naive” mean difference between treated and untreated units, i.e. $n_{1}^{-1}\sum_{i:\,T_{i}=1}y_{i}-n_{0}^{-1}\sum_{i:\,T_{i}=0}y_{i}$ , where $n_{1}$ and $n_{0}$ denote the numbers of treated and untreated units, respectively.

In Tables 10-12 ( $N=1000$ , $n=1000$ , $M=1000$ , $m=100$ ) and Tables 14-16 ( $N=1000$ , $n=5000$ , $M=1000$ , $m=500$ ) the following quantities are reported: the coverage probability and average length of the 95% confidence intervals based on the Wilcoxon-type statistic $\hat{\theta}_{01,m}$ , obtained via sampling and subsampling; the same quantities for the confidence bands for $F_{1}(y),F_{0}(y)$ ; and the percentage of rejections of the null hypothesis for the test of stochastic dominance.

The results indicate that the Wilcoxon type statistic $\hat{\theta}_{01,m}$ and the estimated quantiles $\widehat{Q}_{j,n}(p)$ perform well in terms of both bias and dispersion. The sampling standard error of the Wilcoxon type statistic tends to be close to its theoretical one. The estimated ATE $\widehat{\tau}=\sum_{i=1}^{n}y_{i}w^{(1)}_{i,n}-\sum_{i=1}^{n}y_{i}w^{(0)}_{i,n}$ is, on average, equal to its “true value” (Tables 9 and 13). The coverage probabilities of the confidence intervals are close to the nominal level 95% (Tables 10-11 and 14-15). Finally, as far as the test of stochastic dominance is concerned, in scenario $(ii)$ $F_{1}$ stochastically dominates $F_{0}$ , so that the null hypothesis is true and the percentage of rejections is smaller than in scenario $(i)$ (Tables 12 and 16).

Appendix - Technical Lemmas and proofs

Lemma 1.

Under the assumptions of Th. 1:

[TABLE]

Proof of Lemma 1.

Take an arbitrary $0<\epsilon<1$ . Since $p_{1}(x)=p(x)$ , $\widehat{p}_{1,n}(x)=\widehat{p}_{n}(x)$ , we may write

[TABLE]

Since $(\ref{eq:e4})$ holds for every positive $\epsilon$ small enough, the lemma is proved for $j=1$ . The case $j=0$ is similar. ∎

Lemma 2.

Under the assumptions of Th. 1:

[TABLE]

Proof of Lemma 2.

Consider the case $j=1$ . First of all, we have

[TABLE]

Next, by Lemma 1 it is easy to see that

[TABLE]

Furthermore, from the Strong Law of Large Numbers for sequences of i.i.d. r.v.s it is seen that

[TABLE]

as $n\rightarrow\infty$ . From $(\ref{eq:e9})$ and $(\ref{eq:e10})$ the first convergence in $(\ref{eq:e8})$ follows. Convergence in the case $j=0$ is proved in a similar way. ∎

Lemma 3.

Consider the “pseudo-estimator” of $F_{j}(y)$ :

[TABLE]

Under the assumptions of Th. 1:

[TABLE]

Proof of Lemma 3.

Consider first the case $j=1$ . From

[TABLE]

it is seen that

[TABLE]

The proof immediately follows from Lemmas 1, 2. The case $j=0$ is similar. ∎

Lemma 4.

Consider again the “pseudo-estimators” $(\ref{eq:e11})$ . Under the assumptions of Lemma 3:

[TABLE]

Proof of Lemma 4.

The result can be shown by standard arguments. Consider first the case $j=1$ . From the Strong Law of Large Numbers for i.i.d. r.v.s, we have:

[TABLE]

Moreover, on the basis of the properties of $F_{1}(y)$ (monotone non decreasing, continuous to the left, with total variation equal to 1), for every positive $\epsilon$ there exists a partition of $\mathbb{R}$

[TABLE]

such that

[TABLE]

For each $z_{j}<y<z_{j+1}$ it is then:

[TABLE]

for all $j=0,\,1,\,\dots,\,k-1$ , and this implies that

[TABLE]

Moreover, for every $z_{j}\leq y<z_{j+1}$ it is seen that

[TABLE]

and similarly:

[TABLE]

From inequalities $(\ref{eq:e15})$ , $(\ref{eq:e16})$ it follows that

[TABLE]

As $n\rightarrow\infty$ , the Strong Law of Large Numbers implies that

[TABLE]

and since $\epsilon>0$ can be made arbitrarily small, conclusion $(\ref{eq:e19})$ follows. The case $j=0$ is dealt with similarly. ∎

Proof of Proposition 1.

Immediate consequence of Lemmas 3, 4. ∎

Proof of Proposition 2.

Using $(\ref{eq:emp_proc_interm2})$ and the uniform boundedness on compact sets of $y$ s of the $o_{p}(1)$ term, it is enough to prove that the sequence of stochastic processes

[TABLE]

converges weakly to the Gaussian process $W(\cdot)$ . Observing that $E[\widetilde{W}_{1n}(y)]=E[\widetilde{W}_{0n}(y)]=0$ , and using Theorem 2.11.1 in [21] (p. 206), we have to prove point-wise convergence of covariance functions and asymptotic equicontinuity.

1. Convergence of covariance. Consider first the term $C(\widetilde{W}_{1n}(y),\,\widetilde{W}_{1n}(t))$ . Since $Z_{1,i}$ s are i.i.d. r.v.s, and taking into account that $p_{1}(x)=p(x)$ , we may write

[TABLE]

and similarly

[TABLE]

As far as the cross-covariance terms are concerned, it is immediate to see that $C_{01}(y,\,t)=C_{10}(t,\,y)$ . Furthermore:

[TABLE]

and this ends the “covariance part” of the proof.

2. Asymptotic equicontinuity. Consider the i.i.d. r.v.s $(\ref{eq:def-z})$ , and suppose $y<t$ . Then:

[TABLE]

A similar result is obtained when $t<y$ , as well as when $j=0$ , so that the inequalities:

[TABLE]

hold true.

Since $F_{j}(y)$ is continuous (uniformly, being monotonic and bounded), from

[TABLE]

it follows that, for every positive $\eta$ :

[TABLE]

Next, define the (random) pseudometric:

[TABLE]

From the Strong Law of Large Numbers it is seen that

[TABLE]

with $c=(1+\delta^{-1})$ .

Denote now by $N(\epsilon,\mathbb{R},d_{n})$ the smallest number of intervals $[y,\,t]$ that cover the real line and are such that $d_{n}(t,y)<\epsilon$ . By (98) it follows that, with probability 1, for $n$ large enough,

[TABLE]

Hence, with probability 1, the number $N(\epsilon,\mathbb{R},d_{n})$ is bounded by $\frac{K}{\epsilon}$ , $K$ being an appropriate constant. As a consequence, with probability 1, for $n$ large enough, we have:

[TABLE]

In view of Theorem 2.11.1 in [21] (p. 206), this completes the proof. ∎

Proof of Proposition 3.

Let $Q_{j}(u)=F_{j}^{-1}(u)=\inf\{y:\;F_{j}(y)\geq u\}$ , $j=1,\,0$ . Then, $W_{j}(\cdot)$ possesses continuous trajectories almost surely if $B_{j}(u)=W_{j}(Q_{j}(u))$ possesses continuous trajectories almost surely. From the inequality (a consequence of the proof of Proposition 2):

[TABLE]

$c$ being an appropriate constant, it follows that

[TABLE]

The continuity of the trajectories of $B_{j}(\cdot)$ follows from $(\ref{eq:ineq-lead})$ and formula $(6)$ in [13]. ∎

Proof of Proposition 4.

First of all, using an integration by parts we have

[TABLE]

and hence

[TABLE]

where $W_{j,n}(y)=\sqrt{n}(\widehat{F}_{j,n}(y)-F_{j}(y))$ , $j=1$ , [math].

Now, if $F_{0}(y),F_{1}(y)$ are continuous, the limiting process $W=[W_{1},\,W_{0}]^{\prime}$ possesses trajectories that are continuous (and bounded) with probability 1, so that it is concentrated on $C(\overline{\mathbb{R}})^{2}$ , that is separable and complete if equipped with the $sup$ -norm. Using then the Skorokhod Representation Theorem (cfr. [4], p. 70), there exist processes $\widetilde{W}_{n}=[\widetilde{W}_{1,n},\,\widetilde{W}_{0,n}]^{\prime}$ , $n\geq 1$ , and $\widetilde{W}=[\widetilde{W}_{1},\,\widetilde{W}_{0}]^{\prime}$ , defined on a probability space $(\widetilde{\Omega},\,\widetilde{\mathcal{F}},\,\widetilde{P})$ such that

[TABLE]

and

[TABLE]

where the symbol $\overset{d}{=}$ denotes equality in distribution.

From $(\ref{eq:equal_distr_skor})$ and $(\ref{eq:e44})$ , the relationship

[TABLE]

follows.

The terms appearing in the r.h.s. of $(\ref{eq:same_distrib})$ can be handled separately. First of all, we have

[TABLE]

and since

[TABLE]

we easily obtain

[TABLE]

and similarly

[TABLE]

Finally, for every integer $n$ , $n^{-1/2}\tilde{W}_{1,n}(y)$ is a bounded variation function, with total variation $\leq 2$ , a.s.- $\widetilde{P}$ , and since the trajectories of the process $\widetilde{W}_{1}$ are continuous and bounded we may write

[TABLE]

Relationship $(\ref{eq:convmis0})$ implies that the signed measure induced by $n^{-1/2}\widetilde{W}_{1,n}$ converges weakly to a measure identically equal to zero. Hence:

[TABLE]

where the term $(a)$ goes to zero according to the Helly-Bray theorem ( $\widetilde{W}_{0}$ is continuous and bounded a.s. $-\widetilde{P}$ ), and the term $(b)$ goes to zero according to the Skorokhod Representation Theorem.

From $(\ref{eq:conv1_wilcox})$ , $(\ref{eq:conv2_wilcox})$ , and $(\ref{eq:e49})$ it follows that:

[TABLE]

which is equivalent to:

[TABLE]

The r.h.s. of $(\ref{eq:e51})$ is a linear functional of a Gaussian process with continuous and bounded trajectories, so that it possesses Gaussian distribution with zero expectation and variance

[TABLE]

where

[TABLE]

The terms $V_{1}$ - $V_{3}$ in $(\ref{eq:V1})$ - $(\ref{eq:V3})$ can be written more compactly. Using the quantities $\gamma_{10}(x)$ , $\gamma_{01}(x)$ defined in $(\ref{eq:def_gamma})$ , it is not difficult to see that

[TABLE]

In the same way, it is seen that:

[TABLE]

and

[TABLE]

From $(\ref{eq:e53})$ - $(\ref{eq:e54})$ , $(\ref{eq:e55})$ easily follows. ∎

Bibliography

[1] Abadie, A. (2002). Bootstrap tests for distributional treatment effects in instrumental variable models. Journal of the American Statistical Association, 97(457), 284-292.
[2] Anderson, G. (1996). Nonparametric Tests of Stochastic Dominance in Income Distribution. Econometrica, 64, 1183-1193.
[3] Athey, S. and Imbens, G. W. (2017). The state of applied econometrics: Causality and policy evaluation. Journal of Economic Perspectives, 31(2), 3-32.
[4] Billingsley, P. (1999). Convergence of Probability Measures, 2nd Ed. Wiley, New York.
[5] Davidson, R. S. and Duclos, J. Y. (2000). Statistical inference for stochastic dominance and for the measurement of poverty and inequality. Econometrica, 68, 1435-1464.
[6] Donald, S. G. and Hsu, Y. C. (2016). Improving the Power of Tests of Stochastic Dominance. Econometric Reviews, 35, 553-58.
[7] Doss, H. and Gill, R. D. (1992). An elementary approach to weak convergence for quantile processes, with applications to censored survival data. Journal of the American Statistical Association, 87, 869-877.
[8] Dudley, R. M. (1973). Sample Functions of the Gaussian Process. The Annals of Probability, 1, 66-103.
