Estimation of distributional effects of treatment and control under selection on observables: consistency, weak convergence, and applications
Pier Luigi Conti, Livia De Giovanni

TL;DR
This paper develops methods for estimating the distribution of potential outcomes under treatment and control, using propensity score weighting, and establishes their theoretical properties and practical applications.
Contribution
It introduces a weighted empirical process approach for distributional estimation and proves its weak convergence, enabling new nonparametric tests for treatment effects.
Findings
Weak convergence of the weighted empirical process to Gaussian process
Consistent estimation of ATE and QTE distributions
Finite sample properties demonstrated via simulations
Abstract
In this paper the estimation of the distribution function for potential outcomes to receiving or not receiving a treatment is studied. The approach is based on weighting observed data on the basis of the estimated propensity score. A weighted version of the empirical process is constructed and its weak convergence to a bivariate Gaussian process is established. Results for the estimation of the Average Treatment Effect (ATE) and Quantile Treatment Effect (QTE) are obtained as by-products. Applications to the construction of nonparametric tests for the treatment effect and for the stochastic dominance of the treatment over control are considered, and their finite sample properties and merits are studied via simulation.

Table 1

| Estimator | Parameter value | Average | Median | Standard Deviation |
|---|---|---|---|---|
| | 0.50 | 0.50 | 0.50 | 0.015 |
| | 70 | 70.11 | 70.20 | 0.578 |
| | 75 | 75.37 | 75.42 | 0.318 |
| | 80 | 80.11 | 80.15 | 0.158 |
| | 70 | 70.17 | 70.16 | 0.130 |
| | 75 | 75.28 | 75.31 | 0.307 |
| | 80 | 80.15 | 80.03 | 0.514 |
| | 0 | 0.07 | 0.411 | |
| | 0 | 5.04 | 0.460 | |

Table 2

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.95 | 0.063 |
| | 0.94 | 0.062 |

Table 3

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.98 | 0.137 |
| | 0.98 | 0.138 |

Table 4

| Test statistic | Rejection probability |
|---|---|
| | 0.06 |


Table 5

| Estimator | Parameter value | Average | Median | Standard Deviation |
|---|---|---|---|---|
| | 0.50 | 0.50 | 0.50 | 0.007 |
| | 70 | 69.92 | 69.96 | 0.319 |
| | 75 | 74.98 | 74.98 | 0.168 |
| | 80 | 79.74 | 79.75 | 0.069 |
| | 70 | 69.95 | 69.97 | 0.111 |
| | 75 | 74.99 | 74.98 | 0.146 |
| | 80 | 79.75 | 79.77 | 0.209 |
| | 0 | 0.00 | 0.184 | |
| | 0 | 4.97 | 0.192 | |

Table 6

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.96 | 0.028 |
| | 0.95 | 0.027 |

Table 7

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.96 | 0.060 |
| | 0.96 | 0.061 |

Table 8

| Test statistic | Rejection probability |
|---|---|
| | 0.05 |


Table 9

| Estimator | Parameter value | Average | Median | Standard Deviation |
|---|---|---|---|---|
| | 0.67 | 0.67 | 0.67 | 0.012 |
| | 75 | 74.70 | 74.69 | 0.289 |
| | 80 | 79.73 | 79.75 | 0.350 |
| | 85 | 85.17 | 84.92 | 0.669 |
| | 70 | 69.64 | 70.03 | 0.698 |
| | 75 | 74.81 | 74.90 | 0.355 |
| | 80 | 79.90 | 79.77 | 0.352 |
| | 5 | 4.98 | 0.415 | |
| | 5 | -0.03 | 0.033 | |

Table 10

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.97 | 0.051 |
| | 0.96 | 0.049 |

Table 11

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.97 | 0.138 |
| | 0.96 | 0.137 |

Table 12

| Test statistic | Rejection probability |
|---|---|
| | 0.00 |


Table 13

| Estimator | Parameter value | Average | Median | Standard Deviation |
|---|---|---|---|---|
| | 0.67 | 0.67 | 0.67 | 0.005 |
| | 75 | 74.85 | 74.88 | 0.145 |
| | 80 | 79.72 | 79.76 | 0.137 |
| | 85 | 84.89 | 84.84 | 0.243 |
| | 70 | 69.91 | 70.00 | 0.257 |
| | 75 | 74.94 | 74.97 | 0.160 |
| | 80 | 79.76 | 79.76 | 0.090 |
| | 5 | 4.98 | 0.174 | |
| | 5 | -0.04 | 0.430 | |

Table 14

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.96 | 0.023 |
| | 0.95 | 0.022 |

Table 15

| Parameter | Coverage probability | Average length |
|---|---|---|
| | 0.97 | 0.060 |
| | 0.97 | 0.061 |

Table 16

| Test statistic | Rejection probability |
|---|---|
| | 0.00 |

Estimation of distributional effects of treatment and control
under selection on observables:
consistency, weak convergence, and applications
Pier Luigi Conti (Dipartimento di Scienze Statistiche; Sapienza Università di Roma; P.le A. Moro, 5; 00185 Roma; Italy. E-mail [email protected])
Livia De Giovanni (Dipartimento di Scienze Politiche; LUISS Guido Carli; Viale Romania, 32; 00197 Roma; Italy. E-mail [email protected])
Keywords. Potential outcomes, propensity score, causality, empirical processes, weak convergence, nonparametric tests, stochastic dominance.
1 Introduction
The evaluation of the possible effects of a treatment on an outcome plays a central role in the theoretical as well as the applied statistical and econometric literature; cfr. the excellent review papers by [3] and [12]. The main quantity of interest, traditionally, is the average effect of the treatment on the outcome, or better the difference between the expected values of outcomes for treated and control (untreated) subjects, i.e. the Average Treatment Effect (ATE). Another quantity of interest is the effect of the treatment on outcome quantiles, which is summarized by the Quantile Treatment Effect (QTE). The main source of difficulty is that data are usually observational, so that estimating the treatment effect by simply comparing outcomes for treated vs. control subjects is prone to a relevant source of bias: receiving a treatment is not a “purely random” event, and there could be relevant differences between treated and control subjects. This motivates the need to account for confounding covariates.
In the literature, several different techniques have been proposed to estimate ATE under various assumptions (see [3], [12] and references therein). As far as QTE is concerned, cfr. the paper by [9]. The problem of evaluating possible differences in the distribution function of potential outcomes with binary instrumental variables is studied in [1] via a Kolmogorov-Smirnov type test.
In the present paper we essentially focus on evaluating the possible effects of the treatment on the whole outcome probability distribution. The starting point is to use an outcome weighting similar to those introduced in [11] and [9]. Using this approach, estimates of the distribution function (d.f.) for treated and control subjects will be obtained. Such estimators essentially play a role similar to the empirical d.f. in nonparametric statistics. It will be shown that the resulting “empirical processes” weakly converge to an appropriate Gaussian process. Although it is not a Brownian bridge, it possesses several properties similar to the Brownian bridge (continuity of trajectories, etc.). These theoretical results are applied to the construction of confidence bands for the outcome distribution under treatment and under control, as well as to construct a new statistical test to compare treated and untreated subjects. In a sense, such a test is a version of the classical Wilcoxon-Mann-Whitney test for two-group comparison. Its main merit is to capture the possible difference between treated and untreated subjects even when ATE is equal to zero. Another application of interest will be the construction of a test for stochastic dominance of treatment w.r.t. control, which is of interest, for instance, in programme evaluation exercises ([15]), welfare outcome, etc.
The paper is organized as follows. In Section 2 the problem is described. In Section 3.2 the main large-sample results are provided, and in Section 4 approximations based on subsampling are considered. Particularizations to ATE and QTE are given in Section 5. Section 6 is devoted to the construction of confidence bands for the d.f. of outcomes, for both treated and untreated subjects. In Section 7 a Wilcoxon-type statistic to test for a treatment effect on the d.f. of outcomes is introduced, and in Section 8 an elementary test for first-order stochastic dominance of treated vs. untreated is studied. The finite sample performance of the proposed methodologies is studied via Monte Carlo simulation in Section 9.
2 The problem
Let $Y$ be an outcome of interest, observed on a sample of $n$ subjects. Some of the sample units are treated with an appropriate treatment (treated group); the other sample units are untreated (control group). If $T$ denotes the treatment indicator variable, then whenever $T=1$, $Y_{(1)}$ is observed; otherwise, if $T=0$, $Y_{(0)}$ is observed. Here $Y_{(1)}$ and $Y_{(0)}$ are the potential outcomes due to receiving and not receiving the treatment, respectively. The observed outcome is then equal to $Y=T\,Y_{(1)}+(1-T)\,Y_{(0)}$. In the sequel, $F_{1}$ will denote the distribution function (d.f.) of $Y_{(1)}$, and $F_{0}$ the d.f. of $Y_{(0)}$.
As already said in the introduction, receiving a treatment is not a “purely random” event, as it would be in an experimental framework. On the contrary, there could be relevant differences between treated and untreated subjects, due to the presence of confounding covariates. In the sequel, we will denote by $X$ the (random) vector of relevant covariates, which is assumed to be observed.
In order to get consistent estimates, identification restrictions are necessary. The relevant restriction assumed in the sequel is that selection for treatment is based on observable variables: given a set of observed covariates, assignment either to the treatment group or to the control group is random. Formally speaking, let $p(x)=P(T=1\mid X=x)$ be the conditional probability of receiving the treatment given covariates $X=x$; it is termed propensity score. The marginal probability of being treated, $P(T=1)$, is equal to $E[p(X)]$.
In the sequel, our main assumption is that the strong ignorability conditions (cfr. [18]) are fulfilled. In more detail, consider next the joint distribution of ($Y_{(1)}$, $Y_{(0)}$, $T$, $X$), and denote by $\mathcal{X}$ the support of $X$. The following conditions are assumed to hold.
- (i)
Unconfoundedness (cfr. [19]): given $X$, $(Y_{(1)},\,Y_{(0)})$ are jointly independent of $T$: $(Y_{(1)},\,Y_{(0)}) \perp\!\!\!\perp T \mid X$.
- (ii)
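The support $\mathcal{X}$ of $X$ is a compact subset of a Euclidean space.
- (iii)
Common support: there exists $\delta>0$ for which $\delta\le p(x)\le 1-\delta$ for every $x\in\mathcal{X}$, so that the propensity score is bounded away from 0 and 1.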
Assumption (i) is also known as the Conditional Independence Assumption (CIA).
For the sake of simplicity, we will use in the sequel the notation
[TABLE]
From the above assumptions, the basic relationships
[TABLE]
are obtained.
The Average Treatment Effect (ATE) is defined as $E[Y_{(1)}]-E[Y_{(0)}]$. The estimation of ATE is a problem of primary importance in the literature, and several different approaches have been proposed ([3] and references therein). Another parameter of interest is the Quantile Treatment Effect (QTE), which is the difference between quantiles of the same order of $F_{1}$ and $F_{0}$; cfr. [9]. In particular, for quantiles of order $1/2$ it reduces to the Median Treatment Effect.
As already said in the introductory section, in the present paper we concentrate on the estimation of the d.f.s $F_{1}$, $F_{0}$ under treatment and control, respectively. As special cases, the results in [11] and [9] will be obtained.
3 Estimation of $F_{1}$, $F_{0}$
3.1 Basics
The basic approach to the estimation of $F_{1}$, $F_{0}$ follows, in principle, the ideas developed in [11] to estimate ATE. First of all, the propensity score is estimated by a sieve estimator $\widehat{p}_{n}(x)$, say; cfr. [11], [9]. Let $\boldsymbol{H}_{K}(x)=\{H_{k,j}(x)\}$ be a $K$-dimensional vector of polynomials in $x$, such that
- S1.
$\boldsymbol{H}_{K}:\mathcal{X}\rightarrow\mathbb{R}^{K}$;
- S2.
;
- S3.
$\boldsymbol{H}_{K}$ includes all polynomials up to order whenever , with as .
The propensity score is approximated by a linear combination of the components of $\boldsymbol{H}_{K}(x)$ on a logit scale, with coefficients estimated by maximizing a pseudo-likelihood. More formally, if $L(z)=e^{z}/(1+e^{z})$, then $\widehat{p}_{n}(x)=L(\boldsymbol{H}_{K}(x)^{T}\widehat{\boldsymbol{\pi}}_{K})$, where the $K$-dimensional vector $\widehat{\boldsymbol{\pi}}_{K}$ is estimated by the maximum likelihood method:
[TABLE]
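The series logit estimator described above can be sketched in a few lines. The following is only an illustration, not the authors' implementation: it assumes a scalar covariate in $[0,1]$, a fixed number of terms `K` (in the paper $K$ grows with $n$), a plain power-series basis, and a simple gradient-ascent loop standing in for a dedicated maximum likelihood routine. All function names and the simulated data are hypothetical.

```python
import math
import random

def logistic(z):
    """Logistic function L(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def poly_basis(x, K):
    """Power-series basis (1, x, ..., x^(K-1)); an assumed choice of H_K."""
    return [x ** k for k in range(K)]

def linear_index(beta, h):
    return sum(b * hk for b, hk in zip(beta, h))

def avg_loglik(beta, H, t):
    """Average Bernoulli log-likelihood of the logit model."""
    total = 0.0
    for h, ti in zip(H, t):
        z = linear_index(beta, h)
        total += ti * z - math.log1p(math.exp(z))
    return total / len(t)

def fit_series_logit(x, t, K=3, steps=300, lr=1.0):
    """Maximize the logit likelihood by gradient ascent (illustrative optimizer)."""
    H = [poly_basis(xi, K) for xi in x]
    beta = [0.0] * K
    n = len(t)
    for _ in range(steps):
        grad = [0.0] * K
        for h, ti in zip(H, t):
            resid = ti - logistic(linear_index(beta, h))
            for k in range(K):
                grad[k] += resid * h[k] / n
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta, H

# Hypothetical data: the probability of treatment increases with x.
random.seed(0)
x = [random.random() for _ in range(400)]
t = [1 if random.random() < logistic(-1.0 + 2.0 * xi) else 0 for xi in x]

beta_hat, H = fit_series_logit(x, t, K=3)
p_hat = [logistic(linear_index(beta_hat, h)) for h in H]  # estimated propensity scores
```

With basis functions bounded by 1 on $[0,1]$ and `K=3`, a unit step size keeps the ascent monotone; for a larger basis or unbounded covariates a smaller step or a Newton-type method would be the safer choice.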
In the sequel, the following result will be widely used.
Theorem 1.
Assume that S1 - S3 are fulfilled, and that is continuously differentiable of order , with . If , with , then
[TABLE]
Proof. See [11].
Again, for notational simplicity, and similarly to , define:
[TABLE]
In order to estimate $F_{1}$ and $F_{0}$, the following “Hájek-type” estimators are considered:
[TABLE]
where
[TABLE]
It is immediate to see that the Hájek-type estimators are proper d.f.s, i.e. they are bona fide estimators.
As alternative estimators of $F_{1}$, $F_{0}$, the following “Horvitz-Thompson-type” estimators could be considered:
[TABLE]
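We will mainly concentrate on the Hájek-type estimators for two reasons. First of all, the Horvitz-Thompson-type estimators are not proper d.f.s, because their limit at $+\infty$ differs from 1 with positive probability. In the second place, as will be seen in the sequel, the two types of estimators are asymptotically equivalent.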
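The difference between the two families of estimators can be illustrated with a small sketch. It assumes the usual inverse-probability weights $T_i/\widehat{p}_n(x_i)$ and $(1-T_i)/(1-\widehat{p}_n(x_i))$, normalized to sum to one in the Hájek case and divided by $n$ in the Horvitz-Thompson case; the function names and the four-unit data set are hypothetical.

```python
def ipw_raw_weights(t, p_hat):
    """Inverse-probability weights T/p (treated) and (1-T)/(1-p) (control)."""
    w1 = [ti / pi for ti, pi in zip(t, p_hat)]
    w0 = [(1 - ti) / (1.0 - pi) for ti, pi in zip(t, p_hat)]
    return w1, w0

def hajek_cdf(y_obs, raw_w, y):
    """Hajek-type d.f.: weights are normalized by their own sum."""
    total = sum(raw_w)
    return sum(w for w, yi in zip(raw_w, y_obs) if yi <= y) / total

def ht_cdf(y_obs, raw_w, y):
    """Horvitz-Thompson-type d.f.: weights are divided by the sample size."""
    return sum(w for w, yi in zip(raw_w, y_obs) if yi <= y) / len(y_obs)

# Hypothetical toy sample: outcomes, treatment indicators, propensity scores.
y_obs = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 0]
p_hat = [0.5, 0.5, 0.25, 0.5]

w1, w0 = ipw_raw_weights(t, p_hat)
big = float("inf")
print(hajek_cdf(y_obs, w1, big), ht_cdf(y_obs, w1, big))  # prints: 1.0 1.5
```

In this toy sample the Horvitz-Thompson-type estimate for the treated arm reaches 1.5 at $+\infty$, whereas the Hájek-type estimate is a proper d.f. by construction.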
3.2 Basic asymptotic results
The goal of the present section is to study the asymptotic, large-sample properties of the Hájek-type estimators. Our first result is a Glivenko-Cantelli type result, showing their uniform consistency (in probability).
Proposition 1.
Assume that the conditions of Th. 1 are fulfilled. Then:
[TABLE]
Proof. See Appendix.
The next step consists in studying the limiting, large-sample distribution of the above estimators. Define first the stochastic process
[TABLE]
The bivariate stochastic process essentially plays the same role as the empirical process in classical non-parametric statistics, with a complication due to the presence of , instead of the usual empirical distribution function.
The weak convergence of can be proved similarly to the classical empirical process, with modifications. In the first place, from
[TABLE]
and from Lemma 2 , it is seen that the limiting distribution of , if it exists, coincides with the limiting distribution of
[TABLE]
In the second place, by repeating verbatim the arguments in Th. 1 in [11], and [10], with instead of and instead of , it is seen that, if , with , then the relationship
[TABLE]
holds, where
[TABLE]
The term appearing in depends on , and, as it appears by using the bounds in [10], convergence in probability to zero (or better, to the vector ) holds uniformly over compact sets of s. Hence, in order to prove that the sequence of stochastic processes converges weakly to a limit process, it is enough to prove that converges weakly to a limiting process.
Proposition 2.
Assume that the conditions of Th. 1 are fulfilled, and that , , , are continuous. Then, the sequence of stochastic processes converges weakly, as goes to infinity, to a Gaussian process with null mean function (, ) and covariance kernel:
[TABLE]
where:
[TABLE]
Weak convergence takes place in the set of bounded functions equipped with the sup-norm (if ) .
Proof. See Appendix.
Due to the continuity of , , the weak convergence of Proposition 2 also holds in the space of -valued càdlàg functions equipped with the Skorokhod topology.
Consider now the Horvitz-Thompson estimators , and define:
[TABLE]
From the proof of Proposition 2, it appears that the sequence of stochastic processes converges weakly to the same Gaussian limiting process that appears in Proposition 2. Hence, the Horvitz-Thompson estimators are asymptotically equivalent to the Hájek estimators .
As is well known, in classical nonparametric statistics the empirical process converges weakly to a Brownian bridge, on the scale of the population distribution function. The limiting process in Proposition 2 is not a Brownian bridge, of course, although it is a Gaussian process. However, it shares with the Brownian bridge an important property: it possesses trajectories that are a.s. continuous.
Proposition 3.
If and are continuous, the limiting process possesses trajectories that are continuous with probability 1.
Proof. See Appendix.
3.3 Differentiable functionals
The result of Proposition 2 can be immediately extended to general Hadamard differentiable functionals of , again assuming the continuity of , . Consider a general functional:
[TABLE]
where is equipped with the -norm metric and is a normed space equipped with a norm . As seen in Proposition 3, the limiting process concentrates on , where is the set of continuous functions on the extended real line . Note that functions in are bounded.
The functional is Hadamard differentiable at tangentially to if there exists a linear map
[TABLE]
such that:
[TABLE]
Using Theorem 20.8 in [20], we then have:
[TABLE]
In general, since is a linear functional of a Gaussian process, it is a Gaussian process, as well. In particular, if is a real-valued functional, then has a Gaussian distribution with zero expectation and variance
[TABLE]
For the sake of simplicity, let be equal to . The above result can be rewritten as
[TABLE]
where the asymptotic variance is given by .
4 Subsampling approximation
Consider a functional . In order to construct a confidence interval on the basis of , a consistent estimate of the asymptotic variance is necessary. Unfortunately, apart from a few cases, this is not simple, because the asymptotic variance could depend on the underlying d.f.s in a complicated way, and a direct estimation may not be possible. This is the case, for instance, for quantiles, which will be dealt with in the next section. Here we briefly present a simple approach based on subsampling.
Define , , and consider all the subsamples of size of . Let further be the statistic computed for the -th subsample of size . Next, consider then the empirical distribution function of the quantities . In symbols:
[TABLE]
If:
- U1.
;
- U2.
depends on in such a way that , ;
then, using Th. 2.1 in [17], we have
[TABLE]
where is the distribution function of the Gaussian distribution. The convergence in (31) is uniform in .
Relationship tells us that can be (uniformly) approximated by , as and get large. From the continuity and strict monotonicity of , it follows that the empirical quantile converges in probability to the quantile of order of the distribution .
The number of subsamples of a given size can be very large, so that the subsampling distribution can be difficult to compute. In this case a “stochastic” version of it can be considered, according to the following steps.
- 1.
Select independent subsamples of the chosen size from the original sample.
- 2.
Compute the corresponding values of the statistic.
- 3.
Compute the corresponding empirical distribution function:
[TABLE]
It can be easily verified that if , and , then has the same limiting behaviour as . These results can be used to obtain confidence intervals for and for testing statistical hypotheses via inversion of confidence intervals. In more detail, let
[TABLE]
be the th quantile of . It is easy to show that the interval:
[TABLE]
is a confidence interval for of asymptotic level .
The confidence interval can be also used for testing the hypothesis:
[TABLE]
If is in the confidence interval, then is accepted, otherwise it is rejected. Clearly, this is a test of asymptotic significance level .
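The stochastic subsampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the interval is written in one standard subsampling form (estimate minus rescaled quantiles of the centred and $\sqrt{m}$-scaled subsample statistics), the propensity scores are treated as already available for each unit rather than re-estimated in every subsample, and the names `subsampling_ci`, `emp_quantile`, `hajek_mean_treated` and the simulated data are hypothetical.

```python
import math
import random

def emp_quantile(values, p):
    """Generalized inverse inf{u : R(u) >= p} of the empirical d.f. of `values`."""
    s = sorted(values)
    k = math.ceil(p * len(s) - 1e-9)        # smallest k with k / M >= p
    return s[min(max(k, 1), len(s)) - 1]

def subsampling_ci(data, statistic, m, M, alpha=0.05, rng=random):
    """Subsampling confidence interval for a real-valued functional."""
    n = len(data)
    theta_n = statistic(data)
    roots = []
    for _ in range(M):
        sub = rng.sample(data, m)           # subsample drawn without replacement
        roots.append(math.sqrt(m) * (statistic(sub) - theta_n))
    q_hi = emp_quantile(roots, 1.0 - alpha / 2.0)
    q_lo = emp_quantile(roots, alpha / 2.0)
    return theta_n - q_hi / math.sqrt(n), theta_n, theta_n - q_lo / math.sqrt(n)

def hajek_mean_treated(data):
    """Hajek-weighted mean outcome under treatment; records are (y, t, p_hat)."""
    num = sum(t / p * y for y, t, p in data)
    den = sum(t / p for y, t, p in data)    # assumes at least one treated unit
    return num / den

# Hypothetical data with a confounder x driving both treatment and outcome.
random.seed(1)
data = []
for _ in range(400):
    x = random.random()
    p = 0.3 + 0.4 * x                       # propensity score, assumed known here
    t = 1 if random.random() < p else 0
    y = 70.0 + 10.0 * x + random.gauss(0.0, 2.0)
    data.append((y, t, p))

lower, theta_hat, upper = subsampling_ci(data, hajek_mean_treated, m=80, M=200)
```

The subsample size `m` and the number of subsamples `M` are tuning choices; the asymptotic argument requires `m` to grow more slowly than `n`.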
5 Average and Quantile Treatment Effect
The results obtained so far allow one to re-obtain, as special cases, results previously obtained by [11] and [9]. They are presented below.
5.1 Average Treatment Effect
The Average Treatment Effect (ATE, for short) is defined as:
[TABLE]
In the sequel, we will assume that and are both finite. As an estimator of , consider
[TABLE]
where the weights , are given by .
As it appears from , is a linear functional of and hence Hadamard differentiable. An integration by parts shows that the asymptotic distribution of coincides with that of
[TABLE]
which turns out to be normal with zero mean and variance
[TABLE]
It is not difficult to see that the estimator is asymptotically equivalent to that introduced in [11].
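As an illustration of the estimator above, the following sketch computes the difference between the Hájek-weighted means of the treated and control outcomes, and compares it with the unweighted difference of group means. It assumes inverse-probability weights normalized within each arm; the function names and the toy data are hypothetical.

```python
def hajek_ate(y_obs, t, p_hat):
    """Difference between the Hajek-weighted means of treated and control outcomes."""
    w1 = [ti / pi for ti, pi in zip(t, p_hat)]
    w0 = [(1 - ti) / (1.0 - pi) for ti, pi in zip(t, p_hat)]
    mean1 = sum(w * y for w, y in zip(w1, y_obs)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, y_obs)) / sum(w0)
    return mean1 - mean0

def naive_difference(y_obs, t):
    """Unweighted difference of means, ignoring the covariates."""
    treated = [y for y, ti in zip(y_obs, t) if ti == 1]
    control = [y for y, ti in zip(y_obs, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical toy sample: outcomes, treatment indicators, propensity scores.
y_obs = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 0]
p_hat = [0.5, 0.5, 0.25, 0.5]

ate_hat = hajek_ate(y_obs, t, p_hat)     # 14/6 - 3
naive = naive_difference(y_obs, t)       # 2 - 3
```

The two values differ because the treated unit with the lower propensity score receives a larger weight in the Hájek-weighted mean.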
5.2 Quantiles and Quantile Treatment Effect
Let , be the quantile of order of , . In the sequel, we will assume that , are in the common support of , . Furthermore, we will denote by the support of , .
Suppose that , are continuous with positive density functions , , respectively:
[TABLE]
As a consequence of the above assumption, is strictly monotonic (in its support).
Consider now () such that lie in the common support of , . It is intuitive to estimate the quantile by its “empirical counterpart”
[TABLE]
Let now be the set of the restrictions of the distribution functions in to , and let be the set of càdlàg functions in . From [20], it is seen that the map (from onto is Hadamard differentiable at tangentially to with derivative:
[TABLE]
Using then Th. 20.8 in [20], (cfr. [7] for an equivalent approach), the process
[TABLE]
converges weakly as (on equipped with the -norm) to a Gaussian process defined as:
[TABLE]
The stochastic process is a Gaussian process with zero mean function and covariance kernel:
[TABLE]
Note that due to the symmetry of the Gaussian distribution.
In [9] the difference between corresponding quantiles:
[TABLE]
is considered. It is known as Quantile Treatment Effect (QTE, for short). From it is intuitive to estimate by
[TABLE]
The estimator is asymptotically equivalent to the estimator of QTE defined in [9]. In fact, from it appears that
[TABLE]
tends in distribution, as goes to infinity, to a Gaussian distribution with zero mean and variance:
[TABLE]
which coincides with the asymptotic variance of the estimator of QTE used in [9].
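The quantile and QTE estimators of this section amount to inverting the weighted empirical d.f.s. A minimal sketch follows, assuming Hájek-type weights and the generalized inverse $\inf\{y: \widehat{F}(y)\ge p\}$; the function names and toy data are hypothetical.

```python
def weighted_quantile(y_obs, raw_w, p):
    """inf{y : F_hat(y) >= p} for the Hajek-type d.f. built from raw weights."""
    total = sum(raw_w)
    cum = 0.0
    last = None
    for y, w in sorted(zip(y_obs, raw_w)):
        if w == 0.0:
            continue                      # units of the other arm carry no weight
        cum += w / total
        last = y
        if cum >= p - 1e-12:              # small tolerance for rounding error
            return y
    return last

def qte(y_obs, t, p_hat, p):
    """Difference between the p-th weighted quantiles of treated and control."""
    w1 = [ti / pi for ti, pi in zip(t, p_hat)]
    w0 = [(1 - ti) / (1.0 - pi) for ti, pi in zip(t, p_hat)]
    return weighted_quantile(y_obs, w1, p) - weighted_quantile(y_obs, w0, p)

# Hypothetical toy sample: outcomes, treatment indicators, propensity scores.
y_obs = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 0]
p_hat = [0.5, 0.5, 0.25, 0.5]

median_effect = qte(y_obs, t, p_hat, 0.5)   # 3.0 - 2.0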
6 Confidence bands for $F_{1}$ and $F_{0}$
The aim of the present section is to construct a confidence band for $F_{1}$, $F_{0}$, assuming again that they are continuous d.f.s. As seen in Proposition 3, under this assumption the limiting process has a.s. continuous trajectories. Furthermore:
[TABLE]
In other words, the trajectories of $W$ are continuous and bounded with probability 1. From now on, we will also assume that the cross-covariance matrix $C(y,\,t)=E\bigl[W(y)\otimes W(t)\bigr]$ is such that $C(y,\,y)$ is a positive-definite matrix, for every real $y$. Under these conditions it is possible to show ([14]) that the functional: can only have an atom at the point
[TABLE]
and has absolutely continuous distribution on . On the other hand, only when , and, from Th. 8.1 in [8] it follows that has absolutely continuous distribution in , for every positive . Hence
[TABLE]
which proves that the distribution of has no atom at that point. In other terms, has absolutely continuous distribution on .
The starting point to construct a confidence band of asymptotic level for consists in considering the Kolmogorov statistic:
[TABLE]
From Propositions 2, 3, we obtain
[TABLE]
Let be the quantile of the distribution of . As a consequence of the absolute continuity of , there is a unique satisfying:
[TABLE]
The quantile depends on unknown quantities. It can be estimated by subsampling. Using the notation introduced in Section 4, define
[TABLE]
The subsampling procedure can be briefly described as follows.
- 1.
Select independent subsamples of the chosen size from the original sample.
- 2.
Compute the values:
[TABLE]
- 3.
Compute the empirical distribution function:
[TABLE]
- 4.
Compute the quantile:
[TABLE]
Now, it is easy to see that:
[TABLE]
From the absolute continuity of the distribution of , it also follows that:
[TABLE]
tends in probability to . In symbols:
[TABLE]
Finally, from we may conclude that
[TABLE]
so that the region
[TABLE]
is a confidence band for of asymptotic level .
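The four subsampling steps above can be sketched for one arm as follows. The sketch assumes that the subsample analogue of the Kolmogorov statistic is $\sqrt{m}$ times the sup-distance between the subsample and full-sample Hájek-type d.f.s, that the band is the estimate plus or minus the estimated quantile divided by $\sqrt{n}$, and that the propensity scores are given for each unit rather than re-estimated in every subsample. Names and simulated data are hypothetical.

```python
import math
import random

def hajek_cdf_on_grid(records, grid, arm):
    """Hajek-type d.f. of one arm on a sorted grid; records are (y, t, p_hat)."""
    pts = []
    for y, t, p in records:
        w = t / p if arm == 1 else (1 - t) / (1.0 - p)
        if w > 0.0:
            pts.append((y, w))
    pts.sort()
    total = sum(w for _, w in pts)          # assumes the arm is not empty
    out, cum, i = [], 0.0, 0
    for g in grid:
        while i < len(pts) and pts[i][0] <= g:
            cum += pts[i][1]
            i += 1
        out.append(cum / total)
    return out

def confidence_band(records, arm, m, M, alpha=0.05, rng=random):
    """Uniform band F_hat +/- d_hat / sqrt(n), with d_hat obtained by subsampling."""
    n = len(records)
    grid = sorted(y for y, _, _ in records)  # both step functions jump only here
    F_n = hajek_cdf_on_grid(records, grid, arm)
    sups = []
    for _ in range(M):
        sub = rng.sample(records, m)
        F_m = hajek_cdf_on_grid(sub, grid, arm)
        sups.append(math.sqrt(m) * max(abs(a - b) for a, b in zip(F_m, F_n)))
    sups.sort()
    d_hat = sups[min(M, math.ceil((1.0 - alpha) * M - 1e-9)) - 1]
    half = d_hat / math.sqrt(n)
    lower = [max(0.0, f - half) for f in F_n]
    upper = [min(1.0, f + half) for f in F_n]
    return grid, lower, F_n, upper, half

# Hypothetical data with a confounder x driving both treatment and outcome.
random.seed(2)
records = []
for _ in range(200):
    x = random.random()
    p = 0.3 + 0.4 * x
    t = 1 if random.random() < p else 0
    y = 70.0 + 10.0 * x + random.gauss(0.0, 2.0)
    records.append((y, t, p))

grid, lower, F1_hat, upper, half = confidence_band(records, arm=1, m=50, M=100)
```

Evaluating the sup-distance on the full-sample points is enough here, because both d.f.s are step functions that jump only at observed outcomes.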
7 Testing for the presence of a treatment effect: two (sub)sample Wilcoxon test
7.1 Wilcoxon type statistic
In nonparametric statistics, a problem of considerable relevance consists in testing for a possible difference between two samples. Among several proposals, the two-sample Wilcoxon (or Wilcoxon-Mann-Whitney) test plays a central role in applications, mainly because of its properties. The goal of the present section is to propose a Wilcoxon type statistic to test for a possible difference between the (sub)sample of treated subjects and the (sub)sample of untreated subjects, i.e. for the possible presence of a treatment effect.
From now on, we will assume that $F_{1}$ and $F_{0}$ are both continuous. As in the classical Wilcoxon two-sample test, in order to measure the difference between the distributions of $Y_{(1)}$ and $Y_{(0)}$, we consider
[TABLE]
The parameter possesses a natural interpretation, because it is equal to the probability that a treated subject possesses a $Y$-value greater than the $Y$-value for an independent, untreated subject. A few of its properties are listed below.
- 1)
It depends only on the marginal d.f.s $F_{1}$, $F_{0}$ (not on the way $Y_{(1)}$, $Y_{(0)}$ are associated in the same subject).
- 2)
If $F_{1}=F_{0}$, then it is equal to $1/2$.
- 3)
Interchanging the roles of $F_{1}$ and $F_{0}$ in its definition leads to an equivalent parameter, as is seen by an integration by parts.
- 4)
If $F_{1}(y)\le F_{0}(y)$ for every real $y$, i.e. if $Y_{(1)}$ is stochastically larger than $Y_{(0)}$, then:
[TABLE]
The Wilcoxon type statistic we consider here is obtained in two steps, essentially by a plug-in approach.
- Step 1.
Estimation of the marginal d.f.s $F_{1}$, $F_{0}$:
[TABLE]
- Step 2.
Estimation of the parameter:
[TABLE]
Note that the product of the weights attached to units $i$ and $j$ is nonzero if and only if (iff) unit $i$ is treated and unit $j$ is untreated. This essentially shows that the statistic is based on the comparison treated/untreated.
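The two-step plug-in construction can be sketched as follows, assuming that the parameter is $\int F_{0}\,dF_{1}$ (the probability that a treated outcome exceeds an independent untreated one) and that both d.f.s are replaced by their Hájek-type estimates. The function name and the toy data are hypothetical.

```python
def wilcoxon_type_statistic(y_obs, t, p_hat):
    """Plug-in estimate of the integral of F0 with respect to F1 (Hajek weights)."""
    raw1 = [ti / pi for ti, pi in zip(t, p_hat)]
    raw0 = [(1 - ti) / (1.0 - pi) for ti, pi in zip(t, p_hat)]
    s1, s0 = sum(raw1), sum(raw0)
    theta = 0.0
    for yi, wi in zip(y_obs, raw1):
        if wi == 0.0:
            continue                      # only treated units i contribute
        # Estimated control d.f. at yi: only untreated units j carry weight.
        f0_at_yi = sum(wj for yj, wj in zip(y_obs, raw0) if yj <= yi) / s0
        theta += (wi / s1) * f0_at_yi
    return theta

# Hypothetical toy sample: outcomes, treatment indicators, propensity scores.
y_obs = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 0]
p_hat = [0.5, 0.5, 0.25, 0.5]
theta_hat = wilcoxon_type_statistic(y_obs, t, p_hat)   # (1/3)*0 + (2/3)*0.5

# When every treated outcome exceeds every control outcome, the estimate is 1.
theta_sep = wilcoxon_type_statistic([5.0, 1.0, 6.0, 2.0], [1, 0, 1, 0], [0.5] * 4)
```

The double loop costs $O(n^{2})$ operations; sorting the outcomes once would reduce this to $O(n\log n)$, at the price of a slightly longer sketch.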
The limiting distribution of the statistic is obtained as a consequence of Proposition 2.
Proposition 4.
Assume that the conditions of Proposition 2 are fulfilled. Then
[TABLE]
where
[TABLE]
and
[TABLE]
Proof. See Appendix.
7.2 Variance estimation
The asymptotic variance appearing in contains unknown terms, that can be consistently estimated on the basis of sample data. In particular, the estimation of can be simply developed by considering the regression of
[TABLE]
on , , and then estimating the regression function by a method ensuring consistency (e.g. local polynomials, Nadaraya-Watson kernel regression, splines). The resulting estimator is uniformly consistent on compact sets of $x$s under mild regularity conditions. In the same way, can be consistently estimated by , say. As a consequence the term can be estimated by:
[TABLE]
Note that as an alternative estimator, one could consider:
[TABLE]
In the second place, we have to estimate
[TABLE]
The term can be estimated with
[TABLE]
The term:
[TABLE]
can be estimated by means of a non parametric regression of:
[TABLE]
with respect to $x$s. The resulting estimator , say, is consistent under mild conditions. In the same way, an estimator of
[TABLE]
can be obtained.
The asymptotic variance of can be finally estimated by:
[TABLE]
7.3 Testing the equality of $F_{1}$ and $F_{0}$ via Wilcoxon type statistic
A test for the equality of $F_{1}$ and $F_{0}$ can be constructed via the Wilcoxon type statistic. As already seen, when $F_{1}$ and $F_{0}$ coincide, the parameter is equal to $1/2$. Hence, the idea is to construct a test for the hypothesis testing problem
[TABLE]
On the basis of Proposition 4, and the variance estimator , the region
[TABLE]
(where is the quantile of the standard Normal distribution) is an acceptance region of asymptotic significance level .
Alternatively, one could approximate the quantiles of the distribution of by subsampling, as outlined in Section 4. Using the notation introduced for subsampling, it is seen that the acceptance region
[TABLE]
is a confidence interval for of asymptotic level . Hence, the test consisting in rejecting whenever the interval does not contain , possesses asymptotic significance level .
8 Testing for stochastic dominance
In evaluating the effect of a treatment, it is sometimes of interest to test whether the treatment itself has an effect on the whole distribution function of the outcome, i.e. whether the treatment improves the behaviour of the whole d.f. of the outcome. Various forms of stochastic dominance are discussed in [16], [2]. In particular, in the present section we will focus on testing for first-order stochastic dominance. The d.f. $F_{1}$ first-order stochastically dominates $F_{0}$ if $F_{1}(y)\le F_{0}(y)$ for every real $y$. Our main goal is to construct a test for the (uni-directional) hypotheses
[TABLE]
where .
In econometrics and statistics, there is an extensive literature on testing for stochastic dominance, since the papers by [2], [5]. In [15] a Kolmogorov-Smirnov type test is proposed, together with a method to construct critical values based on subsampling. For further bibliographic references, and an in-depth analysis of contributions to testing for stochastic dominance, cfr. the recent paper by [6].
In the present paper, we confine ourselves to a simple, intuitive procedure to test for uni-directional dominance. A simple idea to construct a test for the above hypotheses problem is to invert a confidence region for . The null hypothesis is rejected whenever the confidence region has empty intersection with . More formally, the test procedure we consider here is defined as follows.
- (i)
Compute a confidence region for of (at least asymptotic) level ;
- (ii)
Reject the null hypothesis if the confidence region and the set of values allowed under the null hypothesis are disjoint, that is, if for at least one real $y$ the region has lower bound greater than zero.
From now on, we will assume that both , are continuous d.f.s. Using the arguments in Section 6, it is possible to see that the r.v.
[TABLE]
has absolutely continuous distribution, with . Hence, there exists a single such that
[TABLE]
The quantile can be estimated by subsampling, as outlined in Section 4. Define
[TABLE]
A subsampling procedure to estimate is described below.
- 1.
Select independent subsamples of the chosen size from the original sample.
- 2.
Compute the subsample statistics
[TABLE]
- 3.
Compute the corresponding empirical d.f.
[TABLE]
- 4.
Compute the corresponding quantile
[TABLE]
The arguments in Section 6 show that
[TABLE]
Hence, the asymptotically exact approximation
[TABLE]
holds. As a consequence, the region
[TABLE]
is a confidence region for with asymptotic level . The null hypothesis is rejected whenever:
[TABLE]
The performance of the testing procedure developed so far will be evaluated by simulation in Section 9.
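The test procedure of this section can be sketched as follows. The sketch rests on several assumptions that the text above does not fix explicitly: the difference of interest is taken to be $F_{1}-F_{0}$, the null hypothesis is that this difference is nowhere positive, the subsample statistic is $\sqrt{m}$ times the sup-distance between the subsample and full-sample estimated differences, and the propensity scores are given for each unit. Names and data are hypothetical.

```python
import math
import random

def hajek_cdf_on_grid(records, grid, arm):
    """Hajek-type d.f. of one arm on a sorted grid; records are (y, t, p_hat)."""
    pts = sorted((y, t / p if arm == 1 else (1 - t) / (1.0 - p))
                 for y, t, p in records)
    total = sum(w for _, w in pts)          # positive when the arm is not empty
    out, cum, i = [], 0.0, 0
    for g in grid:
        while i < len(pts) and pts[i][0] <= g:
            cum += pts[i][1]
            i += 1
        out.append(cum / total)
    return out

def delta_on_grid(records, grid):
    """Estimated difference F1 - F0 on the grid."""
    F1 = hajek_cdf_on_grid(records, grid, 1)
    F0 = hajek_cdf_on_grid(records, grid, 0)
    return [a - b for a, b in zip(F1, F0)]

def dominance_test(records, m, M, alpha=0.05, rng=random):
    """Reject when the lower confidence bound for F1 - F0 is positive somewhere."""
    n = len(records)
    grid = sorted(y for y, _, _ in records)
    delta_n = delta_on_grid(records, grid)
    sups = []
    while len(sups) < M:
        sub = rng.sample(records, m)
        if len({t for _, t, _ in sub}) < 2:
            continue                        # both arms are needed in a subsample
        delta_m = delta_on_grid(sub, grid)
        sups.append(math.sqrt(m) * max(abs(a - b) for a, b in zip(delta_m, delta_n)))
    sups.sort()
    d_hat = sups[min(M, math.ceil((1.0 - alpha) * M - 1e-9)) - 1]
    lower = [d - d_hat / math.sqrt(n) for d in delta_n]
    return max(lower) > 0.0, d_hat

# Two extreme hypothetical samples with constant propensity score 0.5.
random.seed(3)
harmful = [(float(i), 1, 0.5) for i in range(50)] + [(100.0 + i, 0, 0.5) for i in range(50)]
helpful = [(100.0 + i, 1, 0.5) for i in range(50)] + [(float(i), 0, 0.5) for i in range(50)]

reject_harmful, _ = dominance_test(harmful, m=30, M=100)   # treated outcomes are lower
reject_helpful, _ = dominance_test(helpful, m=30, M=100)   # treated outcomes are higher
```

In the first sample every treated outcome lies below every control outcome, so dominance of the treatment is rejected; in the second the estimated difference is never positive, so it is not.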
9 A simulation study
The goal of the present section is to study by simulation the performance of the proposed methods for finite sample sizes. In particular, estimation and the related hypothesis tests are studied under two scenarios: in the first there is no treatment effect, i.e. $F_{1}$ coincides with $F_{0}$; in the second there is a treatment effect, i.e. $F_{1}\neq F_{0}$.
Replications for two different sample sizes have been generated by Monte Carlo simulation. The propensity score has been estimated via the estimator considered in Th. 1; the number of terms $K$ has been chosen through least squares cross-validation. As far as the subsampling approximation is concerned, subsamples have been drawn by simple random sampling from each of the original samples.
In the first scenario (absence of treatment effect) the potential outcome is specified as
[TABLE]
where has a Bernoulli distribution with success probability ( and has a uniform distribution in the interval (). The r.v.s , are mutually independent. Clearly, , , and .
The exact distribution function of is
[TABLE]
The d.f. , and the corresponding density function , are depicted in Fig. 1.
The propensity score, in this case, is
[TABLE]
Furthermore we have and , so that even if . This is clearly due to the confounding effect of .
In Table 1 (, ) and Table 5 (, ) average, median and standard deviation of , of , , and , are reported. The quantities are also reported for the estimator of and for the “naive” mean difference between treated and untreated i.e. .
Tables 2-4 (, , , ) and Tables 6-8 (, , , ) report the coverage probability (nominal level 95%) and the average length of the confidence intervals for the Wilcoxon-type statistic, obtained via sampling and subsampling, and of the confidence bands for , together with the percentage of rejection of the null hypothesis for the test of stochastic dominance.
The results indicate that the Wilcoxon-type statistic and the estimated quantiles perform well in terms of bias and dispersion. The sampling standard error of the Wilcoxon-type statistic tends to be close to its theoretical value. The estimated ATE is, on average, close to its “true value” (Tables 1 and 5). The coverage probabilities of the confidence intervals are close to the nominal level 95% (Tables 2-3 and 6-7). Finally, the percentage of rejection of the null hypothesis for the test of stochastic dominance is close to 0.05, since the null hypothesis of no treatment effect is true in scenario , i.e. (Tables 4 and 8).
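A weighted Wilcoxon-type statistic of the kind discussed above can be sketched as follows. The definition used here is an assumption: the statistic is taken to be the integral of the weighted empirical d.f. of the untreated outcomes with respect to the weighted empirical d.f. of the treated outcomes, which equals 0.5 when the two (continuous) distributions coincide. The data-generating process and the use of the true propensity score are also hypothetical.

```python
import bisect
import random


def weighted_wilcoxon(y1, w1, y0, w0):
    """Integral of the weighted e.d.f. of y0 w.r.t. the weighted e.d.f. of y1."""
    pairs0 = sorted(zip(y0, w0))
    values0 = [y for y, _ in pairs0]
    cum0, running = [], 0.0
    for _, w in pairs0:
        running += w
        cum0.append(running)
    total0, total1 = running, sum(w1)
    theta = 0.0
    for y, w in zip(y1, w1):
        k = bisect.bisect_right(values0, y)  # number of untreated values <= y
        f0 = cum0[k - 1] / total0 if k > 0 else 0.0
        theta += (w / total1) * f0
    return theta


# Hypothetical design (assumption): no treatment effect, confounded assignment.
rng = random.Random(0)
y1, w1, y0, w0 = [], [], [], []
for _ in range(4000):
    x = rng.random()
    p = 0.2 + 0.6 * x  # true propensity score, assumed known here
    y = 10.0 * x + rng.gauss(0.0, 1.0)
    if rng.random() < p:
        y1.append(y)
        w1.append(1.0 / p)
    else:
        y0.append(y)
        w0.append(1.0 / (1.0 - p))

theta_weighted = weighted_wilcoxon(y1, w1, y0, w0)
theta_unweighted = weighted_wilcoxon(y1, [1.0] * len(y1), y0, [1.0] * len(y0))
print(f"weighted: {theta_weighted:.3f}, unweighted: {theta_unweighted:.3f}")
```

With unit weights the function reduces to the usual two-sample Mann-Whitney proportion, which in this toy design is pulled away from 0.5 by confounding alone.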
In scenario (presence of treatment effect), the potential outcome is specified as in with . The potential outcome is specified as
[TABLE]
where has a Bernoulli distribution and , have a Uniform distribution . The r.v.s , , are mutually independent.
The exact distribution function of is reported below
[TABLE]
and depicted in Fig. 2.
In scenario , we have , , , and then . Furthermore, stochastically dominates .
The propensity score is
[TABLE]
so that and even if . As in scenario , this is due to the confounding effect of .
Table 9 (, ) and Table 13 (, ) report the average, median and standard deviation of , of , , and . The same quantities are also reported for the estimator of and for the “naive” mean difference between treated and untreated units, i.e. .
Tables 10-12 (, , , ) and Tables 14-16 (, , , ) report the coverage probability (nominal level 95%) and the average length of the confidence intervals for the Wilcoxon-type statistic, obtained via sampling and subsampling, and of the confidence bands for , together with the percentage of rejection of the null hypothesis for the test of stochastic dominance.
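The two summaries reported in these tables, coverage probability and average length, are Monte Carlo averages over replications. The following generic sketch shows how they are typically computed. It uses a normal-approximation interval for the mean of uniform data as a hypothetical stand-in (an assumption), not the paper's intervals for the Wilcoxon-type statistic.

```python
import math
import random
import statistics


def coverage_and_length(n_replications, n, true_value, seed=0):
    """Monte Carlo coverage probability and average length of a nominal
    95% normal-approximation interval for the mean of U(0,1) data."""
    rng = random.Random(seed)
    hits, total_length = 0, 0.0
    for _ in range(n_replications):
        sample = [rng.random() for _ in range(n)]
        centre = statistics.fmean(sample)
        half = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        # A replication counts as a "hit" when the interval covers the truth.
        hits += (centre - half <= true_value <= centre + half)
        total_length += 2.0 * half
    return hits / n_replications, total_length / n_replications


coverage, avg_length = coverage_and_length(1000, 100, true_value=0.5)
print(f"coverage: {coverage:.3f}, average length: {avg_length:.4f}")
```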
The results indicate that the Wilcoxon-type statistic and the estimated quantiles perform well in terms of bias and dispersion. The sampling standard error of the Wilcoxon-type statistic tends to be close to its theoretical value. The estimated ATE is, on average, close to its “true value” (Tables 9 and 13). The coverage probabilities of the confidence intervals are close to the nominal level 95% (Tables 10-11 and 14-15). Finally, the percentage of rejection of the null hypothesis for the test of stochastic dominance is close to 0.05 when the null hypothesis of no treatment effect is true. As in scenario , i.e. stochastically dominates , the rejection probability is smaller than in scenario (Tables 12 and 16).
Appendix - Technical Lemmas and proofs
Lemma 1.
Under the assumptions of Th. 1:
[TABLE]
Proof of Lemma 1.
Take an arbitrary . Since , , we may write
[TABLE]
Since holds for every positive small enough, the lemma is proved as . The case is similar. ∎
Lemma 2.
Under the assumptions of Th. 1:
[TABLE]
Proof of Lemma 2.
Consider the case . First of all, we have
[TABLE]
Next, by Lemma 1 it is easy to see that
[TABLE]
Furthermore, from the Strong Law of Large Numbers for sequences of i.i.d. r.v.s it is seen that
[TABLE]
as . From and the first convergence in follows. Convergence in the case is proved in a similar way. ∎
Lemma 3.
Consider the “pseudo-estimator” of :
[TABLE]
Under the assumptions of Th. 1:
[TABLE]
Proof of Lemma 3.
Consider first the case . From
[TABLE]
it is seen that
[TABLE]
The proof follows immediately from Lemmas 2, 3. The case is similar. ∎
Lemma 4.
Consider again the “pseudo-estimators” . Under the assumptions of Lemma 4:
[TABLE]
Proof of Lemma 4.
The result can be shown by standard arguments. Consider first the case . From the Strong Law of Large Numbers for i.i.d. r.v.s, we have:
[TABLE]
Moreover, on the basis of the properties of (monotone non decreasing, continuous to the left, with total variation equal to 1), for every positive there exists a partition of
[TABLE]
such that
[TABLE]
For each it is then:
[TABLE]
for all , and this implies that
[TABLE]
Moreover, for every it is seen that
[TABLE]
and similarly:
[TABLE]
From inequalities , it follows that
[TABLE]
As , the Strong Law of Large Numbers implies that
[TABLE]
and since can be made arbitrarily small, the conclusion follows. The case is dealt with similarly. ∎
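The uniform convergence established in Lemma 4 can be illustrated numerically. The sketch below is built on assumptions: a hypothetical design in which the outcome equals a uniform covariate (so the true d.f. is the identity on [0, 1]), a linear propensity score treated as known, and inverse-propensity weights normalized to sum to one. It compares the sup-distance from the true d.f. of the weighted and of the unweighted empirical d.f. of the treated units.

```python
import random


def edf_sup_distance(ys, weights, true_df):
    """Sup-distance between a weighted empirical d.f. and a continuous d.f.

    The supremum is attained at a jump point, so it suffices to compare the
    true d.f. with the step function just before and just after each jump.
    """
    pairs = sorted(zip(ys, weights))
    total = sum(w for _, w in pairs)
    sup, cum = 0.0, 0.0
    for y, w in pairs:
        before = cum / total
        cum += w
        after = cum / total
        f = true_df(y)
        sup = max(sup, abs(before - f), abs(after - f))
    return sup


# Hypothetical design (assumption): Y = X ~ U(0,1), p(x) = 0.2 + 0.6 x.
rng = random.Random(0)
ys, ipw = [], []
for _ in range(20000):
    x = rng.random()
    p = 0.2 + 0.6 * x
    if rng.random() < p:  # only treated units are observed under treatment
        ys.append(x)
        ipw.append(1.0 / p)

sup_weighted = edf_sup_distance(ys, ipw, lambda y: y)
sup_unweighted = edf_sup_distance(ys, [1.0] * len(ys), lambda y: y)
print(f"weighted: {sup_weighted:.4f}, unweighted: {sup_unweighted:.4f}")
```

In this design the unweighted empirical d.f. of the treated converges to a different limit, so its sup-distance from the true d.f. does not vanish, whereas the weighted version does.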
Proof of Proposition 1.
Immediate consequence of Lemmas 3, 4. ∎
Proof of Proposition 2.
Using and the uniform boundedness on compact sets of s of the term, it is enough to prove that the sequence of stochastic processes
[TABLE]
converges weakly to the Gaussian process . Observing that , and using Theorem 2.11.1 in [21] (p. 206), we have to prove point-wise convergence of covariance functions and asymptotic equicontinuity.
1. Convergence of covariance. Consider first the term . Since s are i.i.d. r.v.s, and taking into account that , we may write
[TABLE]
and similarly
[TABLE]
As far as the cross-covariance terms are concerned, it is immediate to see that . Furthermore:
[TABLE]
and this ends the “covariance part” of the proof.
2. Asymptotic equicontinuity. Consider the i.i.d. r.v.s , and suppose . Then:
[TABLE]
A similar result is obtained as , as well as when , so that inequalities:
[TABLE]
hold true.
Since is continuous (uniformly, being monotonic and bounded), from
[TABLE]
it follows that, for every positive :
[TABLE]
Next, define the (random) pseudometric:
[TABLE]
From the Strong Law of Large Numbers it is seen that
[TABLE]
with .
Denote now by the smallest number of intervals of that cover the real line, and such that . By (98) it follows that, with probability 1, for large enough,
[TABLE]
Hence, with probability 1, the number is bounded by , being an appropriate constant. As a consequence, with probability 1, for large enough, we have:
[TABLE]
In view of Theorem 2.11.1 in [21] (p. 206), this completes the proof. ∎
Proof of Proposition 3.
Let , . Then, possesses continuous trajectories almost surely if possesses continuous trajectories almost surely. From the inequality (consequence of of proof of Proposition :
[TABLE]
being an appropriate constant, it follows that
[TABLE]
The continuity of the trajectories of follows from and formula in [13]. ∎
Proof of Proposition 4.
First of all, using an integration by parts we have
[TABLE]
and hence
[TABLE]
where , , [math].
Now, if are continuous, the limiting process possesses trajectories that are continuous (and bounded) with probability 1, so that it is concentrated on , which is separable and complete if equipped with the -norm. Using then the Skorokhod Representation Theorem (cf. [4], p. 70), there exist processes , , and , defined on a probability space such that
[TABLE]
and
[TABLE]
where the symbol denotes equality in distribution.
From and , the relationship
[TABLE]
follows.
The terms appearing in the r.h.s. of can be handled separately. First of all, we have
[TABLE]
and since
[TABLE]
we easily obtain
[TABLE]
and similarly
[TABLE]
Finally, for every integer , is a bounded variation function, with total variation , a.s.-, and since the trajectories of the process are continuous and bounded we may write
[TABLE]
Relationship implies that the signed measure induced by converges weakly to a measure identically equal to zero. Hence:
[TABLE]
where the term goes to zero according to the Helly-Bray theorem ( is continuous and bounded a.s. ), and the term goes to zero according to the Skorokhod Representation Theorem.
From , , and it follows that:
[TABLE]
which is equivalent to:
[TABLE]
The r.h.s. of is a linear functional of a Gaussian process with continuous and bounded trajectories, so that it possesses Gaussian distribution with zero expectation and variance
[TABLE]
where
[TABLE]
The terms - in - can be written more compactly. Using the quantities , defined in , it is not difficult to see that
[TABLE]
In the same way, it is seen that:
[TABLE]
and
[TABLE]
From - , easily follows. ∎
References
- [1] Abadie, A. (2002). Bootstrap tests for distributional treatment effects in instrumental variable models. Journal of the American Statistical Association, 97(457), 284-292.
- [2] Anderson, G. (1996). Nonparametric Tests of Stochastic Dominance in Income Distribution. Econometrica, 64, 1183-1193.
- [3] Athey, S. and Imbens, G. W. (2017). The state of applied econometrics: Causality and policy evaluation. Journal of Economic Perspectives, 31(2), 3-32.
- [4] Billingsley, P. (1999). Convergence of Probability Measures, 2nd Ed. Wiley, New York.
- [5] Davidson, R. S. and Duclos, J. Y. (2000). Statistical inference for stochastic dominance and for the measurement of poverty and inequality. Econometrica, 68, 1435-1464.
- [6] Donald, S. G. and Hsu, Y. C. (2016). Improving the Power of Tests of Stochastic Dominance. Econometric Reviews, 35, 553-58.
- [7] Doss, H. and Gill, R. D. (1992). An elementary approach to weak convergence for quantile processes, with applications to censored survival data. Journal of the American Statistical Association, 87, 869-877.
- [8] Dudley, R. M. (1973). Sample Functions of the Gaussian Process. The Annals of Probability, 1, 66-103.
