Quantile Treatment Effects in Regression Kink Designs
Heng Chen, Harold D. Chiang, Yuya Sasaki

TL;DR
This paper establishes the first identification results for quantile treatment effects of binary treatments within regression kink designs, along with large sample inference methods and practical guidelines.
Contribution
It fills a gap by providing identification and inference methods for quantile effects of binary treatments in regression kink designs, which was previously unaddressed.
Findings
Identification of quantile treatment effects for binary treatments in regression kink designs.
Development of large sample inference theories.
Provision of practical estimation and inference guidelines.
First arXiv date: March 15, 2017.
Heng Chen: Currency Department, Bank of Canada, 234 Wellington Street, Ottawa, ON, K1A 0G9, Canada.
Harold D. Chiang: Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.
Yuya Sasaki: Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.
We thank Yingying Dong, Robert Moffitt, and participants at New York Camp Econometrics XIII for useful comments. All the remaining errors are ours.
Abstract
The literature on regression kink designs develops identification results for average effects of continuous treatments (Card, Lee, Pei, and Weber, 2015), average effects of binary treatments (Dong, 2018), and quantile-wise effects of continuous treatments (Chiang and Sasaki, 2019), but there has been no identification result for quantile-wise effects of binary treatments to date. In this paper, we fill this void in the literature by providing an identification of quantile treatment effects in regression kink designs with binary treatment variables. For completeness, we also develop large sample theories for statistical inference and a practical guideline on estimation and inference.
Keywords: causal interpretation, identification, quantile treatment effects, regression kink design.
1 Introduction
Theories of identification in regression kink designs are advanced by a few papers in the recent literature. Card, Lee, Pei, and Weber (2015) propose identification of average effects of continuous treatments. Dong (2018) proposes identification of average effects of binary treatments. Chiang and Sasaki (2019) propose identification of quantile-wise effects of continuous treatments. To date, no theory has been proposed for identification of quantile-wise effects of binary treatments in regression kink designs. This paper aims to fill this void in the literature.
Specifically, in regression kink designs with binary treatments, we show that a local Wald ratio of derivatives of certain conditional expectation functions can be used to identify the conditional distribution functions of the potential outcomes given the event of local compliance. These conditional distribution functions can be used in turn to identify the quantile treatment effects given the event of local compliance. Our identification argument parallels that of Frandsen, Frölich, and Melly (2012), who show that a local Wald ratio of certain conditional expectation functions can be used to identify the conditional distribution functions of potential outcomes given the event of local compliance in the context of regression discontinuity designs. Because of the lack of discontinuity in our context of regression kink designs, however, our identification result entails the limit case of the event of local compliance, which amounts to the subpopulation to which the marginal treatment effects (Björklund and Moffitt, 1987; Heckman and Vytlacil, 1999; Heckman and Vytlacil, 2005) are relevant. This is analogous to, and provides a quantile counterpart of, the identification result by Dong (2018).
Our identifying formula takes the form of local Wald ratios of derivatives of functions. This form is related to the identifying formulas of several papers in the existing literature. Dong and Lewbel (2015) (see also Cerulli, Dong, Lewbel, and Poulsen, 2017) use a local Wald ratio of derivatives of conditional expectation functions to identify the average effect of changing the threshold location in regression discontinuity designs. Card, Lee, Pei, and Weber (2015) use a local Wald ratio of derivatives of conditional expectation functions to identify average effects of continuous treatments in regression kink designs. Dong (2018) uses a local Wald ratio of derivatives of conditional expectation functions to identify average effects of binary treatments in regression kink designs. Chiang and Sasaki (2019) use a local Wald ratio of derivatives of conditional quantile functions to identify quantile-wise effects of continuous treatments in regression kink designs. Unlike each of these papers, we use the difference of left-inverses of two local Wald ratios of derivatives of conditional expectation functions to identify quantile-wise effects of binary treatments in regression kink designs.
While we motivate this paper by quantile treatment effects, the identifying formulas we provide as the main result of this paper can also be used to identify the distributional treatment effects. Therefore, this paper also relates to Abadie (2002), who uses a form of Wald ratios to identify distributional treatment effects, and more closely relates to Shen and Zhang (2016), who consider distributional treatment effects in the context of regression discontinuity designs.
In addition to the main identification result, we also provide methods of estimation and inference for quantile treatment effects based on analog estimators of our identifying formulas. While our identification result is novel, estimation and inference results follow from an adaptation of existing approaches to our framework. Therefore, the main text focuses on the identification theory. Details of estimation and inference theories are found in the appendix.
The rest of this paper is organized as follows. In Section 2, we develop the identification result. Section 3 presents a practical guideline on estimation and inference. Appendix A presents formal theories for the method of inference. Appendix B presents additional practical considerations. Appendix C contains mathematical details.
2 Identification: the Main Result
We model the random vector through the following causal structure, where , , , for , and .
[TABLE]
In equation (2.1), the outcome variable is produced through function by a binary treatment variable , a continuous running variable or assignment variable , and miscellaneous factors . We let denote the potential outcome random variable that an individual with attributes would produce under each hypothetical treatment choice . The actual treatment choice is determined by and through the threshold-crossing model (2.2). A researcher observes the joint distribution of , , and . However, a researcher cannot observe or . We do not impose any statistical independence condition in this model. Therefore, existing methods for instrumental variable quantile regression (e.g., Chernozhukov and Hansen, 2005) will not apply here. In particular, we do not assume statistical independence between the running variable and the unobservables . Instead, we make the following assumption of the regression kink design (RKD).
Assumption 1 (Regression Kink Design, RKD).
Let be a designed kink location.
(i) is continuously differentiable in a deleted neighborhood of .
(ii) is continuous at .
(iii) , where denotes .
(iv) The conditional distribution of given is absolutely continuous with a continuously differentiable conditional density function .
(v) The conditional cumulative distribution function is continuously differentiable for each for each .
(vi) .
The research design as required by Assumption 1 consists of three broad pieces. First, the treatment assignment rule has a kink at the designed location , as formally stated in parts (ii) and (iii), but this assignment rule is reasonably smooth elsewhere, as formally stated in part (i). Second, every other function is reasonably smooth, as formally stated in parts (iv) and (v). Third, there is sufficient data at the designed kink location , as formally stated in part (vi). This assumption is analogous to that of Dong (2018) who analyzes average effects of binary treatments in the regression kink design. Under this design, we obtain the following identification result for conditional distributions of the potential outcomes given the event of .
Theorem 1 (Identification).
Let Assumption 1 hold for the model (2.1)–(2.2). Then,
[TABLE]
hold for all .
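The displayed formulas of Theorem 1 did not survive in this copy. As a reading aid only, the following is a sketch of the form that the surrounding prose describes: a local Wald ratio of the kinks in the derivatives of two conditional expectation functions. All symbols here ($x_0$, $C$, $\mu_1$, $\mu_2$) are assumed notation chosen for illustration and may differ from the paper's own display.

```latex
% Assumed notation (not recovered from the source):
%   x_0           : designed kink location
%   C             : the limiting local-compliance event at x_0
%   \mu_1(x,y,d)  = E[ 1{Y <= y} 1{D = d} | X = x ]
%   \mu_2(x,d)    = E[ 1{D = d} | X = x ]
\[
F_{Y^{d} \mid C}(y)
  = \frac{\lim_{x \downarrow x_0} \frac{\partial}{\partial x}\mu_1(x,y,d)
          - \lim_{x \uparrow x_0} \frac{\partial}{\partial x}\mu_1(x,y,d)}
         {\lim_{x \downarrow x_0} \frac{\partial}{\partial x}\mu_2(x,d)
          - \lim_{x \uparrow x_0} \frac{\partial}{\partial x}\mu_2(x,d)},
\qquad d \in \{0,1\}.
\]
```

Under this reading, the numerator and the denominator each carry the same kink in the treatment assignment rule, which cancels in the ratio and leaves the conditional distribution function of the potential outcome.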
Once the conditional cumulative distribution functions, for , are identified through the formulas presented in Theorem 1, the conditional quantile treatment effect is in turn identified by
[TABLE]
for . Theorem 1 also provides the identification of the distributional treatment effects for local compliers, , as in Abadie (2002) and Shen and Zhang (2016), which are useful for testing important hypotheses such as first-order stochastic dominance.
Footnote 1: We remark that, with our identifying formulas provided in Theorem 1, can be simply expressed as a single Wald ratio: .
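The step from identified distribution functions to quantile treatment effects is a difference of left-inverses. A minimal Python sketch of that step follows, assuming the two conditional CDFs have already been evaluated on a common increasing grid; the function names `left_inverse` and `qte` and the grid representation are illustrative choices, not part of the paper.

```python
import numpy as np

def left_inverse(y_grid, cdf_vals, tau):
    """Left-inverse of a CDF on a grid: smallest grid point y with cdf(y) >= tau.

    Assumes y_grid is increasing and cdf_vals is nondecreasing.
    """
    y_grid = np.asarray(y_grid, dtype=float)
    cdf_vals = np.asarray(cdf_vals, dtype=float)
    idx = np.nonzero(cdf_vals >= tau)[0]
    if idx.size == 0:
        # tau exceeds every CDF value on the grid: clamp to the last grid point
        return y_grid[-1]
    return y_grid[idx[0]]

def qte(y_grid, cdf_treated, cdf_untreated, taus):
    """Quantile treatment effects: difference of the two left-inverses at each tau."""
    return np.array([
        left_inverse(y_grid, cdf_treated, t) - left_inverse(y_grid, cdf_untreated, t)
        for t in taus
    ])
```

For example, `qte([0, 1, 2, 3], [0.1, 0.4, 0.8, 1.0], [0.3, 0.6, 0.9, 1.0], [0.5])` compares the median of the treated distribution with the median of the untreated one on that grid.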
Proof of Theorem 1: By applying Leibniz rule under Assumption 1 (i) and (iv), we have
[TABLE]
for all . Similarly, by applying Leibniz rule under Assumption 1 (i), (iv), and (v), we have
[TABLE]
for all . Therefore, by Assumption 1 (ii) and (iv), we can write
[TABLE]
and, by Assumption 1 (ii), (iv), and (v), we can write
[TABLE]
for all . Taking the ratio of these expressions under Assumption 1 (iii) and (vi) yields
[TABLE]
for all . Similar lines of arguments yield
[TABLE]
for all . ∎
Discussions of Theorem 1: In the context of the regression discontinuity design (RDD) where , Frandsen, Frölich, and Melly (2012) show that similar local Wald ratios identify the conditional distribution of the potential outcomes given the event
[TABLE]
of local compliance. In our context of the regression kink design where , Theorem 1 shows that local Wald ratios of the derivatives identify the conditional distributions of the potential outcomes given the event
[TABLE]
which may be considered as a limit of the event for RDD as approaches [math]. In this sense, our causal interpretation result is similar to that of the marginal treatment effects (Björklund and Moffitt, 1987; Heckman and Vytlacil, 1999; Heckman and Vytlacil, 2005). This interpretation is analogous to the identification result by Dong (2018), who analyzes average effects of binary treatments in the regression kink design.
3 Estimation and Inference: a Practical Guideline
While the main contribution of this paper lies in our new identification result presented in Section 2, we also develop a theory and method of estimation and inference for completeness. Since the estimation and inference strategies are standard, we relegate most of the details to the appendix. In this section, we present a practical guideline on estimation and inference for the conditional quantile treatment effects . A formal theory is presented in Appendix A. We also present additional practical considerations in Appendix B. Auxiliary lemmas and proofs are found in Appendix C.
The local Wald ratios proposed in Theorem 1 as identifying formulae can be succinctly rewritten as
[TABLE]
where and are the partial derivatives with respect to of and defined by
[TABLE]
respectively. We estimate the components of (3.1) by the one-sided local cubic estimators
[TABLE]
where is a kernel function, is a bandwidth parameter, , , and . A plug-in estimator for (3.1) is given by
[TABLE]
The motivation for our using the local cubic polynomial is to account for the manual bias correction from local quadratic estimators. By considering the asymptotic distribution for the higher-order local polynomial, we effectively account for bias estimation in the asymptotic distribution from the lower-order one, thus allowing for robustness in inference against large bandwidths – see Calonico, Cattaneo and Titiunik (2014, Remark 7) and Remark S.A.7 in their supplementary material.
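The one-sided local cubic slope estimators and the plug-in Wald ratio can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the triangular kernel, the function names, the default kink location `x0=0.0`, and the inputs `w_num` and `w_den` are all hypothetical. For the CDF of the potential outcome under treatment status `d` at a point `y`, one plausible choice is `w_num = (Y <= y) * (D == d)` and `w_den = (D == d)`.

```python
import numpy as np

def triangular_kernel(u):
    """Triangular kernel supported on [-1, 1] (an assumed choice of kernel)."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def one_sided_slope(x, w, h, side, x0=0.0, order=3):
    """Local polynomial estimate of the one-sided derivative of E[w | X = x] at x0.

    side="+" uses observations with x > x0, side="-" those with x < x0.
    order=3 gives the local cubic fit.
    """
    z = np.asarray(x, dtype=float) - x0
    w = np.asarray(w, dtype=float)
    keep = (z > 0) if side == "+" else (z < 0)
    keep = keep & (np.abs(z) < h)
    u = z[keep] / h
    sqrt_k = np.sqrt(triangular_kernel(u))
    # Columns 1, u, u^2, u^3; weighted least squares via square-root weights
    design = np.vander(u, N=order + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design * sqrt_k[:, None], w[keep] * sqrt_k, rcond=None)
    # coef[1] multiplies u = z / h, so the slope in x is coef[1] / h
    return coef[1] / h

def kink_wald_ratio(x, w_num, w_den, h, x0=0.0):
    """Ratio of the kink (right slope minus left slope) in w_num to that in w_den."""
    num = one_sided_slope(x, w_num, h, "+", x0) - one_sided_slope(x, w_num, h, "-", x0)
    den = one_sided_slope(x, w_den, h, "+", x0) - one_sided_slope(x, w_den, h, "-", x0)
    return num / den
```

Evaluating `kink_wald_ratio` over a grid of `y` values would give an estimated CDF curve for each treatment status, which can then be inverted to obtain quantiles.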
We can in turn estimate the conditional quantile treatment effect by
[TABLE]
The local Wald estimator is not always monotone increasing in finite samples. For ease of implementing the CDF inversion, we monotonize the estimated CDFs by rearrangement following Chernozhukov, Fernández-Val, and Galichon (2010). This does not affect the asymptotic properties of the estimators, while allowing for inversion of the CDF estimators. Frandsen, Frölich, and Melly (2012) also use this technique in the context of the regression discontinuity design.
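On a grid of evaluation points, monotone rearrangement amounts to sorting the estimated values. The short Python sketch below illustrates this, assuming the CDF estimates are stored as an array over an increasing grid; the final clipping to [0, 1] is an extra convenience added here, not a step taken from the paper.

```python
import numpy as np

def rearrange_cdf(cdf_vals):
    """Monotone rearrangement of CDF estimates evaluated on an increasing grid.

    Sorting the values gives the increasing rearrangement on the grid;
    clipping to [0, 1] is an added convenience (an assumption of this sketch).
    """
    sorted_vals = np.sort(np.asarray(cdf_vals, dtype=float))
    return np.clip(sorted_vals, 0.0, 1.0)
```

The rearranged values are nondecreasing by construction, so a grid-based left-inverse can be applied to them directly.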
Let . Under the assumptions to be stated in Appendix A, we obtain the following Uniform Bahadur Representations (BR) for the local slope estimators (3.2) and (3.3).
[TABLE]
We note that are trivial functions of .
Covariance functions for the limit processes are often cumbersome to approximate in practice. Qu and Yoon (2018) propose a simulation method to approximate limit processes under sharp designs – also see Qu and Yoon (2015) – but this method is not applicable to fuzzy designs. We thus propose to use the multiplier bootstrap method to approximate the asymptotic distributions of these BR. Draw a random sample from the standard normal distribution independently from the data . Replacing the unknowns and in the BR by their uniformly consistent estimators and , respectively, we define the following Estimated Multiplier Processes (EMP).
[TABLE]
Under the assumptions to be stated in Appendix A, we show that the EMP can be used to uniformly approximate the asymptotic distribution of the BR. Consequently, by the functional delta method, the asymptotic distribution of
[TABLE]
can be approximated uniformly on for by the estimated process
[TABLE]
where
[TABLE]
Once we obtain these approximations to the asymptotic distributions, we may conduct various tests of quantile functions following Koenker and Xiao (2002) and Chernozhukov and Fernández-Val (2005). For example, for the test of treatment significance, we use the test statistic
[TABLE]
where for some . We can approximate the asymptotic distribution of by
[TABLE]
Similarly, for the test of treatment homogeneity, we use the test statistic
[TABLE]
We can approximate the asymptotic distribution of by
[TABLE]
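The simulation logic behind both tests can be sketched in Python as below. This is a generic multiplier bootstrap skeleton under loudly stated assumptions: `psi` is a hypothetical `(n, T)` array standing in for the per-observation summands of the estimated process at `T` quantile levels, and the `1/sqrt(n)` scaling is a placeholder for the paper's own normalization, which is given in the displays that did not survive in this copy. Setting `center=True` subtracts the average over the quantile grid, which is one plausible way to mimic a homogeneity statistic.

```python
import numpy as np

def multiplier_sup_draws(psi, n_draws, rng, center=False):
    """Simulate sup-norm statistics of a multiplier process.

    psi     : (n, T) array of per-observation summands (hypothetical stand-in)
    n_draws : number of multiplier bootstrap draws
    rng     : numpy.random.Generator
    center  : if True, subtract the mean over the quantile grid in each draw
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    # Standard normal multipliers drawn independently of the data
    xi = rng.standard_normal((n_draws, n))
    proc = xi @ psi / np.sqrt(n)  # shape (n_draws, T)
    if center:
        proc = proc - proc.mean(axis=1, keepdims=True)
    return np.abs(proc).max(axis=1)

def sup_test(stat, sup_draws, alpha=0.05):
    """Reject when the observed statistic exceeds the simulated (1 - alpha) quantile."""
    crit = float(np.quantile(sup_draws, 1.0 - alpha))
    return stat > crit, crit
```

A typical call would be `sup_test(observed_stat, multiplier_sup_draws(psi, 2000, np.random.default_rng(0)))`, where `observed_stat` is the test statistic computed from the data.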
In this section, we presented a practical guideline on estimation and inference for the conditional quantile treatment effects . We refer interested readers to Appendix A for a formal theory. Furthermore, Appendix B presents additional practical considerations not covered in this section.
4 Summary
The existing literature on identification in regression kink designs includes the following three results. Card, Lee, Pei, and Weber (2015) propose identification of average effects of continuous treatments. Dong (2018) proposes identification of average effects of binary treatments. Chiang and Sasaki (2019) propose identification of quantile-wise effects of continuous treatments. On the other hand, this literature has been missing an identification result for quantile-wise effects of binary treatments. To complete this literature on identification, we propose in this paper identification of quantile-wise effects of binary treatments in regression kink designs.
Specifically, we show that a local Wald ratio of derivatives of certain conditional expectation functions identifies the conditional distribution functions of potential outcomes given the event of local compliance. Taking the difference of the left-inverses of these identified conditional distribution functions in turn identifies the conditional quantile treatment effects given the event of local compliance. While the main contribution of this paper is the identification result, we also develop a theory and method of estimation and inference for completeness.
Mathematical Appendix
Appendix A Estimation and Inference: Formal Theory
We use the following set of assumptions for the uniform Bahadur Representations, the bootstrap validity, and consistent conditional density and first-stage estimations. Fix and , denote
[TABLE]
We will write if there exists a universal constant such that . Denote
[TABLE]
We define the following objects for all , , ,
[TABLE]
Assumption 2.
Let be a compact interval containing [math] in its interior. Let .
(i) (a) are independent copies of random vector with support defined on a probability space . (b) has a continuously differentiable density function with . (c) is well-defined on and on .
(ii) (a) Conditional density is Lipschitz continuous on and for each and is four times partially differentiable in and twice partially differentiable in for each . is continuous and uniformly bounded on and for each for all , , . (b) is Lipschitz continuous in , four times differentiable on and for each . is continuous and uniformly bounded on and for each d. (c) For any , , , , we have , and where is the collection of continuously differentiable functions.
(iii) The bandwidths satisfy , , , for some finite .
(iv) (a) is bounded and . (b) is of VC type. (c) are positive definite.
(v) is a consistent estimator for . For , are uniformly consistent estimators for . and are uniformly consistent estimators for and on .
(vi) are independent and identically distributed copies of a standard normal random variable defined on a probability space that is independent of .
Part (i) concerns the sampling procedure and the distribution of data. Part (ii) requires smoothness of the conditional expectation functions on a deleted neighborhood of . Part (iii) regulates the rate at which the bandwidth decreases, which is consistent with the common choice rules presented in Appendix B.3. For example, the MSE-optimal bandwidth for the local quadratic estimator (e.g., ) is allowed. Part (iv) is satisfied by common kernel functions, such as the uniform, triangular, biweight, triweight, and Epanechnikov kernels. Part (v) is a high-level assumption of (uniformly) consistent estimation of the first-stage estimators. While we keep this high-level statement for the current section, Appendix B.2 proposes concrete examples of such uniformly consistent estimators. Part (vi) requires the multiplier random sample to be drawn independently of the data . We remark that part (vi) implies that all (uniformly) consistent estimators with respect to are also (uniformly) consistent with respect to .
Under Assumption 2 (i), (ii)(a)(b), (iii), (iv), an application of Lemma 1 of Chiang, Hsu, and Sasaki (2019) gives the uniform Bahadur Representation as in equations (3.4) and (3.5). The following theorem establishes (i) (a) the asymptotic distribution of the BR; (i) (b) the asymptotic distribution of the local Wald estimators; (i) (c) the asymptotic distribution of the conditional quantile treatment effect estimator; and (ii) the bootstrap validity. A proof is provided in Appendix C.2.
Theorem 2 (Asymptotic Distributions and Bootstrap Validity).
Suppose Assumptions 1 and 2 hold. Then there exists a zero mean Gaussian process , where is the collection of all bounded real valued functions, such that:
(i) (a) .
(i) (b) holds, where is given by
[TABLE]
(i) (c) holds, where is given for each by
[TABLE]
(ii) We have
[TABLE]
Remark 1.
By considering the asymptotic distribution for the local cubic polynomial above, we effectively account for bias estimation in the asymptotic distribution from the local quadratic kernel estimate; see Calonico, Cattaneo and Titiunik (2014, Remark 7) and Remark S.A.7 in their supplementary material. Therefore, the proposed theory and bootstrap allow for robust inference under the MSE-optimal bandwidth from the local quadratic kernel estimate.
Remark 2.
, and Theorem 2 are developed for the unconstrained estimators, that is, without imposing continuity in the conditional expectation of and . On the other hand, for example, consider the constrained version with the restriction with : the estimates can be obtained by solving the “pooled” least squares problem
[TABLE]
where and denoting the first/second/third left (right) derivatives. As shown in Appendix C.5, when a uniform kernel and symmetric bandwidths are used, the constrained estimators have the same asymptotic distributions as the unconstrained ones, thus our previous results still hold under the constrained estimates.
Appendix B Additional Practical Considerations
In order to compute the uniformly consistent conditional density estimator in Appendix B.1, and and in Appendix B.2, we continue to use the local cubic kernel models so that the single MSE-optimal bandwidth from the local quadratic regression can be used throughout.
B.1 A Conditional Density Estimator
The statement of Theorem 2 presumes that the densities are unknown. In order to simulate the multiplier process, we need to replace them by their uniformly consistent estimators. Note that the identifying formulas in Theorem 1 suggest
[TABLE]
Equation (3.3) gives uniformly consistent estimators for the two terms in the denominator. The two terms in the numerator can be written as
[TABLE]
With the bandwidth parameter , we represent by the limit of the regularized approximation
[TABLE]
and we estimate it by the local cubic polynomial regression
[TABLE]
This estimate is used for (B.1). Therefore, is estimated by
[TABLE]
We make the following assumption about the bandwidth parameters and .
Assumption 3.
The bandwidth parameters and satisfy , , and and .
The following lemma shows that the first-order derivative of the kernel regularization (B.2) with respect to is equivalent to the objects (B.1) of interest. We may thus use the estimates of to approximate .
Lemma 1.
Let Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3 hold. For each , .
A proof is provided in Appendix C.3. To show the uniform consistency of , it suffices to show . The following lemma establishes this point.
Lemma 2.
Under Assumptions 2 (i), (ii) (a) (b), (iv) (a) (b) and 3, it holds that
[TABLE]
A proof is provided in Appendix C.4.
B.2 First Stage Estimators
We will now give some examples of uniformly consistent estimators that satisfy the high-level condition in Assumption 2 (v). First, the density function of can be estimated by
[TABLE]
This can be shown to be consistent if and , is three times differentiable and ; see Theorem 1.1 of Li and Racine (2007).
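As an illustration of this first-stage step, here is a minimal Python sketch of a kernel density estimate of the running variable at a single point. The displayed formula is missing from this copy, so the standard Rosenblatt-Parzen form is assumed; the Epanechnikov kernel and the names `kde_at`, `x0`, and `c` are illustrative choices.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1] (an assumed choice of kernel)."""
    u = np.asarray(u, dtype=float)
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def kde_at(x, x0, c):
    """Kernel density estimate of the density of x at the point x0 with bandwidth c.

    Standard form assumed here: (1 / (n c)) * sum_i K((x_i - x0) / c).
    """
    x = np.asarray(x, dtype=float)
    return epanechnikov((x - x0) / c).sum() / (x.size * c)
```

In the setting of this section, `x0` would be the kink location and `c` the density bandwidth discussed in Appendix B.3.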
We now propose first-stage estimators and that are used in the EMP. Denote and . We reuse the local cubic estimates from equations (3.2) and (3.3), so that no additional optimization problem needs to be solved. We define the first-stage estimators by
[TABLE]
where
[TABLE]
The uniform consistency of these first-stage estimators, required as the high-level condition in Assumption 2 (v), follows from Lemma 7 of Chiang, Hsu, and Sasaki (2019), which is applicable under our Assumption 2 (i)–(iv).
B.3 Bandwidths
Another practical consideration concerns the rule for selecting bandwidths in finite samples. We propose to start with the MSE-optimal bandwidths for local quadratic kernel smoothers as the bandwidth for our bias-corrected local cubic kernel estimation, and then to apply the rule-of-thumb correction for coverage optimality (Calonico, Cattaneo and Farrell, 2016, 2018). To keep the implementation simple, we use a single bandwidth that is based on minimizing the sum of MSEs of and , where both and are from local quadratic estimation problems. We first introduce short-hand notations. Let and .
For the kernel density estimator , we make use of Silverman’s rule of thumb
[TABLE]
where is the sample standard deviation of .
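A rule of thumb of this kind is a one-line computation. The Python sketch below uses the textbook Gaussian-reference form, 1.06 times the sample standard deviation times n to the power -1/5. The constant 1.06 is an assumption made for illustration, since the displayed formula is missing from this copy and the paper's constant may differ.

```python
import numpy as np

def silverman_bandwidth(x, constant=1.06):
    """Rule-of-thumb bandwidth: constant * sample std * n^(-1/5).

    The default constant 1.06 is the textbook Gaussian-reference value,
    assumed here rather than taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    return constant * x.std(ddof=1) * x.size ** (-1.0 / 5.0)
```

The result can be passed as the bandwidth of a kernel density estimator for the running variable.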
For the main bandwidth , we first choose
[TABLE]
where the leading bias and variance terms are given by
[TABLE]
respectively, with and given by global cubic parametric regressions of and , respectively, evaluated at for certain .
With the first-stage bandwidth having been selected, we can solve
[TABLE]
and thus compute our first-stage level estimate
[TABLE]
We next define the variance estimator by
[TABLE]
where is the first stage level estimator given above.
Finally, the main bandwidth selector is defined by
[TABLE]
where the leading bias and variance terms are given by
[TABLE]
In the end, following Calonico, Cattaneo and Farrell (2016, 2018), we can apply the rule-of-thumb (ROT) correction for coverage optimality bandwidth of the local quadratic regression to the main bandwidth as
For the bandwidth parameters and used for the conditional density estimator in Appendix B.1, we follow the choice rules proposed at the end of Appendix C in Frandsen, Frölich, and Melly (2012), and propose to set and .
Appendix C Auxiliary Lemmas and Proofs
C.1 Auxiliary Lemmas
C.1.1 Uniform Bahadur Representation
The following lemma proposes the uniform BR for the local slope estimators.
Lemma 3 (Chiang, Hsu, and Sasaki (2019); Lemma 1).
Under Assumption 2, we have the uniform influence function representations (3.4) and (3.5) that hold uniformly on .
C.1.2 Functional Central Limit Theorem
Lemma 4.
Let the triangular array of separable stochastic processes be row independent and write , and denote to be the outer integral (see, e.g., Section 1.2 of van der Vaart and Wellner (1996)). Suppose that the following conditions are satisfied:
1. are manageable, with envelope which are also independent within rows;
2. exists for every ;
3. ;
4. for each ;
5. where exists for every , and for all deterministic sequences and in , if then .
Then is totally bounded under the pseudometric, and converges weakly to a tight mean zero Gaussian process concentrated on , with covariance .
C.2 Proof of Theorem 2
Before presenting the proof, we introduce additional definitions and notation. Let be a class of measurable functions defined on with a measurable envelope . We say that is of VC type with envelope if there exist constants , such that , where the supremum is taken over the set of all finite discrete measures on .
To approximate the distribution of the BR, we define the following Multiplier Processes (MP):
[TABLE]
For ease of writing, we use the following notations for the differences of right and left limits of the BR, the MP, and the EMP with :
[TABLE]
With these preparations, we now start a proof of Theorem 2.
Part (i) (a): We will verify the five conditions in Lemma 4 for the triangular array of stochastic processes defined by
[TABLE]
The separability follows the same argument as in the proof of Theorem 4 of Kosorok (2003) and the left or right continuity of the processes. To show condition 1, define
[TABLE]
We first claim that is a VC type class with envelope
[TABLE]
for some constant . It is clear that is of VC-subgraph with VC index since it is monotone increasing in , and thus for each pair with , it can never pick out . Similarly, , and are all VC subgraph classes, since they are sub-collections of the collection of all half spaces; the claim then follows from Lemma 9.12 (i) of Kosorok (2008). Each of them is therefore of VC type with envelope . Next, Assumption 2(ii)(a)(b) implies
[TABLE]
for an and Euclidean norm . Thus is of VC type with envelope in light of Example 19.7 of van der Vaart (1998) and Lemma 9.18 of Kosorok (2008). Under Assumption 2(i)(b), (iii) and (iv), for each , the collection of a single function
[TABLE]
is of VC subgraph and therefore VC type with envelope . Example 19.19 of van der Vaart (1998) suggests VC type classes, that are of finite uniform integrals, are closed under element-wise addition and multiplication. Therefore, is of VC type with envelope constant and thus
[TABLE]
is of VC type with envelope . Finally, standard calculations show for each and for any the uniform entropy integral bound
[TABLE]
Equation (A.1) in the proof of Theorem 1 in Andrews (1994) then implies that is a manageable class of functions, as defined in Section 11.4.1 of Kosorok (2008). To check condition 2, notice
[TABLE]
It suffices to check since due to the law of iterated expectations, and thus the second term is [math]. When , under Assumption 2(i)(a)(b),(ii)(c),(iii), (iv)(a),
[TABLE]
where the second to the last equality is due to mean value expansions under Assumption 2 (i)(b) and (ii)(c). Notice that enters only through the term, and thus
[TABLE]
exists. Similar calculations hold for and , . This shows condition 2. Condition 3 is clear since
[TABLE]
under Assumption 2 (i)(a), (iii) and (iv)(a). To show condition 4, note that for each ,
[TABLE]
under Assumption 2 (i)(a), (iii) and (iv)(a). This shows condition 4. To show condition 5, note that we can write
[TABLE]
From our calculations on the way to showing condition 2, we know that each term on the right-hand side exists under Assumption 2 (i)(a)(b),(ii)(c),(iii), (iv)(a). Since enters the expression only through the part, for all deterministic sequences and , implies . By Lemma 4, we have and similarly for . Assumption 2(i)(a) then implies .
Part (i) (b): We apply the FCLT and the functional delta method. Notice that suggests
[TABLE]
Let , if , then is Hadamard differentiable at tangentially to with the Hadamard derivative given by . Therefore, under Assumption 1(ii), we know that is bounded away from [math]. Also, is bounded away from zero under Assumption 2(i)(c). The functional delta method then yields
[TABLE]
where
[TABLE]
Part (i) (c): Define operator as
[TABLE]
where . By Hadamard differentiability from Lemma 3.9.23(ii) of van der Vaart and Wellner (1996) and the chain rule (van der Vaart, 1998, Theorem 20.9), under Assumption 2(i)(c),(ii)(a)(b), is Hadamard differentiable at tangentially to and the derivative (Kosorok, 2008, Section 2.2.4) is
[TABLE]
is tangential to . The functional delta method then yields
[TABLE]
where
[TABLE]
Part (ii): This part of the proof consists of two steps. We first show the convergence result for the EMP, and then show the convergence result for .
Step 1 We claim . Applying Theorem 11.19 of Kosorok (2008), which is applicable under the five conditions verified in (i), we have . In light of Lemma 2 of Chiang, Hsu, and Sasaki (2019), it then suffices to show
[TABLE]
Indeed, for , by Assumption 2(i)(b),(v), we have
[TABLE]
where . It can be shown that the array of zero mean random variables satisfies the Lindeberg-Feller conditions (Proposition 2.27 of van der Vaart (1998)) under Assumption 2(i)(a), (iii) and (iv)(a)(c) and therefore converges in distribution to a normal distribution. The asymptotic tightness then implies . Thus we conclude that equation (C.1) is .
Step 2 We will show
[TABLE]
where
[TABLE]
We first use Theorem 12.1 of Kosorok (2008) (the functional delta for bootstrap) along with the conclusion of Step 1 to get
[TABLE]
Since the denominator is bounded away from [math] under Assumption 2(i)(iv), uniform consistency of , from Theorem 2 gives , and Lemma 2 of Chiang, Hsu, and Sasaki (2019) implies . Using the functional delta method for bootstrap again, we obtain
[TABLE]
Since are bounded away from zero, using asymptotic equicontinuity of following its (conditional) weak convergence and Theorem 3.7.23 of Giné and Nickl (2016), and the uniform consistency of and with along with Lemma 2 of Chiang, Hsu, and Sasaki (2019), we conclude part (ii) of the theorem.
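The multiplier-bootstrap idea behind Part (ii) can be illustrated with a minimal sketch. It assumes the estimated influence-function values are already available as an array; the input `psi` and the function names are hypothetical, and the standard normal multipliers are one common choice rather than a requirement stated here.

```python
import numpy as np


def multiplier_bootstrap_sup(psi, n_draws=1000, seed=0):
    """Simulate sup-norm draws of a multiplier process.

    psi : (n, m) array of estimated influence-function values
          (hypothetical input: one row per observation, one column
          per grid point of the index set).
    Returns an array of length n_draws with sup_j |n^{-1/2} sum_i xi_i psi_ij|.
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    rng = np.random.default_rng(seed)
    # Multipliers are drawn independently of the data (assumed N(0, 1)).
    xi = rng.standard_normal((n_draws, n))
    process = xi @ psi / np.sqrt(n)  # shape (n_draws, m)
    return np.abs(process).max(axis=1)


def uniform_critical_value(psi, level=0.95, n_draws=1000, seed=0):
    """Empirical quantile of the simulated sup-norm, usable for uniform bands."""
    draws = multiplier_bootstrap_sup(psi, n_draws=n_draws, seed=seed)
    return np.quantile(draws, level)
```

Because the data enter only through `psi`, the same array can be reused across draws, which is the practical appeal of multiplier methods over resampling the observations.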
C.3 Proof of Lemma 1
We prove the lemma in two steps: for each , Step 1 shows
[TABLE]
and Step 2 shows
[TABLE]
Step 1 For , under Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3, for each , for , applying the dominated convergence theorem, we have
[TABLE]
where lies between and . A similar result holds for .
Step 2 Under Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3, for each , for , an application of the dominated convergence theorem yields
[TABLE]
A similar result holds for . ∎
C.4 Proof of Lemma 2
The proof makes use of a maximal inequality from Chernozhukov, Chetverikov, and Kato (2014). Under Assumptions 2 (ii)(a)(b) and 3, as in Section 1.6 of Tsybakov (2008), the solution to equation (B.3) can be written as
[TABLE]
where $\alpha(0^{\pm},y,d)=\Big[\mu(0^{\pm},y,d),\ \mu^{\prime}(0^{\pm},y,d)a_{n},\ \mu^{\prime\prime}(0^{\pm},y,d)a_{n}^{2}/2!,\ \mu^{\prime\prime\prime}(0^{\pm},y,d)a_{n}^{3}/3!\Big]^{\top}$. Multiply both sides by to get
[TABLE]
where
[TABLE]
From Step 1 of the proof of Lemma 1 in Chiang, Hsu, and Sasaki (2019), under Assumptions 2 (i)(a)(b), (iii), (iv) and 3, for the common inverse factor we have
[TABLE]
uniformly in . It suffices to show that each of
[TABLE]
converges in probability to zero uniformly. We will divide the argument into the following four steps.
Step 1 Under Assumption 2 (i)(a), (ii)(a)(b), (iii) and (iv)(a), it holds that
[TABLE]
Step 2 We first bound the difference
[TABLE]
It suffices to show that each term converges in probability uniformly. Define for each
[TABLE]
where and . Note that for a fixed , for all . Fix any . Under Assumption 2 (iv), is a VC type class with measurable envelope . By Proposition 3.6.12 of Giné and Nickl (2016), is a VC type class with measurable envelope since is a mapping of bounded variation. Furthermore, is a VC-subgraph class and therefore of VC type. Lemma A.6 of Chernozhukov, Chetverikov, and Kato (2014) then implies that the class of their element-wise products is of VC type with envelope , i.e., there exist positive constants , such that for and the supremum is taken over the set of all probability measures on . Corollary 5.1 of Chernozhukov, Chetverikov, and Kato (2014) then gives
[TABLE]
Multiplying both sides by , we have
[TABLE]
The result then follows from Markov’s inequality and Assumption 3.
Step 3 We now want to control
[TABLE]
Since under Assumption 2 (ii)(a)(b), for any , , , this implies that is a VC type class in view of Example 19.7 of van der Vaart (1998) and Lemma 9.18 of Kosorok (2008). We can then follow the same steps as in Step 2 to show
[TABLE]
The desired result of the current step then follows from Markov’s inequality and Assumption 3.
Step 4 Finally, we show that the two expectations above are asymptotically equivalent uniformly in and . Under Assumption 2 (i) (b), (ii) (a) (b), (iii), (iv) (a), calculations yield
[TABLE]
by the law of iterated expectations under Assumption 3. This result, along with results from Steps 2 and 3, concludes the proof. ∎
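The one-sided local cubic fit whose solution is expanded in this proof can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name, the triangular kernel, and the generic response `y` are hypothetical choices, and `h` plays the role of the bandwidth $a_n$. The fitted coefficients estimate $\mu^{(k)}(0^{\pm})a_n^{k}/k!$, so they are rescaled to recover the level and the derivatives.

```python
import math

import numpy as np


def local_cubic_one_sided(x, y, h, side="+", order=3):
    """Weighted least-squares polynomial fit at the cutoff 0 from one side.

    Returns estimates of [mu, mu', mu'', mu'''] at 0+ (side="+") or 0- (side="-").
    The triangular kernel is an assumption made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x >= 0) if side == "+" else (x < 0)
    u = x[mask] / h
    w = np.maximum(1.0 - np.abs(u), 0.0)  # triangular kernel weights
    keep = w > 0
    u, w, yy = u[keep], w[keep], y[mask][keep]
    # Design matrix with columns 1, u, u^2, ..., u^order.
    R = np.vander(u, order + 1, increasing=True)
    sw = np.sqrt(w)
    alpha, *_ = np.linalg.lstsq(R * sw[:, None], yy * sw, rcond=None)
    # alpha[k] estimates mu^{(k)} * h^k / k!, so undo the scaling.
    fact = np.array([math.factorial(k) for k in range(order + 1)], dtype=float)
    return alpha * fact / h ** np.arange(order + 1)
```

On an exact cubic such as `y = 1 + 2*x + 3*x**2 + 4*x**3`, the fit should reproduce the level and the first three derivatives at zero up to numerical error, which is a convenient sanity check before applying it to noisy data.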
C.5 On Remark 2
This appendix section proves the statement in Remark 2. We mostly follow the proof of Proposition 6 of Card, Lee, Pei, and Weber (2015). Let be the “stacked” outcome variable , where the first entries are observations to the left of and the last entries are those to the right of . Let be the matrix whose row is
[TABLE]
Also let
[TABLE]
with being the diagonal matrices
[TABLE]
The constrained estimator can be obtained from
[TABLE]
subject to where . Denote the resulting estimator by
[TABLE]
From equation (1.4.5) of Amemiya (1985), we have
[TABLE]
where the first term on the RHS is the unconstrained version and is
[TABLE]
Since , where and is the uniform kernel, we have . Therefore,
[TABLE]
so the constrained estimator has the same asymptotic distribution as the unconstrained one. ∎
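The relation between the constrained and unconstrained estimators can be illustrated with a small numerical sketch. It uses the standard constrained least-squares correction, which we take to be the form of the Amemiya (1985) equation cited above; that identification is an assumption, as are the function name and the particular stacked design used in the usage note.

```python
import numpy as np


def constrained_wls(Z, y, w, Q, c):
    """Weighted least squares with linear constraints Q' b = c.

    Z : (n, k) regressor matrix, y : (n,) outcomes, w : (n,) kernel weights,
    Q : (k, q) constraint matrix, c : (q,) constraint values.
    Returns (unconstrained estimate, constrained estimate).
    """
    Zw = Z * w[:, None]                    # W Z with W = diag(w)
    A = Z.T @ Zw                           # Z' W Z
    b_unc = np.linalg.solve(A, Zw.T @ y)   # unconstrained estimator
    A_inv_Q = np.linalg.solve(A, Q)        # (Z' W Z)^{-1} Q
    # Correction term: A^{-1} Q [Q' A^{-1} Q]^{-1} (Q' b_unc - c).
    adj = A_inv_Q @ np.linalg.solve(Q.T @ A_inv_Q, Q.T @ b_unc - c)
    return b_unc, b_unc - adj
```

For a kink design one might stack left and right intercepts and slopes in `Z`, take `w` from the uniform kernel, and impose continuity at the cutoff with `Q = [1, 0, -1, 0]'` and `c = 0`. The correction term vanishes whenever the unconstrained fit already satisfies the constraint.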
References
- Abadie, A. (2002) “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models,” Journal of the American Statistical Association, Vol. 97, No. 457, pp. 284–292.
- Amemiya, T. (1985) Advanced Econometrics. Harvard University Press.
- Björklund, A. and Moffitt, R. (1987) “The Estimation of Wage and Welfare Gains in Self-Selection Models,” Review of Economics and Statistics, Vol. 69, No. 1, pp. 42–49.
- Calonico, S., Cattaneo, M.D., and Farrell, M. (2016) “Coverage Error Optimal Confidence Intervals for Regression Discontinuity Designs,” Working paper.
- Calonico, S., Cattaneo, M.D., and Farrell, M. (2018) “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference,” Journal of the American Statistical Association, Vol. 113, No. 522, pp. 767–779.
- Calonico, S., Cattaneo, M.D., and Titiunik, R. (2014) “Robust Nonparametric Confidence Intervals for Regression Discontinuity Designs,” Econometrica, Vol. 82, No. 6, pp. 2295–2326.
- Card, D., Lee, D., Pei, Z., and Weber, A. (2015) “Inference on Causal Effects in a Generalized Regression Kink Design,” Econometrica, Vol. 83, No. 6, pp. 2453–2483.
- Cerulli, G., Dong, Y., Lewbel, A., and Poulsen, A. (2017) “Testing Stability of Regression Discontinuity Models,” in Advances in Econometrics, Vol. 38: Regression Discontinuity Designs: Theory and Applications, M.D. Cattaneo and J.C. Escanciano, eds., pp. 317–339.
