Quantile Treatment Effects in Regression Kink Designs

Heng Chen; Harold D. Chiang; Yuya Sasaki

arXiv:1703.05109·stat.ME·December 16, 2020

TL;DR

This paper establishes the first identification results for quantile treatment effects of binary treatments within regression kink designs, along with large sample inference methods and practical guidelines.

Contribution

It fills a gap by providing identification and inference methods for quantile effects of binary treatments in regression kink designs, which was previously unaddressed.

Findings

01. Identification of quantile treatment effects for binary treatments in regression kink designs.

02. Development of large sample inference theories.

03. Provision of practical estimation and inference guidelines.

Abstract

The literature on regression kink designs develops identification results for average effects of continuous treatments (Card, Lee, Pei, and Weber, 2015), average effects of binary treatments (Dong, 2018), and quantile-wise effects of continuous treatments (Chiang and Sasaki, 2019), but there has been no identification result for quantile-wise effects of binary treatments to date. In this paper, we fill this void in the literature by providing an identification of quantile treatment effects in regression kink designs with binary treatment variables. For completeness, we also develop large sample theories for statistical inference and a practical guideline on estimation and inference.

Equations

$$\tau(\theta)=\inf\{y\in\mathscr{Y}:F_{Y^{1}|VX}(y|h(0),0)\geq\theta\}-\inf\{y\in\mathscr{Y}:F_{Y^{0}|VX}(y|h(0),0)\geq\theta\}$$

$$\frac{d}{dx}\operatorname{E}[D|X=x]=\frac{d}{dx}\int_{-\infty}^{h(x)}f_{V|X}(v|x)\,dv=h^{\prime}(x)\cdot f_{V|X}(h(x)|x)+\int_{-\infty}^{h(x)}\frac{\partial}{\partial x}f_{V|X}(v|x)\,dv$$

$$\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]=\frac{d}{dx}\int_{-\infty}^{h(x)}f_{V|X}(v|x)\cdot F_{Y^{1}|VX}(y|v,x)\,dv=h^{\prime}(x)\cdot f_{V|X}(h(x)|x)\cdot F_{Y^{1}|VX}(y|h(x),x)+\int_{-\infty}^{h(x)}\frac{\partial}{\partial x}\left[f_{V|X}(v|x)\cdot F_{Y^{1}|VX}(y|v,x)\right]dv$$

$$\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]=[h^{\prime}(0^{+})-h^{\prime}(0^{-})]\cdot f_{V|X}(h(0)|0)$$

$$\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]=[h^{\prime}(0^{+})-h^{\prime}(0^{-})]\cdot f_{V|X}(h(0)|0)\cdot F_{Y^{1}|VX}(y|h(0),0)$$

$$\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]}=F_{Y^{1}|VX}(y|h(0),0)$$

$$\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]}=F_{Y^{0}|VX}(y|h(0),0)$$

$$C_{\text{RDD}}=\{\omega\in\Omega:X(\omega)=0,\ h(0^{-})<V(\omega)\leq h(0^{+})\}$$

$$C_{\text{RKD}}=\{\omega\in\Omega:X(\omega)=0,\ V(\omega)=h(0)\}$$

$$F_{Y^{d}|VX}(y|h(0),0)=\frac{\mu_{1}^{\prime}(0^{+},y,d)-\mu_{1}^{\prime}(0^{-},y,d)}{\mu_{2}^{\prime}(0^{+},d)-\mu_{2}^{\prime}(0^{-},d)}$$

$$\hat{\mu}_{1}^{\prime}(0^{\pm},y,d)h_{n}=e_{1}^{\top}\operatorname*{arg\,min}_{\alpha\in\mathds{R}^{4}}\sum_{i=1}^{n}\Big[\mathds{1}\{Y_{i}\leq y\}\mathds{1}\{D_{i}=d\}-r_{3}^{\top}\Big(\frac{X_{i}}{h_{n}}\Big)\alpha\Big]^{2}K\Big(\frac{X_{i}}{h_{n}}\Big)\delta_{i}^{\pm}$$

$$\hat{\mu}_{2}^{\prime}(0^{\pm},d)h_{n}=e_{1}^{\top}\operatorname*{arg\,min}_{\alpha\in\mathds{R}^{4}}\sum_{i=1}^{n}\Big[\mathds{1}\{D_{i}=d\}-r_{3}^{\top}\Big(\frac{X_{i}}{h_{n}}\Big)\alpha\Big]^{2}K\Big(\frac{X_{i}}{h_{n}}\Big)\delta_{i}^{\pm}$$

$$\widehat{F}_{Y^{d}|VX}(y|h(0),0)=\frac{\hat{\mu}_{1}^{\prime}(0^{+},y,d)-\hat{\mu}_{1}^{\prime}(0^{-},y,d)}{\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)}$$

$$\hat{\tau}(\theta)=\inf\{y\in\mathscr{Y}:\widehat{F}_{Y^{1}|VX}(y|h(0),0)\geq\theta\}-\inf\{y\in\mathscr{Y}:\widehat{F}_{Y^{0}|VX}(y|h(0),0)\geq\theta\}=\hat{Q}_{Y^{1}|VX}(\theta)-\hat{Q}_{Y^{0}|VX}(\theta)$$

$$\sqrt{nh_{n}^{3}}\,[\hat{\tau}(\cdot)-\tau(\cdot)]$$

$$\Xi(\cdot)=-\left[\frac{\hat{Z}_{\xi,n}(\hat{Q}_{Y^{1}|VX}(\cdot),1)}{\hat{f}_{Y^{1}|VX}(\hat{Q}_{Y^{1}|VX}(\cdot)|h(0),0)}-\frac{\hat{Z}_{\xi,n}(\hat{Q}_{Y^{0}|VX}(\cdot),0)}{\hat{f}_{Y^{0}|VX}(\hat{Q}_{Y^{0}|VX}(\cdot)|h(0),0)}\right]$$

$$\hat{Z}_{\xi,n}(y,d)=\frac{[\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)][\hat{\nu}_{\xi,n}^{+}(y,d,1)-\hat{\nu}_{\xi,n}^{-}(y,d,1)]-[\hat{\mu}_{1}^{\prime}(0^{+},y,d)-\hat{\mu}_{1}^{\prime}(0^{-},y,d)][\hat{\nu}_{\xi,n}^{+}(y,d,2)-\hat{\nu}_{\xi,n}^{-}(y,d,2)]}{[\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)]^{2}}$$

$$T^{TS}=\sup_{\theta\in\Theta}\sqrt{nh_{n}^{3}}\,|\hat{\tau}(\theta)|$$

Full text

Quantile Treatment Effects in Regression Kink Designs (first arXiv date: March 15, 2017)

Heng Chen

Bank of Canada. Currency Department, Bank of Canada, 234 Wellington Street, Ottawa, ON, K1A 0G9, Canada.

Harold D. Chiang

Vanderbilt University. Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.

Yuya Sasaki

Vanderbilt University. Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.

We thank Yingying Dong, Robert Moffitt, and participants at New York Camp Econometrics XIII for useful comments. All the remaining errors are ours.

Keywords: causal interpretation, identification, quantile treatment effects, regression kink design.

1 Introduction

Theories of identification in regression kink designs are advanced by a few papers in the recent literature. Card, Lee, Pei, and Weber (2015) propose identification of average effects of continuous treatments. Dong (2018) proposes identification of average effects of binary treatments. Chiang and Sasaki (2019) propose identification of quantile-wise effects of continuous treatments. To date, no theory has been proposed for identification of quantile-wise effects of binary treatments in regression kink designs. This paper aims to fill this void in the literature.

Specifically, in regression kink designs with binary treatments, we show that a local Wald ratio of derivatives of certain conditional expectation functions can be used to identify the conditional distribution functions of the potential outcomes given the event of local compliance. These conditional distribution functions can in turn be used to identify the quantile treatment effects given the event of local compliance. Our identification argument parallels that of Frandsen, Frölich, and Melly (2012), who show that a local Wald ratio of certain conditional expectation functions can be used to identify the conditional distribution functions of potential outcomes given the event of local compliance in the context of regression discontinuity designs. Because there is no discontinuity in our context of regression kink designs, however, our identification result entails the limit case of the event of local compliance, which amounts to the subpopulation to which the marginal treatment effects (Björklund and Moffitt, 1987; Heckman and Vytlacil, 1999; Heckman and Vytlacil, 2005) are relevant. This is analogous to, and provides a quantile counterpart of, the identification result by Dong (2018).

Our identifying formula takes the form of local Wald ratios of derivatives of functions. Such a form is related to the identifying formulas of several papers in the existing literature. These papers include Dong and Lewbel (2015) (also see Cerulli, Dong, Lewbel, and Poulsen, 2017), who use a local Wald ratio of derivatives of conditional expectation functions to identify the average effect of changing the threshold location in regression discontinuity designs; Card, Lee, Pei, and Weber (2015), who use a local Wald ratio of derivatives of conditional expectation functions to identify average effects of continuous treatments in regression kink designs; Dong (2018), who uses a local Wald ratio of derivatives of conditional expectation functions to identify average effects of binary treatments in regression kink designs; and Chiang and Sasaki (2019), who use a local Wald ratio of derivatives of conditional quantile functions to identify quantile-wise effects of continuous treatments in regression kink designs. Unlike each of these papers, we use the difference of left-inverses of two local Wald ratios of derivatives of conditional expectation functions to identify quantile-wise effects of binary treatments in regression kink designs.

While we motivate this paper by quantile treatment effects, the identifying formulas we provide as the main result of this paper can also be used to identify the distributional treatment effects. Therefore, this paper also relates to Abadie (2002), who uses a form of Wald ratios to identify distributional treatment effects, and more closely relates to Shen and Zhang (2016), who consider distributional treatment effects in the context of regression discontinuity designs.

In addition to the main identification result, we also provide methods of estimation and inference for quantile treatment effects based on analog estimators of our identifying formulas. While our identification result is novel, estimation and inference results follow from an adaptation of existing approaches to our framework. Therefore, the main text focuses on the identification theory. Details of estimation and inference theories are found in the appendix.

The rest of this paper is organized as follows. In Section 2, we develop the identification result. Section 3 presents a practical guideline on estimation and inference. Appendix A presents formal theories for the method of inference. Appendix B presents additional practical considerations. Appendix C contains mathematical details.

2 Identification: the Main Result

We model the random vector $(Y,D,X,U,V):(\Omega^{x},\mathscr{F}^{x},\mathds{P}^{x})\rightarrow\mathscr{Y}\times\mathscr{D}\times\mathscr{X}\times\mathscr{U}\times\mathscr{V}$ through the following causal structure, where $\mathscr{Y}\subset\mathbb{R}$ , $\mathscr{D}=\{0,1\}$ , $\mathscr{X}\subset\mathbb{R}$ , $\mathscr{U}\subset\mathbb{R}^{d_{U}}$ for $d_{U}\in\mathbb{N}$ , and $\mathscr{V}\subset\mathbb{R}$ .

$$Y=g(D,X,U) \tag{2.1}$$

$$D=\mathbbm{1}\{V\leq h(X)\} \tag{2.2}$$

In equation (2.1), the outcome variable $Y$ is produced through function $g$ by a binary treatment variable $D$ , a continuous running variable or assignment variable $X$ , and miscellaneous factors $U$ . We let $Y^{d}=g(d,X,U)$ denote the potential outcome random variable that an individual with attributes $(X,U)$ would produce under each hypothetical treatment choice $d\in\{0,1\}$ . The actual treatment choice $D$ is determined by $X$ and $V$ through the threshold-crossing model (2.2). A researcher observes the joint distribution of $Y$ , $D$ , and $X$ . However, a researcher cannot observe $U$ or $V$ . We do not impose any statistical independence condition in this model. Therefore, existing methods for instrumental variable quantile regression (e.g., Chernozhukov and Hansen, 2005) will not apply here. In particular, we do not assume statistical independence between the running variable $X$ and the unobservables $(U,V)$ . Instead, we make the following assumption of the regression kink design (RKD).

Assumption 1 (Regression Kink Design, RKD).

*Let $x_{0}=0\in\mathscr{X}$ be a designed kink location.

(i) $h$ is continuously differentiable in a deleted neighborhood $I_{X}\backslash\{0\}\subset\mathscr{X}$ of $x_{0}=0$ .

(ii) $h$ is continuous at $x_{0}=0$ .

(iii) $\lim_{x\downarrow 0}h^{\prime}(x)\neq\lim_{x\uparrow 0}h^{\prime}(x)$ , where $h^{\prime}$ denotes $dh/dx$ .

(iv) The conditional distribution of $V$ given $X$ is absolutely continuous with a continuously differentiable conditional density function $f_{V|X}(\cdot|\cdot)$ .

(v) The conditional cumulative distribution function $F_{Y^{d}|VX}(y|\cdot,\cdot)$ is continuously differentiable for each $y\in\mathscr{Y}$ for each $d\in\{0,1\}$ .

(vi) $f_{V|X}(h(0)|0)>0$ .*

The research design as required by Assumption 1 consists of three broad pieces. First, the treatment assignment rule $h$ has a kink at the designed location $x_{0}=0$ , as formally stated in parts (ii) and (iii), but this assignment rule $h$ is reasonably smooth elsewhere, as formally stated in part (i). Second, every other function is reasonably smooth, as formally stated in parts (iv) and (v). Third, there is sufficient data at the designed kink location $x_{0}=0$ , as formally stated in part (vi). This assumption is analogous to that of Dong (2018) who analyzes average effects of binary treatments in the regression kink design. Under this design, we obtain the following identification result for conditional distributions of the potential outcomes $Y^{d}$ given the event of $(V,X)=(h(0),0)$ .

Theorem 1 (Identification).

Let Assumption 1 hold for the model (2.1)–(2.2). Then,

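$$F_{Y^{1}|VX}(y|h(0),0)=\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]}$$

and

$$F_{Y^{0}|VX}(y|h(0),0)=\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]}$$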

hold for all $y\in\mathscr{Y}$ .

Once the conditional cumulative distribution functions, $F_{Y^{d}|VX}(\cdot|h(0),0)$ for $d\in\{0,1\}$ , are identified through the formulas presented in Theorem 1, the conditional quantile treatment effect is in turn identified by

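$$\tau(\theta)=\inf\{y\in\mathscr{Y}:F_{Y^{1}|VX}(y|h(0),0)\geq\theta\}-\inf\{y\in\mathscr{Y}:F_{Y^{0}|VX}(y|h(0),0)\geq\theta\}$$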

for $\theta\in(0,1)$. Theorem 1 also provides the identification of the distributional treatment effects for local compliers, $F_{Y^{1}|VX}(\cdot|h(0),0)-F_{Y^{0}|VX}(\cdot|h(0),0)$, as in Abadie (2002) and Shen and Zhang (2016), which are useful for testing important hypotheses such as first-order stochastic dominance. (Footnote 1: With the identifying formulas provided in Theorem 1, $F_{Y^{1}|VX}(y|h(0),0)-F_{Y^{0}|VX}(y|h(0),0)$ can be expressed as a single Wald ratio: $\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{\text{E}}\left[\mathbbm{1}\left\{Y\leq y\right\}|X=x\right]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{\text{E}}\left[\mathbbm{1}\left\{Y\leq y\right\}|X=x\right]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{\text{E}}\left[D|X=x\right]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{\text{E}}\left[D|X=x\right]}$.)
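The left-inverse that maps the two identified conditional distribution functions into $\tau(\theta)$ can be illustrated numerically. The following is a minimal sketch, assuming the two CDFs have already been evaluated on a sorted grid of outcome values; the function names and the uniform example distributions are hypothetical and serve only as an illustration.

```python
import numpy as np

def left_inverse(y_grid, cdf_values, theta):
    """Return inf{y in y_grid : F(y) >= theta} for a nondecreasing F on a sorted grid."""
    idx = np.searchsorted(cdf_values, theta, side="left")  # first index with F >= theta
    if idx == len(y_grid):
        return np.nan  # theta is never reached on this grid
    return y_grid[idx]

def quantile_treatment_effect(y_grid, cdf1, cdf0, theta):
    """Difference of the left-inverses of the treated and untreated CDFs."""
    return left_inverse(y_grid, cdf1, theta) - left_inverse(y_grid, cdf0, theta)

# Hypothetical example: Y^1 ~ Uniform(1, 3) and Y^0 ~ Uniform(0, 2)
# conditional on (V, X) = (h(0), 0), so every quantile shifts by one unit.
y_grid = np.linspace(-1.0, 4.0, 5001)
cdf1 = np.clip((y_grid - 1.0) / 2.0, 0.0, 1.0)
cdf0 = np.clip(y_grid / 2.0, 0.0, 1.0)
tau_median = quantile_treatment_effect(y_grid, cdf1, cdf0, 0.5)
```

On a finite grid the result is accurate only up to the grid spacing, so a finer grid gives a closer approximation to the population quantile difference.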

Proof of Theorem 1: By applying Leibniz rule under Assumption 1 (i) and (iv), we have

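$$\frac{d}{dx}\operatorname{E}[D|X=x]=\frac{d}{dx}\int_{-\infty}^{h(x)}f_{V|X}(v|x)\,dv=h^{\prime}(x)\cdot f_{V|X}(h(x)|x)+\int_{-\infty}^{h(x)}\frac{\partial}{\partial x}f_{V|X}(v|x)\,dv$$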

for all $x\in I_{X}\backslash\{0\}$ . Similarly, by applying Leibniz rule under Assumption 1 (i), (iv), and (v), we have

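$$\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]=\frac{d}{dx}\int_{-\infty}^{h(x)}f_{V|X}(v|x)\cdot F_{Y^{1}|VX}(y|v,x)\,dv=h^{\prime}(x)\cdot f_{V|X}(h(x)|x)\cdot F_{Y^{1}|VX}(y|h(x),x)+\int_{-\infty}^{h(x)}\frac{\partial}{\partial x}\left[f_{V|X}(v|x)\cdot F_{Y^{1}|VX}(y|v,x)\right]dv$$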

for all $(x,y)\in\left(I_{X}\backslash\{0\}\right)\times\mathscr{Y}$ . Therefore, by Assumption 1 (ii) and (iv), we can write

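$$\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]=[h^{\prime}(0^{+})-h^{\prime}(0^{-})]\cdot f_{V|X}(h(0)|0),$$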

and, by Assumption 1 (ii), (iv), and (v), we can write

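$$\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]=[h^{\prime}(0^{+})-h^{\prime}(0^{-})]\cdot f_{V|X}(h(0)|0)\cdot F_{Y^{1}|VX}(y|h(0),0)$$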

for all $y\in\mathscr{Y}$ . Taking the ratio of these expressions under Assumption 1 (iii) and (vi) yields

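$$\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot D|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[D|X=x]}=F_{Y^{1}|VX}(y|h(0),0)$$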

for all $y\in\mathscr{Y}$ . Similar lines of arguments yield

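$$\frac{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[\mathbbm{1}\{Y\leq y\}\cdot(1-D)|X=x]}{\lim_{x\downarrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]-\lim_{x\uparrow 0}\frac{d}{dx}\operatorname{E}[1-D|X=x]}=F_{Y^{0}|VX}(y|h(0),0)$$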

for all $y\in\mathscr{Y}$ . ∎
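The first step of the proof, namely that the kink in $\operatorname{E}[D|X=x]$ at $x=0$ equals the kink in $h$ scaled by $f_{V|X}(h(0)|0)$, can be checked numerically in a simple parametric example. The sketch below assumes, purely for illustration, that $V$ given $X=x$ is normal with mean $\rho x$ and unit variance and that $h$ is piecewise linear; none of these choices come from the paper.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical design parameters (illustrative assumptions only)
RHO = 0.4                             # V | X = x ~ N(RHO * x, 1), so V depends on X
H0 = 0.3                              # h(0)
SLOPE_LEFT, SLOPE_RIGHT = 0.5, 2.0    # h'(0-) and h'(0+)

def h(x):
    """Continuous assignment rule with a kink at x = 0."""
    return H0 + (SLOPE_RIGHT if x >= 0 else SLOPE_LEFT) * x

def prob_treated(x):
    """E[D | X = x] = P(V <= h(x) | X = x) under the normal model above."""
    return Phi(h(x) - RHO * x)

# One-sided finite differences approximate the right and left derivatives at 0.
eps = 1e-6
right_slope = (prob_treated(eps) - prob_treated(0.0)) / eps
left_slope = (prob_treated(0.0) - prob_treated(-eps)) / eps

kink_numeric = right_slope - left_slope
kink_formula = (SLOPE_RIGHT - SLOPE_LEFT) * phi(H0)   # [h'(0+) - h'(0-)] * f_{V|X}(h(0)|0)
```

The dependence of $V$ on $X$ through `RHO` makes the integral term in the Leibniz expansion nonzero, but that term is continuous at zero and therefore cancels in the difference of one-sided limits, which is why `kink_numeric` and `kink_formula` agree.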

Discussions of Theorem 1: In the context of the regression discontinuity design (RDD) where $h(0^{-})<h(0^{+})$ , Frandsen, Frölich, and Melly (2012) show that similar local Wald ratios identify the conditional distribution of the potential outcomes given the event

$$C_{\text{RDD}}=\{\omega\in\Omega:X(\omega)=0,\ h(0^{-})<V(\omega)\leq h(0^{+})\}$$

of local compliance. In our context of the regression kink design where $h(0^{-})=h(0^{+})$ , Theorem 1 shows that local Wald ratios of the derivatives identify the conditional distributions of the potential outcomes given the event

$$C_{\text{RKD}}=\{\omega\in\Omega:X(\omega)=0,\ V(\omega)=h(0)\},$$

which may be considered as a limit of the event $C_{\text{RDD}}$ for RDD as $\left|h(0^{+})-h(0^{-})\right|$ approaches $0$. In this sense, our causal interpretation result is similar to that of the marginal treatment effects (Björklund and Moffitt, 1987; Heckman and Vytlacil, 1999; Heckman and Vytlacil, 2005). This interpretation is analogous to the identification result by Dong (2018), who analyzes average effects of binary treatments in the regression kink design. $\triangle$

3 Estimation and Inference: a Practical Guideline

While the main contribution of this paper lies in our new identification result presented in Section 2, we also develop a theory and method of estimation and inference for completeness. Since the estimation and inference strategies are standard, we relegate most of the details to the appendix. In this section, we present a practical guideline on estimation and inference for the conditional quantile treatment effects $\tau(\theta)$. A formal theory is presented in Appendix A. We also present additional practical considerations in Appendix B. Auxiliary lemmas and proofs are found in Appendix C.

The local Wald ratios proposed in Theorem 1 as identifying formulae can be succinctly rewritten as

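$$F_{Y^{d}|VX}(y|h(0),0)=\frac{\mu_{1}^{\prime}(0^{+},y,d)-\mu_{1}^{\prime}(0^{-},y,d)}{\mu_{2}^{\prime}(0^{+},d)-\mu_{2}^{\prime}(0^{-},d)}, \tag{3.1}$$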

where $\mu_{1}^{\prime}(x,y,d)$ and $\mu_{2}^{\prime}(x,d)$ are the partial derivatives with respect to $x$ of $\mu_{1}(x,y,d)$ and $\mu_{2}(x,d)$ defined by

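$$\mu_{1}(x,y,d)=\operatorname{E}[\mathds{1}\{Y\leq y\}\cdot\mathds{1}\{D=d\}|X=x]\quad\text{and}\quad\mu_{2}(x,d)=\operatorname{E}[\mathds{1}\{D=d\}|X=x],$$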

respectively. We estimate the components of (3.1) by the one-sided local cubic estimators

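$$\hat{\mu}_{1}^{\prime}(0^{\pm},y,d)h_{n}=e_{1}^{\top}\operatorname*{arg\,min}_{\alpha\in\mathds{R}^{4}}\sum_{i=1}^{n}\Big[\mathds{1}\{Y_{i}\leq y\}\mathds{1}\{D_{i}=d\}-r_{3}^{\top}\Big(\frac{X_{i}}{h_{n}}\Big)\alpha\Big]^{2}K\Big(\frac{X_{i}}{h_{n}}\Big)\delta_{i}^{\pm}\quad\text{and} \tag{3.2}$$

$$\hat{\mu}_{2}^{\prime}(0^{\pm},d)h_{n}=e_{1}^{\top}\operatorname*{arg\,min}_{\alpha\in\mathds{R}^{4}}\sum_{i=1}^{n}\Big[\mathds{1}\{D_{i}=d\}-r_{3}^{\top}\Big(\frac{X_{i}}{h_{n}}\Big)\alpha\Big]^{2}K\Big(\frac{X_{i}}{h_{n}}\Big)\delta_{i}^{\pm}, \tag{3.3}$$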

where $K$ is a kernel function, $h_{n}$ is a bandwidth parameter, $e_{1}=(0,1,0,0)^{\top}$ , $r_{3}(u)=(1,u,u^{2},u^{3})^{\top}$ , $\delta_{i}^{+}=\mathds{1}\{X_{i}\geq 0\}$ and $\delta_{i}^{-}=\mathds{1}\{X_{i}<0\}$ . A plug-in estimator for (3.1) is given by

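$$\widehat{F}_{Y^{d}|VX}(y|h(0),0)=\frac{\hat{\mu}_{1}^{\prime}(0^{+},y,d)-\hat{\mu}_{1}^{\prime}(0^{-},y,d)}{\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)}.$$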

We use the local cubic polynomial in order to account for the manual bias correction of local quadratic estimators. By working with the asymptotic distribution of the higher-order local polynomial estimator, we effectively account for bias estimation in the asymptotic distribution of the lower-order one, which makes inference robust to large bandwidths; see Calonico, Cattaneo and Titiunik (2014, Remark 7) and Remark S.A.7 in their supplementary material.
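The one-sided local cubic slope estimators and the plug-in local Wald ratio described above can be sketched as weighted least squares problems. This is a minimal illustration rather than the authors' implementation: the function names and the triangular kernel are assumptions, and the demonstration data are a noiseless piecewise cubic chosen so that the recovered slopes are known in advance.

```python
import numpy as np

def triangular_kernel(u):
    """K(u) = (1 - |u|)_+, one of the common kernels supported on [-1, 1]."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def one_sided_local_cubic_slope(x, target, bandwidth, side):
    """Slope at 0 from the right (side='+') or the left (side='-').

    Minimizes sum_i [target_i - r_3(x_i/h)' alpha]^2 K(x_i/h) delta_i over alpha
    and returns alpha[1] / h, the coefficient on u rescaled by the bandwidth.
    """
    delta = (x >= 0) if side == "+" else (x < 0)
    u = x / bandwidth
    w = triangular_kernel(u) * delta
    keep = w > 0
    R = np.vander(u[keep], N=4, increasing=True)     # columns 1, u, u^2, u^3
    sw = np.sqrt(w[keep])
    alpha, *_ = np.linalg.lstsq(R * sw[:, None], target[keep] * sw, rcond=None)
    return alpha[1] / bandwidth

def local_wald_cdf(y_value, d, Y, D, X, bandwidth):
    """Plug-in ratio of slope differences for the CDF of Y^d at the kink."""
    ind_d = (D == d).astype(float)
    ind_yd = (Y <= y_value).astype(float) * ind_d
    num = (one_sided_local_cubic_slope(X, ind_yd, bandwidth, "+")
           - one_sided_local_cubic_slope(X, ind_yd, bandwidth, "-"))
    den = (one_sided_local_cubic_slope(X, ind_d, bandwidth, "+")
           - one_sided_local_cubic_slope(X, ind_d, bandwidth, "-"))
    return num / den

# Noiseless check: a cubic on each side is fitted exactly by a local cubic.
x_demo = np.linspace(-1.0, 1.0, 401)
target_demo = np.where(x_demo >= 0,
                       0.3 + 0.5 * x_demo + x_demo**2,
                       0.3 - 0.2 * x_demo + x_demo**3)
slope_right = one_sided_local_cubic_slope(x_demo, target_demo, 0.5, "+")
slope_left = one_sided_local_cubic_slope(x_demo, target_demo, 0.5, "-")
```

With real data, `local_wald_cdf` would be evaluated over a grid of `y_value` for each `d` in {0, 1}; the resulting estimate need not be monotone in `y_value`.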

We can in turn estimate the conditional quantile treatment effect $\tau(\theta)$ by

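$$\hat{\tau}(\theta)=\inf\{y\in\mathscr{Y}:\widehat{F}_{Y^{1}|VX}(y|h(0),0)\geq\theta\}-\inf\{y\in\mathscr{Y}:\widehat{F}_{Y^{0}|VX}(y|h(0),0)\geq\theta\}=\hat{Q}_{Y^{1}|VX}(\theta)-\hat{Q}_{Y^{0}|VX}(\theta).$$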

The local Wald estimator $\widehat{F}_{Y^{d}|VX}(\cdot|h(0),0)$ is not always monotone increasing in finite samples. For ease of implementing the CDF inversion, we monotonize the estimated CDFs by rearrangement following Chernozhukov, Fernández-Val, and Galichon (2010). This does not affect the asymptotic properties of the estimators, while allowing for inversion of the CDF estimators. Frandsen, Frölich, and Melly (2012) also use this technique in the context of the regression discontinuity design.
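On an equally spaced grid of outcome values, the monotone rearrangement amounts to sorting the estimated CDF values in increasing order. The following minimal sketch illustrates this step; the input values are hypothetical and the function name is an assumption.

```python
import numpy as np

def rearrange_cdf(cdf_values):
    """Monotone rearrangement on an equally spaced grid: sort values increasingly."""
    return np.sort(np.asarray(cdf_values, dtype=float))

# Hypothetical non-monotone CDF estimates on the grid y = 0, 1, ..., 6
raw = np.array([0.05, 0.20, 0.15, 0.40, 0.38, 0.70, 0.95])
y_grid = np.arange(len(raw), dtype=float)

monotone = rearrange_cdf(raw)

# The rearranged CDF can now be inverted: first grid point with F >= 0.5
median_index = np.searchsorted(monotone, 0.5, side="left")
median_estimate = y_grid[median_index]
```

Sorting leaves an already monotone estimate unchanged, so applying the step routinely does no harm.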

Let $\Gamma^{\pm}=\int_{\mathds{R}_{\pm}}r_{3}(u)r_{3}^{\top}(u)K(u)du$ . Under the assumptions to be stated in Appendix A, we obtain the following Uniform Bahadur Representations (BR) for the local slope estimators (3.2) and (3.3).

[TABLE]

We note that $\nu_{n}^{\pm}(y,d,2)$ are trivial functions of $y$ .

Covariance functions for the limit processes are often cumbersome to approximate in practice. Qu and Yoon (2018) propose a simulation method to approximate limit processes under sharp designs – also see Qu and Yoon (2015) – but this method is not applicable to fuzzy designs. We thus propose to use the multiplier bootstrap method to approximate the asymptotic distributions of these BR. Draw a random sample $\xi_{1},...,\xi_{n}$ from the standard normal distribution independently from the data $\{Y_{i},D_{i},X_{i}\}_{i=1}^{n}$ . Replacing the unknowns $\mu_{1},\mu_{2}$ and $f_{X}(0)$ in the BR by their uniformly consistent estimators $\tilde{\mu}_{1},\tilde{\mu}_{2}$ and $\hat{f}_{X}\left(0\right)$ , respectively, we define the following Estimated Multiplier Processes (EMP).

[TABLE]

Under the assumptions to be stated in Appendix A, we show that the EMP can be used to uniformly approximate the asymptotic distribution of the BR. Consequently, by the functional delta method, the asymptotic distribution of

$$\sqrt{nh_{n}^{3}}\,[\hat{\tau}(\cdot)-\tau(\cdot)]$$

can be approximated uniformly on $\Theta=[a,1-a]$ for $a\in(0,1/2)$ by the estimated process

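$$\Xi(\cdot)=-\left[\frac{\hat{Z}_{\xi,n}(\hat{Q}_{Y^{1}|VX}(\cdot),1)}{\hat{f}_{Y^{1}|VX}(\hat{Q}_{Y^{1}|VX}(\cdot)|h(0),0)}-\frac{\hat{Z}_{\xi,n}(\hat{Q}_{Y^{0}|VX}(\cdot),0)}{\hat{f}_{Y^{0}|VX}(\hat{Q}_{Y^{0}|VX}(\cdot)|h(0),0)}\right],$$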

where

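$$\hat{Z}_{\xi,n}(y,d)=\frac{[\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)][\hat{\nu}_{\xi,n}^{+}(y,d,1)-\hat{\nu}_{\xi,n}^{-}(y,d,1)]-[\hat{\mu}_{1}^{\prime}(0^{+},y,d)-\hat{\mu}_{1}^{\prime}(0^{-},y,d)][\hat{\nu}_{\xi,n}^{+}(y,d,2)-\hat{\nu}_{\xi,n}^{-}(y,d,2)]}{[\hat{\mu}_{2}^{\prime}(0^{+},d)-\hat{\mu}_{2}^{\prime}(0^{-},d)]^{2}}.$$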

Once we obtain these approximations to the asymptotic distributions, we may conduct various tests of quantile functions following Koenker and Xiao (2002) and Chernozhukov and Fernández-Val (2005). For example, for the test of treatment significance, we use the test statistic

$$T^{TS}=\sup_{\theta\in\Theta}\sqrt{nh_{n}^{3}}\,|\hat{\tau}(\theta)|,$$

where $\Theta=[a,1-a]$ for some $a\in(0,1/2)$ . We can approximate the asymptotic distribution of $T^{TS}$ by

[TABLE]

Similarly, for the test of treatment homogeneity, we use the test statistic

[TABLE]

We can approximate the asymptotic distribution of $T^{TH}$ by

[TABLE]

In this section, we presented a practical guideline on estimation and inference for the conditional quantile treatment effects $\tau(\theta)$ . We refer interested readers to Appendix A for a formal theory. Furthermore, Appendix B presents additional practical considerations not covered in this section.
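The test of treatment significance outlined in this section can be sketched as follows, once estimates of $\tau(\theta)$ on a grid of quantiles and simulated draws of the approximating process are available. This is a hedged illustration only: the function name is hypothetical, the statistic is taken to be the supremum of the absolute scaled estimate, and the draws used in the example are placeholder standard normal noise rather than output of the estimated multiplier processes.

```python
import numpy as np

def sup_test(tau_hat, n, bandwidth, process_draws, level=0.05):
    """Sup-type test over a quantile grid.

    tau_hat       : estimates of tau(theta) on the grid, shape (G,)
    process_draws : simulated draws of the approximating process, shape (B, G)
    Returns the statistic, the (1 - level) critical value, and a p-value.
    """
    stat = np.sqrt(n * bandwidth**3) * np.max(np.abs(tau_hat))
    sup_draws = np.max(np.abs(process_draws), axis=1)   # sup over the grid, per draw
    critical_value = np.quantile(sup_draws, 1.0 - level)
    p_value = np.mean(sup_draws >= stat)
    return stat, critical_value, p_value

# Placeholder inputs for illustration (NOT draws from the paper's EMP)
rng = np.random.default_rng(0)
theta_grid = np.linspace(0.1, 0.9, 17)
tau_hat = np.zeros_like(theta_grid)
draws = rng.standard_normal((500, theta_grid.size))

stat, crit, pval = sup_test(tau_hat, n=1000, bandwidth=0.5, process_draws=draws)
```

In an application, `process_draws` would be generated by repeatedly drawing the multipliers $\xi_{1},...,\xi_{n}$ and evaluating the estimated process on the same quantile grid.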

4 Summary

The existing literature on identification in regression kink designs includes the following three results. Card, Lee, Pei, and Weber (2015) propose identification of average effects of continuous treatments. Dong (2018) proposes identification of average effects of binary treatments. Chiang and Sasaki (2019) propose identification of quantile-wise effects of continuous treatments. On the other hand, this literature has been missing an identification result for quantile-wise effects of binary treatments. To complete this literature on identification, this paper proposes identification of quantile-wise effects of binary treatments in regression kink designs.

Specifically, we show that a local Wald ratio of derivatives of certain conditional expectation functions identifies the conditional distribution functions of potential outcomes given the event of local compliance. Taking the difference of the left-inverses of these identified conditional distribution functions in turn identifies the conditional quantile treatment effects given the event of local compliance. While the main contribution of this paper is the identification result, we also develop a theory and method of estimation and inference for completeness.

Mathematical Appendix

Appendix A Estimation and Inference: Formal Theory

We use the following set of assumptions for the uniform Bahadur Representations, the bootstrap validity, and consistent conditional density and first-stage estimations. Fix $a\in(0,1/2)$ and $\epsilon>0$ , denote

[TABLE]

We will write $a\lesssim b$ if there exists a universal constant $C$ such that $a\leq Cb$ . Denote

[TABLE]

We define the following objects for all $y_{1}$ , $y_{2}\in\mathscr{Y}_{1}$ , $d_{1}$ , $d_{2}\in\mathscr{D}:$

[TABLE]

Assumption 2.

*Let $[\underline{x},\overline{x}]$ be a compact interval containing $0$ in its interior. Let $a\in(0,1/2)$.

(i) (a) $\{Y_{i},D_{i},X_{i}\}_{i=1}^{n}$ are $n$ independent copies of random vector $(Y,D,X)$ with support $\mathscr{Y}\times\mathscr{D}\times\mathscr{X}$ defined on a probability space $(\Omega^{x},\mathscr{F}^{x},\mathds{P}^{x})$ . (b) $X$ has a continuously differentiable density function $f_{X}$ with $0<f_{X}(0)<\infty$ . (c) $f_{YD|X}(y,d|x)$ is well-defined on $\mathscr{Y}_{1}\times\mathscr{D}\times([\underline{x},\overline{x}]\setminus\{0\})$ and $|f_{YD|X}(y,d|0^{+})-f_{YD|X}(y,d|0^{-})|>m>0$ on $\mathscr{Y}_{1}\times\mathscr{D}$ .

(ii) (a) Conditional density $f_{Y|XD}$ is Lipschitz continuous on $\mathscr{Y}_{1}\times[\underline{x},0)$ and $\mathscr{Y}_{1}\times(0,\overline{x}]$ for each $d$ and is four times partially differentiable in $x$ and twice partially differentiable in $y$ for each $d$ . $\frac{\partial^{j}}{\partial x^{j}}\frac{\partial^{k}}{\partial y^{k}}f_{Y|XD}(\cdot|\cdot,d)$ is continuous and uniformly bounded on $\mathscr{Y}_{1}\times[\underline{x},0)$ and $\mathscr{Y}_{1}\times(0,\overline{x}]$ for each $d$ for all $j$ , $k\in\mathds{N}$ , $j+k\leq 4$ . (b) $P_{D|X}(d|\cdot)$ is Lipschitz continuous in $x$ and four times differentiable on $[\underline{x},0)$ and $(0,\overline{x}]$ for each $d$ . $\frac{\partial^{4}}{\partial x^{4}}P_{D|X}(d|\cdot)$ is continuous and uniformly bounded on $[\underline{x},0)$ and $(0,\overline{x}]$ for each $d$ . (c) For any $y_{1}$ , $y_{2}\in\mathscr{Y}_{1}$ , $d_{1}$ , $d_{2}\in\mathscr{D}$ , we have $\sigma_{11}((y_{1},d_{1}),(y_{2},d_{2})|\cdot)$ , $\sigma_{12}((y_{1},d_{1}),(y_{2},d_{2})|\cdot)$ and $\sigma_{22}((y_{1},d_{1}),(y_{2},d_{2})|\cdot)\in\mathcal{C}^{1}([\underline{x},\overline{x}]\setminus\{0\})$ where $\mathcal{C}^{1}$ is the collection of continuously differentiable functions.

(iii) The bandwidths satisfy $h_{n}\rightarrow 0$ , $nh_{n}^{3}\rightarrow\infty$ , $nh_{n}^{9}\rightarrow 0$ , $0<h_{n}\leq h_{0}$ for some finite $h_{0}$ .

(iv) (a) $K:[-1,1]\rightarrow\mathds{R}_{+}$ is bounded and $\int_{\mathds{R}}K(u)du=1$ . (b) $\{K(\cdot/h):h>0\}$ is of VC type. (c) $\Gamma^{\pm}=\int_{\mathds{R}_{\pm}}r_{3}(u)r_{3}^{\top}(u)K(u)du$ are positive definite.

(v) $\hat{f}_{X}(0)$ is a consistent estimator for $f_{X}(0)$ . For $d=0,1$ , $\hat{f}_{Y^{d}|VX}(\cdot|h(0),0)$ are uniformly consistent estimators for $f_{Y^{d}|VX}(\cdot|h(0),0)$ . $\tilde{\mu}_{1}(x,y,d)\mathds{1}\{|x/h_{n}|\leq 1\}$ and $\tilde{\mu}_{2}(x,d)\mathds{1}\{|x/h_{n}|\leq 1\}$ are uniformly consistent estimators for $\mu_{1}(x,y,d)\mathds{1}\{|x/h_{n}|\leq 1\}$ and $\mu_{2}(x,d)\mathds{1}\{|x/h_{n}|\leq 1\}$ on $\mathscr{X}\times\mathscr{Y}_{1}\times\mathscr{D}$ .

(vi) $\{\xi_{1},...,\xi_{n}\}$ are $n$ independent and identically distributed copies of a standard normal random variable $\xi$ defined on a probability space $(\Omega^{\xi},\mathscr{F}^{\xi},\mathds{P}^{\xi})$ that is independent of $(\Omega^{x},\mathscr{F}^{x},\mathds{P}^{x})$ .*

Part (i) concerns the sampling procedure and the distribution of data. Part (ii) requires smoothness of the conditional expectation functions on a deleted neighborhood of $x_{0}=0$. Part (iii) regulates the rate at which the bandwidth decreases, which is consistent with examples of common choice rules to be presented in Appendix B.3. For example, the MSE-optimal bandwidth for the local quadratic estimator (e.g., $nh_{n}^{7}\rightarrow\infty$ ) is allowed. Part (iv) is satisfied by common kernel functions, such as uniform, triangular, biweight, triweight, and Epanechnikov kernels. Part (v) is a high-level assumption of (uniformly) consistent estimation of the first-stage estimators. While we keep this high-level statement for the current section, Appendix B.2 proposes concrete examples of such uniformly consistent estimators. Part (vi) requires the multiplier random sample to be drawn independently of the data $\{Y_{i},D_{i},X_{i}\}_{i=1}^{n}$. We remark that part (vi) implies that all (uniformly) consistent estimators with respect to $\mathds{P}^{x}$ are also (uniformly) consistent with respect to $\mathds{P}^{x\times\xi}$.
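The bandwidth conditions in part (iii) can be illustrated with a short numerical check. The sketch below assumes, purely for illustration, a bandwidth sequence of the form $h_{n}=n^{-1/7}$; the function name and the constants are hypothetical.

```python
def bandwidth_rates(n, constant=1.0, exponent=-1.0 / 7.0):
    """Return (h_n, n * h_n^3, n * h_n^9) for h_n = constant * n^exponent."""
    h = constant * n ** exponent
    return h, n * h**3, n * h**9

sizes = [10**3, 10**4, 10**5, 10**6]
rates = [bandwidth_rates(n) for n in sizes]

for n, (h, nh3, nh9) in zip(sizes, rates):
    # h_n shrinks, n h_n^3 grows, and n h_n^9 shrinks as n increases
    print(f"n={n:>8d}  h_n={h:.4f}  n*h^3={nh3:.2f}  n*h^9={nh9:.4f}")
```

Under this choice, $nh_{n}^{3}=n^{4/7}$ diverges while $nh_{n}^{9}=n^{-2/7}$ vanishes, so the sequence satisfies the rate requirements stated in part (iii).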

Under Assumption 2 (i), (ii)(a)(b), (iii), (iv), an application of Lemma 1 of Chiang, Hsu, and Sasaki (2019) gives the uniform Bahadur Representation as in equations (3.4) and (3.5). The following theorem establishes (i) (a) the asymptotic distribution of the BR; (i) (b) the asymptotic distribution of the local Wald estimators; (i) (c) the asymptotic distribution of the conditional quantile treatment effect estimator; and (ii) the bootstrap validity. A proof is provided in Appendix C.2.

Theorem 2 (Asymptotic Distributions and Bootstrap Validity).

*Suppose Assumptions 1 and 2 hold. Then there exists a zero mean Gaussian process $\mathds{G}:\Omega^{x}\mapsto\ell^{\infty}(\{\mathscr{Y}_{1}\times\mathscr{D}\times\{1,2\}\})$ , where $\ell^{\infty}$ is the collection of all bounded real-valued functions, such that:

(i) (a) $\nu_{n}^{+}-\nu_{n}^{-}\leadsto\mathds{G}$ .

(i) (b) $\sqrt{nh_{n}^{3}}[\widehat{F}_{Y^{d}|VX}(\cdot|h(0),0)-F_{Y^{d}|VX}(\cdot|h(0),0)]\leadsto\mathds{G}_{F}(\cdot,d)$ holds, where $\mathds{G}_{F}(\cdot,d)$ is given by*

[TABLE]

(i) (c) $\sqrt{nh_{n}^{3}}[\hat{\tau}-\tau]\leadsto\mathds{G}_{\tau}$ holds, where $\mathds{G}_{\tau}$ is given for each $\theta\in\Theta=[a,1-a]$ by

[TABLE]

(ii) We have

[TABLE]

Remark 1.

By considering the asymptotic distribution for the local cubic polynomial above, we effectively account for bias estimation in the asymptotic distribution from the local quadratic kernel estimate; see Calonico, Cattaneo and Titiunik (2014, Remark 7) and Remark S.A.7 in their supplementary material. Therefore, the proposed theory and bootstrap allow for robust inference under the MSE-optimal bandwidth from the local quadratic kernel estimate.

Remark 2.

$\hat{\mu}_{1}^{\prime}(0^{\pm},y,d)$ , $\hat{\mu}_{2}^{\prime}(0^{\pm},d)$ and Theorem 2 are developed for the unconstrained estimators, that is, without imposing continuity of the conditional expectations of $\mathds{1}\{Y_{i}\leq y\}\mathds{1}\{D_{i}=d\}$ and $\mathds{1}\{D_{i}=d\}$ . Alternatively, consider the constrained version with the restriction $\mu_{1}(0^{+},y,d)=\mu_{1}(0^{-},y,d)$ : the estimates can be obtained by solving the “pooled” least squares problem

[TABLE]

where $r_{3\backslash 0}(u)=\left(u,u^{2},u^{3}\right)$ and $b^{\pm}\in\mathds{R}^{6}$ collects the first, second, and third left and right derivatives. As shown in Appendix C.5, when a uniform kernel and symmetric bandwidths are used, the constrained estimators have the same asymptotic distributions as the unconstrained ones, so our previous results continue to hold for the constrained estimates.

Appendix B Additional Practical Considerations

In order to compute uniformly consistent estimators of the conditional density $f_{Y^{d}|VX}(\cdot|h(0),0)$ in Appendix B.1, and of $\mu_{1}(x,y,d)\mathbbm{1}\{|x/h_{n}|\leq 1\}$ and $\mu_{2}(x,d)\mathbbm{1}\{|x/h_{n}|\leq 1\}$ in Appendix B.2, we continue to use the local cubic kernel models, so that the single MSE-optimal bandwidth from the local quadratic regression can be used throughout.
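The local cubic fits used throughout this appendix can be sketched as a one-sided weighted least squares problem. The function below is a minimal illustration and not the authors' code: the name `local_cubic_slope`, the default uniform kernel, and the argument layout are assumptions. In the setting of this paper the outcome vector would be $\mathds{1}\{Y_{i}\leq y\}\mathds{1}\{D_{i}=d\}$ or $\mathds{1}\{D_{i}=d\}$ for a fixed $(y,d)$ .

```python
import numpy as np

def local_cubic_slope(x, y, h, side, kernel=lambda u: 0.5 * (np.abs(u) <= 1)):
    """One-sided local cubic fit at the kink point 0.

    Regresses y on (1, u, u^2, u^3) with u = x / h, using kernel weights and
    only observations with x >= 0 (side="+") or x < 0 (side="-").
    Returns the estimated first derivative at 0 from that side.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x >= 0) if side == "+" else (x < 0)
    u = x[mask] / h
    w = kernel(u)
    keep = w > 0                                  # drop points outside the window
    u, w, yy = u[keep], w[keep], y[mask][keep]
    R = np.vander(u, N=4, increasing=True)        # columns: 1, u, u^2, u^3
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(R * sw[:, None], yy * sw, rcond=None)
    return coef[1] / h                            # coefficient on u equals slope * h
```

Taking the difference `local_cubic_slope(x, y, h, "+") - local_cubic_slope(x, y, h, "-")` gives an estimate of the change in slope at the kink for the chosen outcome.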

B.1 A Conditional Density Estimator

The limit processes in Theorem 2 involve the densities $f_{Y^{d}|VX}(\cdot|h(0),0)$ , which are unknown in practice. In order to simulate the multiplier process, we need to replace them by uniformly consistent estimators. Note that the identifying formulas in Theorem 1 suggest

[TABLE]

Equation (3.3) gives uniformly consistent estimators for the two terms in the denominator. The two terms in the numerator can be written as

[TABLE]

With the bandwidth parameter $b_{n}$ , we represent $\frac{\partial}{\partial y}\mu_{1}(0^{\pm},y,d)$ by the limit of the regularized approximation

[TABLE]

and we estimate it by the local cubic polynomial regression

[TABLE]

This estimate $\tilde{\mu}^{\prime}(0^{\pm},y,d)$ is used for (B.1). Therefore, $\widehat{f}_{Y^{d}|VX}(y|h(0),0)$ is estimated by

[TABLE]

We make the following assumption about the bandwidth parameters $a_{n}$ and $b_{n}$ .

Assumption 3.

The bandwidth parameters $a_{n}$ and $b_{n}$ satisfy $a_{n}\rightarrow 0$ , $b_{n}\rightarrow 0$ , $na_{n}\rightarrow\infty$ and $na_{n}^{2}b_{n}^{2}\rightarrow\infty$ and $\frac{b_{n}}{a_{n}}\to 0$ .

The following lemma shows that the first-order derivative of the kernel regularization (B.2) with respect to $x$ is equivalent to the objects of interest in (B.1). We may thus use the estimates of $\frac{\partial}{\partial x}\mu(0^{\pm},y,d)$ to approximate $\frac{\partial}{\partial y}\mu^{\prime}_{1}(0^{\pm},y,d)$ .

Lemma 1.

Let Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3 hold. For each $(y,d,x)\in\mathscr{Y}\times\mathscr{D}\times([\underline{x},\overline{x}]\setminus\{0\})$ , $\frac{\partial}{\partial x}\mu(0^{\pm},y,d)=\frac{\partial}{\partial y}\mu^{\prime}_{1}(0^{\pm},y,d)$ .

A proof is provided in Appendix C.3. To show the uniform consistency of $\widehat{f}_{Y^{\cdot}|VX}(\cdot|h(0),0)$ , it suffices to show $\sup_{(y,d)\in\mathscr{Y}\times\mathscr{D}}|\tilde{\mu}^{\prime}(0^{\pm},y,d)-\mu^{\prime}(0^{\pm},y,d)|\underset{x\times\xi}{\overset{p}{\rightarrow}}0$ . The following lemma establishes this point.

Lemma 2.

Under Assumptions 2 (i), (ii) (a) (b), (iv) (a) (b) and 3, it holds that

[TABLE]

A proof is provided in Appendix C.4.

B.2 First Stage Estimators

We will now give some examples of uniformly consistent estimators that satisfy the high-level condition in Assumption 2 (v). First, the density function of $X$ can be estimated by

[TABLE]

This estimator can be shown to be consistent if $c_{n}\to 0$ and $nc_{n}\to\infty$ , $f_{X}$ is three times differentiable, and $\frac{\partial^{2}}{\partial x^{2}}f_{X}(0)<\infty$ ; see Theorem 1.1 of Li and Racine (2007).

We now propose first-stage estimators $\tilde{\mu}_{1}(x,y,d)\mathbbm{1}\{|x/h_{n}|\leq 1\}$ and $\tilde{\mu}_{2}(x,d)\mathbbm{1}\{|x/h_{n}|\leq 1\}$ that are used in the EMP. Denote $\delta_{x}^{+}=\mathbbm{1}\{x\geq 0\}$ and $\delta_{x}^{-}=\mathbbm{1}\{x<0\}$ . We reuse the local cubic estimates from equations (3.2) and (3.3), so no additional optimization problem needs to be solved. We define the first-stage estimators by

[TABLE]

where

[TABLE]

The uniform consistency of these first-stage estimators, required as the high-level condition in Assumption 2 (v), follows from Lemma 7 of Chiang, Hsu, and Sasaki (2019), which is applicable under our Assumption 2 (i)–(iv).

B.3 Bandwidths

Another practical consideration is the rule for selecting bandwidths in finite samples. We propose to start with the MSE-optimal bandwidths for local quadratic kernel smoothers as the bandwidth for our bias-corrected local cubic kernel estimation, and then to apply the rule-of-thumb correction for coverage optimality (Calonico, Cattaneo and Farrell, 2016, 2018). To keep the implementation simple, we use a single bandwidth $h_{n}$ that is based on minimizing the sum of MSEs of $\overline{\mu}_{1}^{\prime}(0^{+},y,1)-\mu_{1}^{\prime}(0^{-},y,1)$ and $\overline{\mu}_{1}^{\prime}(0^{+},y,0)-\mu_{1}^{\prime}(0^{-},y,0)$ , where both $\overline{\mu}_{1}^{\prime}(0^{+},y,1)$ and $\overline{\mu}_{1}^{\prime}(0^{+},y,0)$ are from local quadratic estimation problems. We first introduce shorthand notation. Let $\Psi^{\pm}=\int_{\mathds{R}_{\pm}}r_{3}(u)r_{3}^{\top}(u)K^{2}(u)du$ and $\Lambda^{\pm}=\int_{\mathds{R}_{\pm}}u^{2}r_{3}(u)K(u)du$ .

For the kernel density estimator $\hat{f}_{X}(0)$ , we make use of Silverman’s rule of thumb

[TABLE]

where $\hat{\sigma}_{X}$ is the sample standard deviation of $\{X_{i}\}^{n}_{i=1}$ .
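A minimal sketch of a density estimate at the kink point with a Silverman-type bandwidth follows. The displayed formula of the paper is not reproduced in this text, so the constant $1.06$ , the rate $n^{-1/5}$ , and the Gaussian kernel used below are the common textbook form and should be read as assumptions; the function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Textbook Silverman rule of thumb: 1.06 * sd(X) * n^(-1/5) (assumed form)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * np.std(x, ddof=1) * x.size ** (-1 / 5)

def kde_at_zero(x, c=None):
    """Kernel density estimate of f_X(0) with a Gaussian kernel and bandwidth c."""
    x = np.asarray(x, dtype=float)
    c = silverman_bandwidth(x) if c is None else c
    u = x / c                                   # (X_i - 0) / c
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / c
```

Any other kernel satisfying the conditions of this appendix could be substituted for the Gaussian one; only the bandwidth constant would change.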

For the main bandwidth $h_{n}$ , we first choose

[TABLE]

where the leading bias and variance terms are given by

[TABLE]

respectively, with $\overline{\overline{\mu}}_{\pm}^{\prime\prime\prime}$ and $\overline{\bar{\sigma}}_{\pm}^{2}$ given by global cubic parametric regressions of $\mu_{1}^{\prime\prime\prime}(x,y,d)\delta_{x}^{\pm}$ and $\sigma^{2}(y,d|x)\delta_{x}^{\pm}$ , respectively, evaluated at $0^{\pm}$ for certain $(y,d)$ .

With the first-stage bandwidth $h_{0,n}$ having been selected, we can solve

[TABLE]

and thus compute our first-stage level estimate

[TABLE]

We next define the variance estimator by

[TABLE]

where $\check{\mu}_{1}(\cdot,y,d)$ is the first stage level estimator given above.

Finally, the main bandwidth selector $h_{n}$ is defined by

[TABLE]

where the leading bias and variance terms are given by

[TABLE]

Finally, following Calonico, Cattaneo and Farrell (2016, 2018), we apply the rule-of-thumb (ROT) correction for the coverage-optimal bandwidth of the local quadratic regression to the main bandwidth, which gives $h^{ROT}_{n}=n^{-2/35}h_{n}.$
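The ROT correction is a one-line rescaling, sketched below with a hypothetical function name. As a side remark under an assumption not stated in this section: if $h_{n}$ shrinks at the rate $n^{-1/7}$ , then the corrected bandwidth shrinks at the rate $n^{-1/7-2/35}=n^{-1/5}$ .

```python
def rot_bandwidth(h_mse, n):
    """Rule-of-thumb coverage correction: h_ROT = n^(-2/35) * h_mse."""
    return n ** (-2 / 35) * h_mse
```

For instance, `rot_bandwidth(0.30, 5000)` shrinks an MSE-optimal bandwidth of $0.30$ by the factor $5000^{-2/35}$ .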

For the bandwidth parameters $a_{n}$ and $b_{n}$ used for the conditional density estimator $\widehat{f}_{Y^{d}|VX}(y|h(0),0)$ in Appendix B.1, we follow the choice rules proposed at the end of Appendix C in Frandsen, Frölich, and Melly (2012), and propose to set $a_{n}=h_{n}$ and $b_{n}=h^{2}_{n}$ .
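As a quick check of this choice against Assumption 3, suppose (an assumption made only for this illustration) that the main bandwidth behaves like $h_{n}=Cn^{-1/7}$ for some constant $C>0$ , the usual rate of an MSE-optimal bandwidth for a local quadratic derivative estimate. Then each requirement of Assumption 3 can be verified directly:

```latex
a_{n}=h_{n}\to 0, \qquad b_{n}=h_{n}^{2}\to 0, \qquad
n a_{n}=C\,n^{6/7}\to\infty, \qquad
n a_{n}^{2}b_{n}^{2}=n h_{n}^{6}=C^{6}\,n^{1/7}\to\infty, \qquad
\frac{b_{n}}{a_{n}}=h_{n}\to 0 .
```

The binding requirement is $na_{n}^{2}b_{n}^{2}\to\infty$ , which holds only slowly at the rate $n^{1/7}$ under the assumed bandwidth rate.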

Appendix C Auxiliary Lemmas and Proofs

C.1 Auxiliary Lemmas

C.1.1 Uniform Bahadur Representation

The following lemma states the uniform BR for the local slope estimators.

Lemma 3 (Chiang, Hsu, and Sasaki (2019); Lemma 1).

Under Assumption 2, we have the uniform influence function representations (3.4) and (3.5) that hold uniformly on $\mathscr{Y}_{1}\times\mathscr{D}$ .

C.1.2 Functional Central Limit Theorem

Lemma 4.

Let a triangular array of separable stochastic processes $\{f_{ni}(\omega,t):i=1,...n,t\in T\}$ be row independent, write $X_{n}(t)=\sum_{i=1}^{n}[f_{ni}(\omega,t)-Ef_{ni}(\cdot,t)]$ , and let $E^{\ast}$ denote the outer integral (see, e.g., Section 1.2 of van der Vaart and Wellner (1996)). Suppose that the following conditions are satisfied:

1. $\left\{f_{ni}\right\}$ *are manageable, with envelope $\left\{F_{ni}\right\}$ which are also independent within rows;*

2. $H(s,t)=\lim_{n\rightarrow\infty}EX_{n}(s)X_{n}(t)$ *exists for every $s,t\in T$ ;*

3. $\limsup_{n\to\infty}\sum_{i=1}^{n}E^{*}F^{2}_{ni}<\infty$ ;

4. $\lim_{n\to\infty}\sum_{i=1}^{n}E^{*}F^{2}_{ni}\mathbbm{1}\{F_{ni}>\epsilon\}=0$ *for each $\epsilon>0$ ;*

5. $\rho(s,t)=\lim_{n\rightarrow\infty}\rho_{n}(s,t),$ *where $\rho_{n}(s,t)=(\sum_{i=1}^{n}E[f_{ni}(\cdot,s)-f_{ni}(\cdot,t)]^{2})^{1/2},$ exists for every $s,t\in T$ , and for all deterministic sequences $\{s_{n}\}$ and $\{t_{n}\}$ in $T$ , if $\rho(s_{n},t_{n})\rightarrow 0$ then $\rho_{n}(s_{n},t_{n})\rightarrow 0$ .*

Then $T$ is totally bounded under the $\rho$ pseudometric, and $X_{n}$ converges weakly to a tight mean zero Gaussian process $\mathds{X}$ concentrated on $\left\{z\in l^{\infty}\left(T\right):z\text{ is uniformly }\rho-\text{continuous}\right\}$ , with covariance $H(s,t)$ .

C.2 Proof of Theorem 2

Before presenting the proof, we introduce additional definitions and notation. Let $\mathcal{F}$ be a class of measurable functions defined on $(\Omega,\mathscr{F})$ with a measurable envelope $F$ . We say that $\mathcal{F}$ is of VC type with envelope $F$ if there exist constants $A$ , $v>0$ such that $\sup_{Q}N(\mathcal{F},L^{2}(Q),\varepsilon\left\|F\right\|_{Q,2})$ $\leq$ $(A/\varepsilon)^{v}$ , where the supremum is taken over the set of all finite discrete measures $Q$ on $(\Omega,\mathscr{F})$ .

To approximate the distribution of the BR, we define the following Multiplier Processes (MP):

[TABLE]

For ease of writing, we use the following notations for the differences of right and left limits of the BR, the MP, and the EMP with $k=1,2$ :

[TABLE]

With these preparations, we now start a proof of Theorem 2.

Part (i) (a): We will verify the five conditions in Lemma 4 for the triangular array of stochastic processes $\{f_{ni}\}$ defined by

[TABLE]

Separability follows from the same argument as in the proof of Theorem 4 of Kosorok (2003) and from the left or right continuity of the processes. To show condition 1, define

[TABLE]

We first claim that $\mathscr{F}_{n}^{+}$ is a VC type class with envelope

[TABLE]

for some constant $C^{\prime\prime}>0$ . It is clear that $\{(y^{\ast},d^{\ast},x^{\ast})\mapsto\mathbbm{1}\{y^{\ast}\leq y\}:y\in\mathscr{Y}_{1}\}$ is a VC-subgraph class with VC index $\leq 2$ , since it is monotone increasing in $y$ , and thus for each pair $(y_{1}^{\ast},x_{1}^{\ast},d_{1}^{\ast},r_{1}),(y_{2}^{\ast},x_{2}^{\ast},d_{2}^{\ast},r_{2})\in\mathscr{Y}_{1}\times\mathscr{X}\times\{0,1\}\times\mathds{R}$ with $y_{1}^{\ast}\leq y_{2}^{\ast}$ , it can never pick out $\{(y_{2}^{\ast},x_{2}^{\ast},d_{2}^{\ast},r_{2})\}$ . Similarly, $\{(y^{\ast},d^{\ast},x^{\ast})\mapsto\mathbbm{1}\{d^{\ast}=d\}:d\in\{0,1\}\}$ , $\{(y^{\ast},d^{\ast},x^{\ast})\mapsto\mathbbm{1}\{k^{\ast}=k\}:k\in\{1,2\}\}$ and $\{(y^{\ast},d^{\ast},x^{\ast})\mapsto\mathbbm{1}\{x^{\ast}\geq 0\}\}$ are all VC-subgraph classes by Lemma 9.12 (i) of Kosorok (2008), since they are sub-collections of the collection of all half spaces. Each of them is therefore of VC type with envelope $1$ . Next, Assumption 2(ii)(a)(b) implies

[TABLE]

for some $L>0$ and the Euclidean norm $\left\|\cdot\right\|$ . Thus $\{x^{\ast}\mapsto\mu_{k}(x^{\ast},y,d):(k,y,d)\in\{1,2\}\times\mathscr{Y}_{1}\times\mathscr{D}\}$ is of VC type with envelope $L$ in light of Example 19.7 of van der Vaart (1998) and Lemma 9.18 of Kosorok (2008). Under Assumption 2(i)(b), (iii) and (iv), for each $n$ , the class consisting of the single function

[TABLE]

is a VC-subgraph class and therefore of VC type with envelope $\frac{C^{\prime}\mathbbm{1}\{|x^{\ast}/h_{n}|\in[-1,1]\}}{\sqrt{nh_{n}}}$ . Example 19.19 of van der Vaart (1998) implies that VC type classes with finite uniform entropy integrals are closed under element-wise addition and multiplication. Therefore, $\mathscr{F}_{n}$ is of VC type with constant envelope $C^{\prime\prime}$ and thus

[TABLE]

is of VC type with envelope $F_{n}^{+}(y^{\ast},d^{\ast},x^{\ast})=\frac{C^{\prime\prime}}{\sqrt{nh_{n}}}\left\|K\right\|_{\infty}\mathbbm{1}\{x^{\ast}/h_{n}\in[-1,1]\}$ . Finally, standard calculations show for each $n$ and for any $\delta\in(0,1)$ the uniform entropy integral bound

[TABLE]

Equation (A.1) in the proof of Theorem 1 in Andrews (1994) then implies that $\mathscr{F}_{n}^{+}$ is a manageable class of functions, as defined in Section 11.4.1 of Kosorok (2008). To check condition 2, notice

[TABLE]

It suffices to check $\sum_{i=1}^{n}Ef_{ni}(y_{1},d_{1},k_{1})f_{ni}(y_{2},d_{2},k_{2})<\infty$ since $Ef_{ni}(y,d,k)=0$ due to the law of iterated expectations, and thus the second term is zero. When $k_{1}=k_{2}=1$ , under Assumption 2(i)(a)(b),(ii)(c),(iii), (iv)(a),

[TABLE]

where the second to the last equality is due to mean value expansions under Assumption 2 (i)(b) and (ii)(c). Notice that $n$ enters only through the $O(h_{n})$ term, and thus

[TABLE]

exists. Similar calculations hold for $k_{1}=k_{2}=2$ and for $k_{1}=1$ , $k_{2}=2$ . This shows condition 2. Condition 3 is clear since

[TABLE]

under Assumption 2 (i)(a), (iii) and (iv)(a). To show condition 4, note that for each $\varepsilon>0$ ,

[TABLE]

under Assumption 2 (i)(a), (iii) and (iv)(a). This shows condition 4. To show condition 5, note that we can write

[TABLE]

From our calculations on the way to show condition 2, we know that each term on the right-hand side exists under Assumption 2 (i)(a)(b),(ii)(c),(iii), (iv)(a). Since $n$ enters the expression only through the $O(h_{n})$ part, for all deterministic sequences $s_{n}\in\mathscr{Y}_{1}\times\{0,1\}\times\{1,2\}$ and $t_{n}\in\mathscr{Y}_{1}\times\{0,1\}\times\{1,2\}$ , $\rho^{2}(s_{n},t_{n})\rightarrow 0$ implies $\rho_{n}^{2}(s_{n},t_{n})\rightarrow 0.$ By Lemma 4, we have $\nu_{n}^{+}\leadsto\mathds{G}_{+}$ and, similarly, $\nu_{n}^{-}\leadsto\mathds{G}_{-}$ . Assumption 2(i)(a) then implies $\nu_{n}=\nu_{n}^{+}-\nu_{n}^{-}\leadsto\mathds{G}:=\mathds{G}_{+}-\mathds{G}_{-}$ .

Part (i) (b): We apply the FCLT and the functional delta method. Notice that $\nu_{n}\leadsto\mathds{G}$ suggests

[TABLE]

Let $(A(\cdot),B(\cdot))\in\ell^{\infty}(\mathscr{Y}_{1}\times\{0,1\})\times\ell^{\infty}(\mathscr{Y}_{1})$ . If $B(\cdot)>C>0$ , then $(G,H)\overset{\Psi}{\mapsto}{G}/{H}$ is Hadamard differentiable at $(A,B)$ tangentially to $\ell^{\infty}$ with the Hadamard derivative $\Psi_{(A,B)}^{\prime}$ given by $\Psi_{(A,B)}^{\prime}(g,h)={(Bg-Ah)}/{B^{2}}$ . Under Assumption 1(ii), $\mu_{2}^{\prime}(0^{+},d)-\mu_{2}^{\prime}(0^{-},d)$ is bounded away from zero. Also, $f_{Y^{d}|VX}(\cdot|h(0),0)$ is bounded away from zero under Assumption 2(i)(c). The functional delta method then yields

[TABLE]

where

[TABLE]
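For readers who want to see where the Hadamard derivative of the ratio map used in part (i) (b) comes from, the following short computation may help. It is a sketch of the standard argument, using the same symbols $(A,B)$ and $(g,h)$ as above and assuming $B>C>0$ , with $(g_{t},h_{t})\to(g,h)$ uniformly as $t\to 0$ :

```latex
\frac{1}{t}\left[\frac{A+t g_{t}}{B+t h_{t}}-\frac{A}{B}\right]
=\frac{B g_{t}-A h_{t}}{B\,(B+t h_{t})}
\;\longrightarrow\;
\frac{B g-A h}{B^{2}}
\qquad\text{as } t\to 0 .
```

The convergence is uniform because $B$ is bounded away from zero and $A$ , $g$ , $h$ are bounded.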

Part (i) (c): Define operator $\Upsilon:\mathds{D}_{\Upsilon}(\mathscr{Y}_{1}\times\{0,1\})\rightarrow\ell^{\infty}([a,1-a])$ as

[TABLE]

where $\Phi(F)(\theta)=Q(\theta)=\inf\{y\in\mathscr{Y}_{1}:F(y)\geq\theta\}$ . By Hadamard differentiability from Lemma 3.9.23(ii) of van der Vaart and Wellner (1996) and the chain rule (van der Vaart, 1998, Theorem 20.9), under Assumption 2(i)(c),(ii)(a)(b), $\Upsilon$ is Hadamard differentiable at $F_{Y^{\cdot}|VX}(\cdot|h(0),0)$ tangentially to $\mathcal{C}(\mathscr{Y}_{1}\times\mathscr{D})$ and the derivative (Kosorok, 2008, Section 2.2.4) is

[TABLE]

tangentially to $\mathcal{C}(\mathscr{Y}_{1}\times\mathscr{D})$ . The functional delta method then yields

[TABLE]

where

[TABLE]
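The quantile map $\Phi(F)(\theta)=\inf\{y\in\mathscr{Y}_{1}:F(y)\geq\theta\}$ used in part (i) (c) can be illustrated numerically on a grid. The sketch below is not the authors' implementation: the function names are hypothetical, and the running-maximum step that enforces monotonicity of an estimated distribution function is an added assumption, included because estimated distribution functions need not be monotone in finite samples.

```python
import numpy as np

def quantile_from_cdf(y_grid, F_vals, theta):
    """Generalized inverse on a grid: smallest grid point y with F(y) >= theta."""
    y_grid = np.asarray(y_grid, dtype=float)
    # Assumption: monotonize the estimated CDF by a running maximum.
    F_mono = np.maximum.accumulate(np.asarray(F_vals, dtype=float))
    idx = np.searchsorted(F_mono, theta, side="left")   # first index with F >= theta
    idx = np.minimum(idx, y_grid.size - 1)              # guard when theta exceeds max F
    return y_grid[idx]

def qte(y_grid, F1_vals, F0_vals, theta):
    """Quantile treatment effect: Q_{Y^1}(theta) - Q_{Y^0}(theta)."""
    return (quantile_from_cdf(y_grid, F1_vals, theta)
            - quantile_from_cdf(y_grid, F0_vals, theta))
```

Here `F1_vals` and `F0_vals` stand for estimates of $F_{Y^{1}|VX}(\cdot|h(0),0)$ and $F_{Y^{0}|VX}(\cdot|h(0),0)$ evaluated on `y_grid`, and `theta` may be a scalar or an array of quantile levels in $[a,1-a]$ .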

Part (ii): This part of the proof consists of two steps. We first show the convergence result for the EMP, and then show the convergence result for $\widehat{\Xi}\left(\cdot\right)$ .

Step 1 We claim $\nu_{\widehat{\xi},n}\underset{\xi}{\overset{p}{\leadsto}}\mathds{G}$ . Applying Theorem 11.19 of Kosorok (2008), which is applicable under the five conditions verified in (i), we have $\nu_{\xi,n}=\nu_{\xi,n}^{+}-\nu_{\xi,n}^{-}\underset{\xi}{\overset{p}{\leadsto}}\mathds{G}$ . In light of Lemma 2 of Chiang, Hsu, and Sasaki (2019), it then suffices to show

[TABLE]

Indeed, for $k=1$ , by Assumption 2(i)(b),(v), we have

[TABLE]

where $T_{i}^{+}=\xi_{i}\frac{e_{1}^{\top}\left(\Gamma^{+}\right)^{-1}r_{3}(\frac{X_{i}}{h_{n}})K(\frac{X_{i}}{h_{n}})\delta_{i}^{+}}{\sqrt{nh_{n}}}$ . It can be shown that the triangular array of zero-mean random variables $\{T_{i}^{+}\}_{i=1}^{n}$ satisfies the Lindeberg-Feller conditions (Proposition 2.27 of van der Vaart (1998)) under Assumption 2(i)(a), (iii) and (iv)(a)(c), so that $\sum_{i=1}^{n}T_{i}^{+}$ converges in distribution to a normal distribution. Asymptotic tightness then implies $\sum_{i=1}^{n}T_{i}^{+}=O_{p}^{x\times\xi}(1)$ . Thus we conclude that the expression in (C.1) is $o_{p}^{x\times\xi}(1)$ .

Step 2 We will show

[TABLE]

where

[TABLE]

We first use Theorem 12.1 of Kosorok (2008) (the functional delta for bootstrap) along with the conclusion of Step 1 to get

[TABLE]

Since the denominator is bounded away from zero under Assumption 2(i)(iv), uniform consistency of $\hat{\mu}_{1}^{\prime}$ and $\hat{\mu}_{2}^{\prime}$ from Theorem 2 gives $\left\|\tilde{Z}_{\xi,n}-\hat{Z}_{\xi,n}\right\|_{\mathscr{Y}_{1}\times\{0,1\}}\underset{x\times\xi}{\overset{p}{\rightarrow}}0$ , and Lemma 2 of Chiang, Hsu, and Sasaki (2019) implies $\hat{Z}_{\xi,n}\underset{\xi}{\overset{p}{\leadsto}}\mathds{G}_{F}$ . Using the functional delta method for bootstrap again, we obtain

[TABLE]

Since $f_{Y^{d}|VX}(\cdot|h(0),0)$ are bounded away from zero, using the asymptotic $\rho$-equicontinuity of $\hat{Z}_{\xi,n}(\cdot,\cdot)$ that follows from its (conditional) weak convergence and Theorem 3.7.23 of Giné and Nickl (2016), and the uniform consistency of $\hat{f}_{Y^{d}|VX}(\cdot|h(0),0)$ and $\hat{Q}_{Y^{d}|VX}(\cdot)$ for $d=0,1$ along with Lemma 2 of Chiang, Hsu, and Sasaki (2019), we conclude part (ii) of the theorem.
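To illustrate how the multiplier approach validated in part (ii) is typically used in practice, the sketch below simulates the supremum of a generic multiplier process and returns a critical value for a uniform confidence band. It is a generic illustration under stated assumptions, not the paper's EMP: the array `psi` stands in for already-scaled estimated summands (for example, including the $1/\sqrt{nh_{n}}$ factor) evaluated on a grid, the multipliers are taken to be standard normal, and the function name is hypothetical.

```python
import numpy as np

def multiplier_sup_critical_value(psi, level=0.95, n_draws=2000, seed=0):
    """Simulate sup_j |sum_i xi_i * psi[i, j]| over multiplier draws.

    psi   : (n, m) array; row i holds observation i's scaled summand on m grid points.
    level : confidence level for the uniform band.
    Returns the `level` quantile of the simulated sup statistics.
    """
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    # Multipliers are drawn independently of the data (assumed standard normal here).
    xi = rng.standard_normal((n_draws, psi.shape[0]))
    sup_stats = np.abs(xi @ psi).max(axis=1)
    return np.quantile(sup_stats, level)
```

Because the data enter only through `psi`, the same fitted quantities are reused across all multiplier draws, which is what makes this approach cheaper than resampling the data.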

C.3 Proof of Lemma 1

We prove the lemma in two steps: for each $(y,d,x)\in\mathscr{Y}\times\mathscr{D}\times([\underline{x},\overline{x}]\setminus\{0\})$ , Step 1 shows

[TABLE]

and Step 2 shows

[TABLE]

Step 1 Under Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3, for each $(y,x)\in\mathscr{Y}\times([\underline{x},\overline{x}]\setminus\{0\})$ , for $d=1$ , applying the dominated convergence theorem, we have

[TABLE]

where $y^{\ast}$ lies between $y$ and $y+ub_{n}$ . A similar result holds for $d=0$ .

Step 2 Under Assumptions 2 (i) (b), (ii) (a) (b), (iv) (a) and 3, for each $(y,x)\in\mathscr{Y}\times([\underline{x},\overline{x}]\setminus\{0\})$ , for $d=1$ , an application of the dominated convergence theorem yields

[TABLE]

C.4 Proof of Lemma 2

The proof makes use of a maximal inequality from Chernozhukov, Chetverikov, and Kato (2014). Under Assumptions 2 (ii) (a) (b) and 3, as in Section 1.6 of Tsybakov (2008), the solution to equation (B.3) can be written as

[TABLE]

where $\alpha(0^{\pm},y,d)=\Big{[}\mu(0^{\pm},y,d),\mu^{\prime}(0^{\pm},y,d)a_{n},\mu^{\prime\prime}(0^{\pm},y,d)a_{n}^{2}/2!,\mu^{\prime\prime\prime}(0^{\pm},y,d)a_{n}^{3}/3!\Big{]}^{\top}$ . Multiply both sides by $e_{1}^{\top}$ to get

[TABLE]

where

[TABLE]

From Step 1 of the proof of Lemma 1 in Chiang, Hsu, and Sasaki (2019), under Assumptions 2 (i) (a) (b), (iii), (iv) and 3, we have the common inverse factor

[TABLE]

uniformly in $(y,d)$ . It suffices to show that each of

[TABLE]

converges in probability to zero uniformly. We will divide the argument into the following four steps.

Step 1 Under Assumption 2 (i)(a), (ii)(a)(b), (iii) and (iv)(a), it holds that

[TABLE]

Step 2 We first bound the difference

[TABLE]

It suffices to show that each term converges to zero in probability uniformly. Define for each $t=0,1,...,3$

[TABLE]

where $\delta_{x}^{+}=\mathbbm{1}\{x\geq 0\}$ and $\delta_{x}^{-}=\mathbbm{1}\{x<0\}$ . Note that for a fixed $t$ , $\mathscr{F}_{t,n}\subset\mathscr{F}_{t}$ for all $n$ . Fix any $t$ . Under Assumption 2 (iv), $\{x^{\ast}\mapsto K(ax^{\ast}):a\in\mathds{R}\}$ is a VC type class with measurable envelope $\left\|K\right\|_{\infty}$ . By Proposition 3.6.12 of Giné and Nickl (2016), $x\mapsto(ax)^{t}\mathds{1}\{ax\leq 1\}$ is a VC type class with measurable envelope $1$ since $z\mapsto z^{t}\mathds{1}\{z\leq 1\}$ is a function of bounded variation. Furthermore, $\{\mathds{1}\{d^{\ast}=d\}:d\in\mathscr{D}\}$ is a VC-subgraph class and therefore of VC type. Lemma A.6 of Chernozhukov, Chetverikov, and Kato (2014) then implies that the class of their element-wise products $\mathscr{F}_{t}$ is of VC type with envelope $F_{t}=\left\|K\right\|_{\infty}^{2}$ , i.e., there exist positive constants $k$ , $v<\infty$ such that $\sup_{Q}N(\mathscr{F}_{t},\left\|\cdot\right\|_{Q,2},\varepsilon\left\|F_{t}\right\|_{Q,2})\leq(\frac{k}{\varepsilon})^{v}$ for $0<\varepsilon\leq 1$ , where the supremum is taken over the set of all probability measures on $(\Omega^{x},\mathcal{F}^{x})$ . Corollary 5.1 in Chernozhukov, Chetverikov, and Kato (2014) then gives

[TABLE]

Multiplying both sides by $(\sqrt{n}a_{n}b_{n})^{-1}$ , we have

[TABLE]

The result then follows from Markov’s inequality and Assumption 3.

Step 3 We now want to control

[TABLE]

Under Assumption 2 (ii)(a)(b), for any $(y_{1},d_{1})$ , $(y_{2},d_{2})\in\mathscr{Y}\times\mathscr{D}$ , $|\mu(x,y_{1},d_{1})-\mu(x,y_{2},d_{2})|\leq M(x)(|y_{1}-y_{2}|+|d_{1}-d_{2}|)$ . This implies that $\{\mu(\cdot,y,d):y\in\mathscr{Y}_{1},d\in\mathscr{D}\}$ is a VC type class in light of Example 19.7 of van der Vaart (1998) and Lemma 9.18 of Kosorok (2008). We can then follow the same steps as in Step 2 to show

[TABLE]

The desired result of the current step then follows from Markov’s inequality and Assumption 3.

Step 4 Finally, we show that the two expectations above are asymptotically equivalent uniformly in $y$ and $d$ . Under Assumption 2 (i) (b), (ii) (a) (b), (iii), (iv) (a), calculations yield

[TABLE]

by the law of iterated expectations under Assumption 3. This result, along with results from Steps 2 and 3, concludes the proof. ∎

C.5 On Remark 2

This appendix section proves the statement in Remark 2. We mostly follow the proof of Proposition 6 of Card, Lee, Pei, and Weber (2015). Let $\mathds{1}\{\mathbf{Y}\leq y\}\mathds{1}\{\mathbf{D}=d\}$ be the “stacked” $n\times 1$ outcome variable $\left\{\mathds{1}\{Y_{i}\leq y\}\mathds{1}\{D_{i}=d\}\right\}_{i=1}^{n}$ , where the first $n^{-}$ entries are observations to the left of $x_{0}$ and the last $n^{+}$ entries are those to the right of $x_{0}$ . Let $\mathbf{Z}$ be the $n\times 8$ matrix whose $i^{\text{th}}$ row is

[TABLE]

Also let

[TABLE]

with $\mathbf{W}_{K}^{\pm}$ being the diagonal matrices

[TABLE]

The constrained estimator can be obtained from

[TABLE]

subject to $\mathbf{R}\beta^{R}=0$ where $\mathbf{R}=\left(1,0,0,0,-1,0,0,0\right)$ . Denote the resulting estimator by

[TABLE]

From equation (1.4.5) of Amemiya (1985), we have

[TABLE]

where the first term on the RHS is the unconstrained version and $\Pi^{-1}$ is

[TABLE]

Since $\hat{\mu}_{1}^{\prime R}(0^{+},y,d)h_{n}-\hat{\mu}_{1}^{\prime R}(0^{-},y,d)h_{n}=\mathbf{E}\widehat{\beta}^{R}$ , where $\mathbf{E}=\left(0,1,0,0,0,-1,0,0\right)$ and $K$ is the uniform kernel, we have $\mathbf{E}\cdot\Pi^{-1}\cdot\mathbf{R}^{\top}=0$ . Therefore,

[TABLE]

so the constrained estimator has the same asymptotic distribution as the unconstrained one. ∎
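The restricted least squares correction used in this argument can be checked numerically. The sketch below implements the standard formula $\widehat{\beta}^{R}=\widehat{\beta}-\Pi^{-1}\mathbf{R}^{\top}(\mathbf{R}\Pi^{-1}\mathbf{R}^{\top})^{-1}\mathbf{R}\widehat{\beta}$ with $\Pi=\mathbf{Z}^{\top}\mathbf{W}\mathbf{Z}$ . The function names and the column ordering of the design matrix (right-side cubic terms in columns 0 to 3, left-side terms in columns 4 to 7) are assumptions made for this illustration and need not match the ordering used in the paper.

```python
import numpy as np

def restricted_wls(Z, w, y, R):
    """Weighted least squares with and without the linear restriction R @ beta = 0."""
    ZtW = Z.T * w                                 # Z' W with W = diag(w)
    Pi_inv = np.linalg.inv(ZtW @ Z)               # (Z' W Z)^{-1}
    beta_u = Pi_inv @ (ZtW @ y)                   # unconstrained estimate
    R = np.atleast_2d(np.asarray(R, dtype=float))
    lam = np.linalg.solve(R @ Pi_inv @ R.T, R @ beta_u)
    beta_r = beta_u - Pi_inv @ R.T @ lam          # restricted estimate
    return beta_u, beta_r

def two_sided_cubic_design(u_left, u_right):
    """n x 8 design; left-side observations are stacked first (assumed layout)."""
    def block(u):
        return np.vander(np.asarray(u, dtype=float), N=4, increasing=True)
    n_l, n_r = len(u_left), len(u_right)
    Z = np.zeros((n_l + n_r, 8))
    Z[:n_l, 4:] = block(u_left)                   # columns 4-7: (1, u, u^2, u^3), left
    Z[n_l:, :4] = block(u_right)                  # columns 0-3: (1, u, u^2, u^3), right
    return Z
```

With design points that are symmetric around zero and uniform kernel weights, the slope difference $\mathbf{E}\widehat{\beta}$ is numerically unchanged by imposing the restriction, which mirrors the identity $\mathbf{E}\cdot\Pi^{-1}\cdot\mathbf{R}^{\top}=0$ used above.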

Bibliography


Abadie, A. (2002) “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models,” Journal of the American Statistical Association, Vol. 97, No. 457, pp. 284–292.
Amemiya, T. (1985) Advanced Econometrics. Harvard University Press.
Björklund, A. and Moffitt, R. (1987) “The Estimation of Wage and Welfare Gains in Self-Selection Models,” Review of Economics and Statistics, Vol. 69, No. 1, pp. 42–49.
Calonico, S., Cattaneo, M.D., and Farrell, M. (2016) “Coverage Error Optimal Confidence Intervals for Regression Discontinuity Designs,” Working paper.
Calonico, S., Cattaneo, M.D., and Farrell, M. (2018) “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference,” Journal of the American Statistical Association, Vol. 113, No. 522, pp. 767–779.
Calonico, S., Cattaneo, M.D., and Titiunik, R. (2014) “Robust Nonparametric Confidence Intervals for Regression Discontinuity Designs,” Econometrica, Vol. 82, No. 6, pp. 2295–2326.
Card, D., Lee, D., Pei, Z., and Weber, A. (2015) “Inference on Causal Effects in a Generalized Regression Kink Design,” Econometrica, Vol. 83, No. 6, pp. 2453–2483.
Cerulli, G., Dong, Y., Lewbel, A., and Poulsen, A. (2017) “Testing Stability of Regression Discontinuity Models,” in Advances in Econometrics, Vol. 38: Regression Discontinuity Designs: Theory and Applications, M.D. Cattaneo and J.C. Escanciano, eds., pp. 317–339.

First arXiv date: March 15, 2017.
