Sharp Bounds for the Marginal Treatment Effect with Sample Selection
Vitor Possebom

TL;DR
This paper develops sharp bounds for the marginal treatment effect in the presence of sample selection and endogenous treatment choice, providing new identification strategies and empirical estimates for a real-world program.
Contribution
It introduces pointwise sharp bounds for the MTE under monotonicity and mean dominance assumptions, extending identification to both continuous and discrete propensity scores.
Findings
Bounds for MTE are estimated for the Job Corps program.
MTE decreases with higher likelihood of attending the program.
Estimated ATE on the treated ranges from 0.33 to 0.99 dollars.
Abstract
I analyze treatment effects in situations when agents endogenously select into the treatment group and into the observed sample. As a theoretical contribution, I propose pointwise sharp bounds for the marginal treatment effect (MTE) of interest within the always-observed subpopulation under monotonicity assumptions. Moreover, I impose an extra mean dominance assumption to tighten the previous bounds. I further discuss how to identify those bounds when the support of the propensity score is either continuous or discrete. Using these results, I estimate bounds for the MTE of the Job Corps Training Program on hourly wages for the always-employed subpopulation and find that it is decreasing in the likelihood of attending the program within the Non-Hispanic group. For example, the Average Treatment Effect on the Treated is between $.33 and $.99 while the Average Treatment Effect on the…
Sharp Bounds for the Marginal Treatment Effect with Sample Selection*
* I would like to thank Joseph Altonji, Nathan Barker, Ivan Canay, Xuan Chen, John Finlay, Carlos A. Flores, John Eric Humphries, Yuichi Kitamura, Marianne Köhli, Helena Laneuville, Jaewon Lee, Yusuke Narita, Pedro Sant'Anna, Masayuki Sawada, Azeem Shaikh, Edward Vytlacil, Stephanie Weber, Siuyuat Wong and seminar participants at Yale University for helpful suggestions.
Vitor Possebom
Yale University
First Draft: October 2018
This Draft: April 2019
Abstract
I analyze treatment effects in situations when agents endogenously select into the treatment group and into the observed sample. As a theoretical contribution, I propose pointwise sharp bounds for the marginal treatment effect (MTE) of interest within the always-observed subpopulation under monotonicity assumptions. Moreover, I impose an extra mean dominance assumption to tighten the previous bounds. I further discuss how to identify those bounds when the support of the propensity score is either continuous or discrete. Using these results, I estimate bounds for the MTE of the Job Corps Training Program on hourly wages for the always-employed subpopulation and find that it is decreasing in the likelihood of attending the program within the Non-Hispanic group. For example, the Average Treatment Effect on the Treated is between $.33 and $.99, while the upper bound for the Average Treatment Effect on the Untreated is $3.00.
Keywords: Marginal Treatment Effect, Sample Selection, Partial Identification, Principal Stratification, Program Evaluation, Training Programs.
JEL Codes: C31, C35, C36, J38
Introduction
In the applied treatment effects literature, there are many problems that face two identification challenges: endogenous selection into treatment and endogenous sample selection. For instance, in Labor Economics, if a researcher wants to evaluate the effect of a job training program on wages, she has to understand why agents choose to enroll in the program and why agents select into her sample by being employed. In this situation, she may combine information on hourly labor earnings (the observable outcome) and employment (sample selection status) to uncover the effect on hourly wages (the outcome of interest). Similar problems appear in Labor Economics when analyzing the college wage premium and scarring effects. In the Health Sciences, a researcher faces the same identification challenges when analyzing the effect of a drug on a health quality index when the drug may save a patient's life. Moreover, in randomized control trials, researchers are concerned with non-compliance and differential attrition rates between treated and control groups. This double selection problem is also present when analyzing the effect of an educational intervention on short- and long-term outcomes and the effect of procedural laws on litigation outcomes.¹

¹ Training programs are studied by Heckman et al. (1999), Lee (2009) and Chen & Flores (2015). The college wage premium is analyzed by Altonji (1993), Card (1999) and Carneiro et al. (2011). Scarring effects are discussed by Heckman & Borjas (1980), Farber (1993) and Jacobson et al. (1993). Some education interventions are studied by Krueger & Whitmore (2001), Angrist et al. (2006), Angrist et al. (2009), Chetty et al. (2011) and Dobbie & Fryer Jr. (2015). Medical treatments are analyzed by CASS (1984), Sexton & Hebel (1984) and U.S. Department of Health and Human Services (2004). Litigation outcomes are discussed by Helland & Yoon (2017). RCTs with attrition are illustrated by DeMel et al. (2013) and Angelucci et al. (2015).
To simultaneously address both identification challenges, I propose a Generalized Roy Model (Heckman & Vytlacil (1999)) with sample selection in which there is one outcome of interest that is observed only if the individual self-selects into the sample. Under a monotonicity assumption on the sample selection indicator, I decompose the Marginal Treatment Response (MTR) function for the potential observable outcome when treated as a weighted average of (i) the MTR on the outcome of interest for the subpopulation who is always observed and (ii) the Marginal Treatment Effect (MTE) on the observable outcome for the subpopulation who is observed only when treated. Under a bounded (in one direction) support condition, this decomposition is useful because it allows me to propose pointwise sharp bounds for the MTE on the outcome of interest within the always-observed subpopulation as a function of (i) the MTR functions on the observable outcome, (ii) the maximum and (or) minimum of the support of the potential outcome, and (iii) the proportions of always-observed individuals and observed-only-when-treated individuals. I also show that it is impossible to construct bounds without extra assumptions when the support of the potential outcome is the entire real line. After that, I impose an extra mean dominance assumption that compares the always-observed population against the observed-only-when-treated population, tightening the previous bounds. Moreover, under this new assumption, I show that those tighter bounds are also pointwise sharp and derive an informative lower bound even when the support of the potential outcome is the entire real line.
I then proceed to show that those bounds are well-identified. When the support of the propensity score is an interval, the relevant objects are point identified by applying the local instrumental variable approach (LIV, see Heckman & Vytlacil (1999)) to the expectations of the observable outcome and of the selection indicator conditional on the propensity score and the treatment status. However, in many empirical applications, the support of the propensity score is a finite set. In such a context, I can identify bounds for the MTE of interest by adapting the nonparametric bounds proposed by Mogstad et al. (2018) or the flexible parametric approach suggested by Brinch et al. (2017) to encompass a sample selection problem. When using the nonparametric approach, the bounds for the MTE of interest are simply an outer set that contains the true MTE, i.e., they are not pointwise sharp anymore.
Partial identification of the MTE of interest is useful for two reasons. First and most importantly, bounds for the MTE can be used to shed light on the heterogeneity of treatment effects, allowing the researcher to understand who would benefit and who would lose from a specific treatment, as recently illustrated by Cornelissen et al. (2018) and Bhuller et al. (2019). This knowledge can be used to optimally design policies that incentivize agents to take a treatment. Second, bounds for the MTE can be used to construct bounds for any treatment effect parameter that is written as a weighted integral of the MTE. For instance, by taking a weighted average of the pointwise sharp bounds for the MTE, one can bound the average treatment effect (ATE), the average treatment effect on the treated (ATT), any local average treatment effect (LATE, Imbens & Angrist (1994)) and any policy-relevant treatment effect (PRTE, Heckman & Vytlacil (2001b)) within the always-observed subpopulation. Although such bounds may not be sharp for any specific parameter, they are a flexible and easy-to-apply tool for many empirical problems that depend on a varied set of treatment effects.²

² As a consequence of this trade-off between flexibility and sharpness, I recommend the use of a specialized tool if the parameter of interest already has specific bounds (e.g., the ITT by Lee (2009) and the LATE by Chen & Flores (2015)).
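As an illustration of the second point, the following is a minimal sketch of how pointwise bounds on the MTE could be aggregated into bounds on a parameter written as a weighted integral of the MTE. The function name `integrate_mte_bounds`, the grid and the weights are hypothetical and are not taken from the paper; the weights for a specific parameter (ATT, LATE, PRTE) must be supplied by the user. Where a weight is negative, the roles of the lower and upper pointwise bounds are swapped so that the result remains a valid, though not necessarily sharp, bound.

```python
import numpy as np

def integrate_mte_bounds(p_grid, lower, upper, weights):
    """Bound the integral of MTE(p) * weights(p) over p, given pointwise bounds.

    p_grid  : increasing grid of propensity-score values
    lower   : pointwise lower bounds for the MTE on p_grid
    upper   : pointwise upper bounds for the MTE on p_grid
    weights : user-supplied weights on p_grid (hypothetical input)
    """
    p = np.asarray(p_grid, dtype=float)
    lo = np.asarray(lower, dtype=float)
    up = np.asarray(upper, dtype=float)
    w = np.asarray(weights, dtype=float)

    w_pos = np.clip(w, 0.0, None)   # positive part of the weights
    w_neg = np.clip(w, None, 0.0)   # negative part of the weights (<= 0)

    # Worst case for the lower bound: lowest MTE where w > 0, highest where w < 0.
    lo_integrand = w_pos * lo + w_neg * up
    up_integrand = w_pos * up + w_neg * lo

    def trapezoid(f):
        # Trapezoidal rule written out by hand to avoid version-specific NumPy names.
        return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(p)))

    return trapezoid(lo_integrand), trapezoid(up_integrand)
```

With uniform weights equal to one on the unit interval, the function returns the integral of the pointwise lower bounds and the integral of the pointwise upper bounds.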
Finally, I illustrate the usefulness of the proposed bounds for the MTE of interest by analyzing the effect of the Job Corps Training Program (JCTP) on hourly wages within the Non-Hispanic always-employed subpopulation. My framework is ideal to analyze this important experiment because it simultaneously addresses the imperfect compliance issue (self-selection into treatment) by focusing on the MTE and the endogenous employment decision (sample selection) by using a partial identification strategy. Although my bounds are uninformative when using only the monotonicity assumption, they are tight and positive under a mean dominance assumption, illustrating the identification power of extra assumptions in a context of partial identification. Most interestingly, I find that the bounds for the MTE on hourly wages are decreasing in the likelihood of attending the program, implying that the agents who would benefit the most from the JCTP are the least likely to attend it. As a consequence of this result, my estimates suggest that the ATU is greater than the ATT for the always-employed subpopulation. Moreover, my bounds for the LATE within the always-employed subpopulation are in line with the estimates of Chen & Flores (2015), and the effect of the JCTP on employment is positive for every agent according to the test proposed by Machado et al. (2018). Finally, as a by-product of my estimation strategy, I also find that the MTEs on employment and on hourly labor earnings are decreasing in the likelihood of attending the JCTP, a result that is in line with the estimated upper bounds of Chen et al. (2017).
I make contributions to three branches of literature: identification of treatment effects using an instrument, identification of treatment effects with sample selection, and the effect of job training programs. They are all vast and only briefly summarized here.
In the literature about treatment effects with an instrument, Imbens & Angrist (1994) show that we can identify the LATE. Heckman & Vytlacil (1999), Heckman & Vytlacil (2005) and Heckman et al. (2006) define the MTE and explain how to compute any treatment effect as a weighted average of the MTE. However, if the support of the propensity score is not the unit interval, then it is not possible to non-parametrically identify some common treatment effects, such as the ATE, the ATT and the ATU. A parametric solution to this problem is given by Brinch et al. (2017), who identify a flexible polynomial function for the MTE whose degree is defined by the cardinality of the propensity score support, while a nonparametric solution is given by Mogstad et al. (2018), who use the information contained in IV-like estimands to construct non-parametric worst- and best-case bounds for policy-relevant treatment effects.³

³ Other important contributions are made by Manski (1990), Manski (1997), Manski & Pepper (2000), Heckman & Vytlacil (2001a), Bhattacharya et al. (2008), Chesher (2010), Chiburis (2010), Shaikh & Vytlacil (2011), Bhattacharya et al. (2012), Cornelissen et al. (2016), Chen et al. (2017), Huber et al. (2017), Kowalski (2018), Mourifié et al. (2018) and Zhou & Xie (2019).
I contribute to this literature by extending the non-parametric approach by Mogstad et al. (2018) and the flexible parametric approach by Brinch et al. (2017) to encompass a sample selection problem. By doing so, I can partially identify the MTE function on the outcome of interest, which, in my framework, is different from the observable outcome.
In the literature about identification of treatment effects with sample selection, the control function approach (Heckman (1979), Ahn & Powell (1993) and Das et al. (2003)) and the use of auxiliary data (Chen et al. (2008)) are two classical solutions to this problem. Another approach is to partially identify the parameter of interest by imposing weak monotonicity assumptions. For example, in a seminal paper, Lee (2009) imposes that sample selection is monotone in treatment assignment to sharply bound the ITT for the subpopulation of always-observed individuals.⁴

⁴ Other relevant contributions are made by Frangakis & Rubin (2002), Blundell et al. (2007), Imai (2008), Lechner & Mell (2010), Blanco et al. (2013a), Mealli & Pacini (2013), Behaghel et al. (2015) and Huber & Mellace (2015).
In the intersection of both literatures, a few authors address the problem of sample selection and endogenous treatment simultaneously. By using two instrumental variables, Fricke et al. (2015) and Lee & Salanié (2016) identify different treatment effects. However, since finding a credible instrument for sample selection is challenging in some cases, it is worth developing alternative tools that do not require more than an instrument for selection into treatment. Frölich & Huber (2014) point identify the LATE by assuming that there is no contemporaneous relationship between the potential outcomes and the sample selection problem. Chen & Flores (2015) derive bounds for the Average Treatment Effect within the always-observed compliers by combining one instrument with a double exclusion restriction with monotonicity assumptions on the sample selection and the selection into treatment problems.⁵

⁵ Other important contributions are made by Huber (2014), Steinmayr (2014), Blanco et al. (2017) and Kédagni (2018).
I contribute to this literature by partially identifying the MTE on the always-observed subsample allowing for a contemporaneous relationship between the potential outcomes and the sample selection problem, and using only one (discrete) instrument combined with a monotonicity assumption. Deriving bounds for the MTE is theoretically important because it can unify, in one framework, the bounds for different treatment effects with sample selection. It is also empirically relevant because it allows us to partially identify any treatment effect on the outcome of interest in many empirical problems. For instance, when analyzing the effect of a job training program on wages, it is useful to compare the ATT with the ATU in order to understand whether the workers who would benefit the most from such a policy are actually the ones who receive training.
In the literature about job training programs, Heckman et al. (1999) wrote an influential survey paper. In particular, many papers were written about the effects of the Job Corps Training Program (JCTP) after a randomized experiment funded by the U.S. Department of Labor in 1995.⁶ Finally, my work is closer to the research done by Lee (2009) and Chen & Flores (2015), who analyze the effect of the Job Corps Training Program on wages by focusing, respectively, on the ITT and the LATE parameters within the always-observed subpopulation. Lee (2009) rules out a zero effect after accounting for the loss in labor market experience generated by the extra education acquired by Job Corps participants. Chen & Flores (2015) find that the LATE on hourly wages four years after randomization is between 5.7% and 13.9% for the entire population and between 7.7% and 17.5% for the non-Hispanic population under monotonicity and mean dominance assumptions.

⁶ For example, significant contributions are made by Schochet et al. (2001), Schochet et al. (2008), Flores-Lagunes et al. (2010), Flores et al. (2012), Blanco et al. (2013a), Blanco et al. (2013b), Blanco et al. (2017) and Chen et al. (2017).
I contribute to this literature by analyzing the MTE on hourly wages within the Non-Hispanic group and formally testing whether this training program has a monotone effect on employment by implementing the test proposed by Machado et al. (2018).
This paper proceeds as follows: Section 2 details the Generalized Roy Model with sample selection; Section 3 explains how to derive bounds for the MTE of interest; Sections 4 and 5 discuss identification of the bounds when the support of the propensity score is continuous or discrete; and Section 6 analyzes the effect of the Job Corps Training Program on hourly wages. Finally, Section 7 concludes.
Framework
I begin with the classical potential outcome framework by Rubin (1974) and modify it to include a sample selection problem. Let $Z$ be an instrumental variable whose support is given by $\mathcal{Z}$, $X$ be a vector of covariates whose support is given by $\mathcal{X}$, $W := (X, Z)$ be a vector that combines the covariates and the instrument whose support is given by $\mathcal{W}$, $D$ be a treatment status indicator, $Y_0^*$ be the potential outcome of interest when the person is not treated, and $Y_1^*$ be the potential outcome of interest when the person is treated. The outcome variable of interest (e.g., wages) is $Y^* := D Y_1^* + (1 - D) Y_0^*$. Moreover, let $S_1$ and $S_0$ be potential sample selection indicators when treated and when not treated, and define $S := D S_1 + (1 - D) S_0$ as the sample selection indicator (e.g., employment status). Define $Y := S Y^*$ as the observable outcome (e.g., labor earnings). I also define $Y_0 := S_0 Y_0^*$ and $Y_1 := S_1 Y_1^*$ as the potential observable outcomes. Observe that, following Lee (2009) and Chen & Flores (2015), my notation implicitly imposes two exclusion restrictions: $Z$ has a direct impact neither on the potential outcome of interest nor on the sample selection indicator. The second exclusion restriction requires attention in empirical applications. On the one hand, it may be a strong assumption in randomized control trials if sample selection is due to attrition and initial assignment has an effect on the subject's willingness to contact the researchers. On the other hand, it may be a reasonable assumption in many labor market applications, such as the evaluation of a job training program. For instance, in my empirical section, it is plausible that the initial random assignment to the Job Corps Training Program (JCTP) has no impact on future employment status.
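I model sample selection and selection into treatment using the Generalized Roy Model (Heckman & Vytlacil, 1999). Let $U$ and $V$ be random variables, and $P: \mathcal{W} \to \mathbb{R}$ and $Q: \{0, 1\} \times \mathcal{X} \to \mathbb{R}$ be unknown functions. I assume that:
$$D := \mathbf{1}\{P(W) \geq U\} \tag{1}$$
and
$$S := \mathbf{1}\{Q(D, X) \geq V\}. \tag{2}$$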
As Vytlacil (2002) shows, equations (1) and (2) are equivalent to assuming monotonicity conditions on the selection-into-treatment problem (Imbens & Angrist (1994)) and on the sample selection problem (Lee (2009)). I stress that both monotonicity assumptions are testable using the tools developed by Machado et al. (2018). Note also that, given equation (2), $S_0 = \mathbf{1}\{Q(0, X) \geq V\}$ and $S_1 = \mathbf{1}\{Q(1, X) \geq V\}$.
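To make the structure of equations (1) and (2) concrete, the following is a minimal simulation sketch of a Generalized Roy Model with sample selection. Every numerical choice (the binary instrument, the propensity scores 0.3 and 0.7, the selection thresholds 0.5 and 0.8, and the wage equations) is a hypothetical assumption made only for illustration and is not taken from the paper. The latent variables $U$ and $V$ are each uniform but dependent on each other, and covariates are omitted.

```python
import numpy as np

def simulate_roy_with_selection(n=200_000, seed=0):
    """Simulate a toy Generalized Roy Model with sample selection (no covariates)."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)              # binary instrument Z
    u = rng.uniform(size=n)                     # latent resistance to treatment U
    # V is marginally uniform but dependent on U: it copies U half of the time.
    v = np.where(rng.uniform(size=n) < 0.5, u, rng.uniform(size=n))

    p = np.where(z == 1, 0.7, 0.3)              # hypothetical propensity score P(Z)
    d = (p >= u).astype(int)                    # equation (1): D = 1{P(W) >= U}

    q0, q1 = 0.5, 0.8                           # hypothetical Q(0), Q(1), with Q(1) > Q(0) > 0
    s0 = (q0 >= v).astype(int)                  # potential selection when untreated
    s1 = (q1 >= v).astype(int)                  # potential selection when treated
    s = np.where(d == 1, s1, s0)                # equation (2): S = 1{Q(D) >= V}

    # Hypothetical potential outcomes of interest (e.g., hourly wages).
    y0_star = 8.0 + 2.0 * (1.0 - v) + rng.normal(scale=0.5, size=n)
    y1_star = y0_star + 1.5 * u                 # arbitrary effect, larger for high-U agents
    y_star = np.where(d == 1, y1_star, y0_star)
    y = s * y_star                              # observable outcome Y = S * Y^*

    return {"z": z, "u": u, "v": v, "d": d, "s0": s0, "s1": s1,
            "s": s, "y_star": y_star, "y": y}
```

In this toy design, nobody is observed only when untreated, because the same $V$ is compared with two ordered thresholds; this is the monotonicity in sample selection that equation (2) encodes.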
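The random variables $U$ and $V$ are jointly continuously distributed conditional on $X$ with density $f_{U,V|X}$ and cumulative distribution function $F_{U,V|X}$. As has been shown in the literature, equations (1) and (2) can be rewritten as
$$D = \mathbf{1}\{\tilde{P}(W) \geq \tilde{U}\} \quad \text{and} \quad S = \mathbf{1}\{\tilde{Q}(D, X) \geq \tilde{V}\},$$
where $\tilde{P}(W) := F_{U|X}(P(W) \mid X)$, $\tilde{U} := F_{U|X}(U \mid X)$, $\tilde{Q}(D, X) := F_{V|X}(Q(D, X) \mid X)$, and $\tilde{V} := F_{V|X}(V \mid X)$. Consequently, the marginal distributions of $\tilde{U}$ and $\tilde{V}$ conditional on $X$ follow the standard uniform distribution. Since this is merely a normalization, I drop the tilde and maintain throughout the paper the normalization that $P(w) \in [0, 1]$ for any $w \in \mathcal{W}$ and the marginal distributions of $U$ and $V$ conditional on $X$ follow the standard uniform distribution, even though their joint distribution allows for any kind of dependency between those two variables. As a consequence of such normalization, $P(w)$ represents the propensity score and is equal to $\mathbb{P}[D = 1 \mid W = w]$, while $Q(d, x)$ is equal to $\mathbb{P}[S_d = 1 \mid X = x]$.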
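The normalization can be checked numerically. The sketch below assumes, purely for illustration, that the unnormalized latent variable is exponentially distributed and that the treatment index takes the hypothetical values 0.4 and 1.2 depending on a binary instrument. Applying the latent variable's own CDF to both sides of the threshold-crossing rule leaves the treatment decision unchanged, makes the latent variable uniform, and turns the index into the propensity score.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
z = rng.integers(0, 2, size=n)            # binary instrument (hypothetical)
u_raw = rng.exponential(size=n)           # unnormalized latent variable, not uniform
index = np.where(z == 1, 1.2, 0.4)        # unnormalized treatment index (hypothetical)
d_raw = index >= u_raw                    # treatment rule before normalization

def cdf(t):
    """CDF of the standard exponential distribution, F(t) = 1 - exp(-t)."""
    return -np.expm1(-t)

u = cdf(u_raw)                            # normalized latent variable, uniform on [0, 1]
p = cdf(index)                            # normalized index, i.e. the propensity score
d_norm = p >= u                           # treatment rule after normalization

share_treated_z1 = d_norm[z == 1].mean()  # should be close to cdf(1.2)
```

Because the CDF is strictly increasing, `d_raw` and `d_norm` coincide, and the share of treated units among those with `z == 1` approximates the normalized index `cdf(1.2)`.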
Moreover, I assume that:
Assumption 1
The instrument $Z$ is independent of all latent variables given the covariates $X$, i.e., $Z \perp\!\!\!\perp \left(U, V, Y_0^*, Y_1^*\right) \mid X$.
Assumption 2
The distribution of $P(W)$ given $X$ is nondegenerate.
Assumption 3
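The first and second population moments of the potential outcomes of interest are finite, i.e., $\mathbb{E}\left[\left|Y_d^*\right|\right] < \infty$ and $\mathbb{E}\left[\left(Y_d^*\right)^2\right] < \infty$ for any $d \in \{0, 1\}$.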
Assumption 4
Both treatment groups exist for any value of $X$, i.e., $0 < \mathbb{P}[D = 1 \mid X] < 1$.
Assumption 5
The covariates $X$ are invariant to counterfactual manipulations, i.e., $X_0 = X_1 = X$, where $X_0$ and $X_1$ are the counterfactual values of $X$ that would be observed when the person is, respectively, not treated or treated.
Assumption 6
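The potential outcomes $Y_0^*$ and $Y_1^*$ have the same support, i.e., $\mathcal{Y}_0^* = \mathcal{Y}_1^* =: \mathcal{Y}^*$, where $\mathcal{Y}_0^*$ is the support of $Y_0^*$ and $\mathcal{Y}_1^*$ is the support of $Y_1^*$.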
Assumption 7
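Define $\underline{y}^* := \inf \mathcal{Y}^*$ and $\overline{y}^* := \sup \mathcal{Y}^*$, where $\mathcal{Y}^*$ denotes the common support of the potential outcomes. I assume that $\underline{y}^*$ and $\overline{y}^*$ are known, and that
1. $\underline{y}^* > -\infty$, $\overline{y}^* = +\infty$ and $\mathcal{Y}^*$ is an interval, or
2. $\underline{y}^* = -\infty$, $\overline{y}^* < +\infty$ and $\mathcal{Y}^*$ is an interval, or
3. $\underline{y}^* > -\infty$, $\overline{y}^* < +\infty$, and
   (a) $\mathcal{Y}^*$ is an interval or
   (b) $\underline{y}^* \in \mathcal{Y}^*$ and $\overline{y}^* \in \mathcal{Y}^*$.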
Assumption 7 is fairly general. Case 1 covers continuous random variables whose support is convex and bounded below (e.g., wages), while Case 3.a covers continuous variables with bounded convex support (e.g., test scores). Case 3.b encompasses not only binary variables, but also any discrete variable whose support is finite (e.g., years of education). It also includes mixed random variables whose support is not an interval but achieves its maximum and minimum. Case 2 is included for theoretical completeness. Furthermore, Proposition 13 shows that Assumption 7 is partially necessary for the existence of bounds for the MTE of interest in the sense that, if $\underline{y}^* = -\infty$ and $\overline{y}^* = +\infty$, then it is impossible to bound the marginal treatment effect on the outcome of interest within the always-observed subpopulation without any extra assumptions.
Assumption 8
Treatment has a positive effect on the sample selection indicator for all individuals, i.e., $Q(1, x) > Q(0, x) > 0$ for any $x \in \mathcal{X}$.
Assumption 8 goes beyond the monotonicity condition implicitly imposed by equation (2) by assuming that the direction of the effect of treatment on the sample selection indicator is known and positive, i.e., $Q(1, x) \geq Q(0, x)$ for any $x \in \mathcal{X}$. In this sense, it is a standard assumption in the literature.⁷ Most importantly, it is also a testable assumption using the tools developed by Machado et al. (2018), because, under monotone sample selection (equation (2)), identification of the sign of the ATE on the selection indicator provides a test for Assumption 8. However, Assumption 8 is slightly stronger than what is usually imposed in the literature, because it additionally imposes $Q(0, x) > 0$ and $Q(1, x) > Q(0, x)$ for any $x \in \mathcal{X}$. While the first inequality implies that there is a subpopulation who is always observed, allowing me to properly define my target parameter (the marginal treatment effect on the outcome of interest within the always-observed population), the second inequality implies that there is a subpopulation who is observed only when treated, making the problem theoretically interesting by eliminating trivial cases of point identification of the MTE of interest as discussed in Proposition 10. Finally, I emphasize that all my results can be stated and derived with some straightforward changes if I impose $Q(0, x) > Q(1, x) > 0$ for any $x \in \mathcal{X}$ instead of Assumption 8, as is done in Appendix C. I also discuss, in Appendix D, an agnostic approach to monotonicity in the sample selection problem (equation (2)) and show, in Appendix E, that bounds derived with non-monotone sample selection are uninformative (i.e., equal to the trivial bounds implied by the support of the potential outcomes) under mild regularity conditions.

⁷ Lee (2009) and Chen & Flores (2015) write it in an equivalent way as $S_1 \geq S_0$, while Manski (1997) and Manski & Pepper (2000) call it the "monotone treatment response" assumption.
In my empirical application, Assumption 8 imposes that the JCTP has a positive effect on employment for all individuals, which is plausible given the objectives and services provided by this training program. As discussed by Chen & Flores (2015), the two potential threats against it — the “lock-in” effect (van Ours (2004)) and an increase in the reservation wage of treated individuals — are likely to become less relevant in the long run, justifying my focus on the hourly wage after 208 weeks from randomization. Most importantly, this assumption is formally tested by the method developed by Machado et al. (2018) and I reject, at the 1%-significance level, the null hypothesis that Assumption 8 is invalid within the Non-Hispanic group.
Finally, in partial identification contexts, extra assumptions may have a lot of identification power. In the specific case of identifying treatment effects with sample selection, it is common to use mean or stochastic dominance assumptions to tighten the bounds for the parameter of interest (Imai (2008), Blanco et al. (2013a), Huber & Mellace (2015) and Huber et al. (2017)) and justify them based on the intuitive argument that some population sub-groups have more favorable underlying characteristics than others. In particular, I discuss the identifying power of the following mean dominance assumption:⁸

⁸ In Appendix F, I derive bounds for the MTE of interest when the inequality in Assumption 9 holds in the other direction.
Assumption 9
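The mean potential outcome when treated within the always-observed subpopulation is greater than or equal to the same parameter within the observed-only-when-treated subpopulation:
$$\mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] \geq \mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 0, S_1 = 1\right]$$
for any $x \in \mathcal{X}$ and $p \in [0, 1]$.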
Unfortunately, this assumption is empirically untestable, implying that its use must be justified for each application based on qualitative or theoretical arguments. In particular, in my empirical application, Assumption 9 imposes that the marginal treatment response function of wages when treated for the always-employed population is greater than the same object for the employed-only-when-treated population. Intuitively, this assumption imposes that the group with better potential employment outcomes also has, on average, better potential wages, i.e., there is positive selection into employment.
Bounds for the MTE on the outcome of interest
The target parameter, the MTE on the outcome of interest for the subpopulation who is always observed, is given by
$$\Delta_{Y^*}^{OO}(x, p) := \mathbb{E}\left[Y_1^* - Y_0^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] \tag{3}$$
for any $x \in \mathcal{X}$ and any $p \in [0, 1]$, and is a natural parameter of interest. In labor market applications where sample selection is due to observing wages only when agents are employed, it is the effect on wages for the subpopulation who is always employed. In medical applications where sample selection is due to the death of a patient, it is the effect on health quality for the subpopulation who survives regardless of treatment status. In the education literature where sample selection is due to students quitting school, it is the effect on test scores for the subpopulation who do not drop out of school regardless of treatment status. In all those cases, the target parameter captures the intensive margin of the treatment effect.⁹

⁹ If the researcher is interested in the extensive margin of the treatment effect, captured by the MTE on the observable outcome ($\mathbb{E}[Y_1 - Y_0 \mid X = x, U = p]$) and by the MTE on the selection indicator ($\mathbb{E}[S_1 - S_0 \mid X = x, U = p]$), he or she can apply the identification strategies described by Heckman et al. (2006), Brinch et al. (2017) and Mogstad et al. (2018).
Other possibly interesting parameters are the MTE on the outcome of interest within the subpopulation who is never observed ($\mathbb{E}[Y_1^* - Y_0^* \mid X = x, U = p, S_0 = 0, S_1 = 0]$), the MTR function under no treatment for the outcome of interest within the subpopulation who is observed only when treated ($\mathbb{E}[Y_0^* \mid X = x, U = p, S_0 = 0, S_1 = 1]$) and the MTR function under treatment for the outcome of interest within the subpopulation who is observed only when treated ($\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 0, S_1 = 1]$). While the last parameter can be partially identified (Appendix B), the first two parameters are impossible to point identify or bound in an informative way because the outcome of interest ($Y_0^*$ or $Y_1^*$) is never observed for the conditioning subpopulations.¹⁰ Moreover, in some applications (e.g., analyzing the impact of a medical treatment on a health quality measure where selection is given by whether the patient is alive), the potential outcome $Y_d^*$ is not even properly defined when $S_d = 0$ for $d \in \{0, 1\}$. As a consequence, it is not possible to point identify or bound in an informative way the Marginal Treatment Effect for the entire population ($\mathbb{E}[Y_1^* - Y_0^* \mid X = x, U = p]$) either. Note also that the subpopulation who is observed only when not treated ($S_0 = 1$ and $S_1 = 0$) does not exist by Assumption 8. Furthermore, observe that the conditioning subpopulations in all the above-mentioned parameters are determined by post-treatment outcomes and, as a consequence, are connected to the statistical literature known as principal stratification (Frangakis & Rubin (2002)).

¹⁰ Zhang et al. (2008) discuss this identification issue in more depth.
I now focus on the target parameter given by equation (3). While Subsection 3.1 derives bounds for the MTE of interest (equation (3)) using only a monotonicity assumption (Assumptions 1-8), Subsection 3.2 tightens those bounds by additionally imposing the Mean Dominance Assumption 9. Finally, Subsection 3.3 discusses the empirical relevance of such bounds.
Partial Identification with only a Monotonicity Assumption
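Here, my goal is to derive bounds for the target parameter $\Delta_{Y^*}^{OO}(x, p)$ in equation (3) under Assumptions 1-8. Note that the second right-hand term in equation (3) can be written as¹¹
$$\mathbb{E}\left[Y_0^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] = \frac{m_0^Y(x, p)}{m_0^S(x, p)}, \tag{4}$$
where I define $m_0^Y(x, p) := \mathbb{E}[Y_0 \mid X = x, U = p]$ and $m_0^S(x, p) := \mathbb{E}[S_0 \mid X = x, U = p]$ as the MTR functions associated with the counterfactual variables $Y_0$ and $S_0$, respectively. In this section, I assume that all terms in the right-hand side of equation (4) are point identified, postponing the discussion about their identification to Sections 4 and 5.

¹¹ Appendix A.1 contains a proof of this claim.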
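The first right-hand term in equation (3) can be written as¹²
$$\mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] = \frac{m_1^Y(x, p) - \Delta_Y^{NO}(x, p) \cdot \Delta_S(x, p)}{m_0^S(x, p)}, \tag{5}$$
where $m_1^Y(x, p) := \mathbb{E}[Y_1 \mid X = x, U = p]$ is the MTR function associated with the counterfactual variable $Y_1$, $\Delta_Y^{NO}(x, p) := \mathbb{E}[Y_1 - Y_0 \mid X = x, U = p, S_0 = 0, S_1 = 1]$ is the MTE on the observable outcome for the subpopulation who is observed only when treated, $\Delta_S(x, p) := m_1^S(x, p) - m_0^S(x, p)$ is the MTE on the selection indicator, and $m_1^S(x, p) := \mathbb{E}[S_1 \mid X = x, U = p]$ is the MTR function associated with the counterfactual variable $S_1$. In this section, I also assume that $m_1^Y$ and $\Delta_S$ are point identified, postponing the discussion about their identification to Sections 4 and 5.

¹² Appendix A.2 contains a proof of this claim.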
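The decomposition behind equation (5) can be illustrated numerically. The sketch below checks the unconditional analogue of the identity (conditioning on a single value of the latent treatment variable is dropped for simplicity): the mean observable outcome when treated equals the always-observed mean times the share always observed, plus the observed-only-when-treated mean times the difference in selection rates. The data-generating process, including the thresholds 0.5 and 0.8 and the wage equation, is a hypothetical assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
v = rng.uniform(size=n)                      # latent selection variable
s0 = v <= 0.5                                # observed when untreated (hypothetical threshold)
s1 = v <= 0.8                                # observed when treated, so s1 >= s0
y1_star = 10.0 - 3.0 * v + rng.normal(size=n)  # hypothetical potential wage when treated
y1 = s1 * y1_star                            # potential observable outcome Y_1 = S_1 * Y_1^*

m1_y = y1.mean()                             # analogue of the MTR of Y_1
m0_s = s0.mean()                             # share always observed
delta_s = s1.mean() - s0.mean()              # share observed only when treated

mean_oo = y1_star[s0 & s1].mean()            # mean of Y_1^* among the always observed
mean_no = y1_star[~s0 & s1].mean()           # mean of Y_1^* among observed-only-when-treated

# Weighted-average decomposition and its rearrangement, as in equation (5).
rhs = mean_oo * m0_s + mean_no * delta_s
recovered_oo = (m1_y - mean_no * delta_s) / m0_s
```

In real data `mean_no` is not identified, because one cannot tell which treated and observed units would also have been observed without treatment; this is why the term is replaced by the bounds of the support in what follows.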
Although point identification of $\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1]$ is not possible due to the term $\Delta_Y^{NO}(x, p)$ in equation (5), I can find identifiable bounds for it.¹³

¹³ Appendix A.3 contains a proof of this proposition.
Proposition 10
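Suppose that the MTR functions $m_0^Y$, $m_1^Y$ and $m_0^S$ and the MTE on the selection indicator $\Delta_S$ are point identified.
Under Assumptions 1-6, 7.1 and 8, $\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1]$ must satisfy
$$\underline{y}^* \leq \mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] \leq \frac{m_1^Y(x, p) - \underline{y}^* \cdot \Delta_S(x, p)}{m_0^S(x, p)}. \tag{6}$$
Under Assumptions 1-6, 7.2 and 8, $\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1]$ must satisfy
$$\frac{m_1^Y(x, p) - \overline{y}^* \cdot \Delta_S(x, p)}{m_0^S(x, p)} \leq \mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] \leq \overline{y}^*. \tag{7}$$
Under Assumptions 1-6, 7.3 (sub-case (a) or (b)) and 8, $\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1]$ must satisfy
$$\max\left\{\frac{m_1^Y(x, p) - \overline{y}^* \cdot \Delta_S(x, p)}{m_0^S(x, p)}, \underline{y}^*\right\} \leq \mathbb{E}\left[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1\right] \leq \min\left\{\frac{m_1^Y(x, p) - \underline{y}^* \cdot \Delta_S(x, p)}{m_0^S(x, p)}, \overline{y}^*\right\}. \tag{8}$$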
Note that, even when the support is bounded in only one direction (Assumptions 7.1 and 7.2), it is possible to derive lower and upper bounds for $\mathbb{E}[Y_1^* \mid X = x, U = p, S_0 = 1, S_1 = 1]$.
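The logic of Proposition 10 can be sketched as a small function. It assumes that the treated-arm decomposition has the form "mean observable outcome when treated = (always-observed mean) x (share always observed) + (observed-only-when-treated mean) x (difference in selection rates)", and that both means must lie in the known support of the potential outcome. The function name `bounds_treated_mean_always_observed` and its arguments are hypothetical, and the inputs are taken as already identified at a given point.

```python
import math

def bounds_treated_mean_always_observed(m1_y, m0_s, delta_s,
                                        y_min=-math.inf, y_max=math.inf):
    """Bounds for the mean of Y_1^* among the always observed at one (x, p) point.

    m1_y    : MTR of the observable outcome when treated
    m0_s    : MTR of the selection indicator when untreated (share always observed)
    delta_s : MTE on the selection indicator (share observed only when treated)
    y_min, y_max : known infimum and supremum of the support of the outcome
    """
    if not 0.0 < m0_s <= 1.0 or delta_s < 0.0:
        raise ValueError("need 0 < m0_s <= 1 and delta_s >= 0")
    if math.isinf(y_min) and math.isinf(y_max):
        raise ValueError("support unbounded in both directions: no bounds exist")

    # The unidentified observed-only-when-treated mean lies in [y_min, y_max];
    # plugging in each extreme gives one side of the bound.
    lower = y_min
    if math.isfinite(y_max):
        lower = max(y_min, (m1_y - y_max * delta_s) / m0_s)
    upper = y_max
    if math.isfinite(y_min):
        upper = min(y_max, (m1_y - y_min * delta_s) / m0_s)
    return lower, upper
```

For example, with `m1_y=8`, `m0_s=0.5`, `delta_s=0.3` and a support bounded below by zero, the function returns a lower bound of 0 and an upper bound of 16, so a one-sided support restriction already delivers a two-sided bound.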
At this point, it is worth understanding the determinants of the width of those bounds. First, if there is no sample selection problem at all ($Q(0, x) = 1$, i.e., the always-observed group is the entire population), then $m_0^S(x, p) = 1$ and $\Delta_S(x, p) = 0$, implying point identification in equation (5). Second, if there is no problem of differential sample selection with respect to treatment status ($Q(1, x) = Q(0, x)$, i.e., the observed-only-when-treated subpopulation has zero mass), then $\Delta_S(x, p) = 0$, once more implying point identification in equation (5). Both cases are theoretically uninteresting and ruled out by Assumption 8.
Finally, combining equations (3) and (4) and Proposition 10, I can partially identify the target parameter :
Corollary 11
Suppose that , , and are point identified.
Under Assumptions 1-6, 7.1 and 8, the bounds for are given by
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.2 and 8, the bounds for are given by
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.3 (sub-case (a) or (b)) and 8, the bounds for are given by
[TABLE]
and
[TABLE]
Furthermore, I can show the following sharpness result. (The definition of pointwise sharpness used here and in the rest of the paper follows the definition of sharpness given by Canay & Shaikh (2017, Remark 2.1). Moreover, note that, if the functions , , and are point identified only in a subset of the unit interval, then pointwise sharpness holds only in that subset.)
Theorem 12
Suppose that the functions , , and are point identified at every pair . Under Assumptions 1-6, 7 (sub-cases 1, 2, 3(a) or 3(b)) and 8, the bounds and , given by Corollary 11, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Proof. Here, I provide a sketch of the proof of Theorem 12. Appendix A.4 contains its detailed version. I define candidate random variables through their joint cumulative distribution function and then check that equations (15), (16) and (17) are satisfied. Intuitively, I define this joint probability function to be equal to at every point except the point . By doing so, I ensure that equation (17) holds because is associated with a set of mass zero. I then define the function at to ensure that equations (15) and (16) hold.
Intuitively, Theorem 12 says that, for any , it is possible to create candidate random variables that generate the candidate marginal treatment effect (equation (15)), satisfy the bounded support condition — a restriction imposed by my model (Assumption 7) and summarized in equation (16) — and generate the same distribution of the observable variables — a restriction imposed by the data and summarized in equation (17). In other words, the data and the model in Section 2 do not generate enough restrictions to refute that the true target parameter is equal to the candidate target parameter .
Moreover, the bounded support condition (Assumption 7) is partially necessary for the existence of bounds for the target parameter . When the support is unbounded in both directions (i.e., and ), it is impossible to derive bounds for the target parameter without any extra assumption. Proposition 13 formalizes this last statement. (Appendix A.5 contains the proof of this proposition, whose intuition is similar to the one provided for Theorem 12.)
Proposition 13
Suppose that the functions , , and are point identified at every pair . Impose Assumptions 1-6 and 8. If , then, for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
In other words, when the support of the potential outcome is the entire real line, the data and the model in Section 2 do not generate enough restrictions to refute that the true target parameter is equal to an arbitrarily large effect in magnitude. This impossibility result is interesting in light of the previous literature about partial identification of treatment effects with sample selection. In the case of the (Lee (2009)) and the (Chen & Flores (2015)), it is possible to construct informative bounds even when the support of the potential outcome is the entire real line. However, when focusing on a specific point of the function, it is impossible to construct informative bounds when due to the local nature of the target parameter.
There is one remark about the results I just derived. Theorem 12 and Proposition 13 do not impose any smoothness condition on the joint distribution of . In particular, the conditional cumulative distribution functions , and are allowed to be discontinuous functions of U at the point . Appendix G states and proves a sharpness result similar to Theorem 12 and an impossibility result similar to Proposition 13 when , and must be continuous functions of U.
Partial Identification with an Extra Mean Dominance Assumption
Here, I use the Mean Dominance Assumption 9 to tighten the bounds for the target parameter (equation (3)) given by Corollary 11. Note that Assumption 9 implies that by equations (A.4) and (A.5). As a consequence, by following the same steps as in the proof of Corollary 11, I can derive:
Corollary 14
Fix and arbitrarily. Suppose that the , , and are point identified.
Under Assumptions 1-6, 7.1, 8 and 9, must satisfy
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.2, 8 and 9, must satisfy
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.3 (sub-case (a) or (b)), 8 and 9, must satisfy
[TABLE]
and
[TABLE]
When and Assumptions 1-6, 8 and 9 hold, must satisfy
[TABLE]
and
[TABLE]
Notice that, under Mean Dominance Assumption 9, I can increase the lower bounds proposed in Corollary 11 under Assumption 7 and provide an informative lower bound even when the support of the outcome of interest is the entire real line, a result in stark contrast with Proposition 13. (Appendix A.6 discusses when Corollary 14 provides bounds that are strictly tighter than the ones provided by Corollary 11.) These improvements clearly show the identifying power of the Mean Dominance Assumption 9. Moreover, the phenomenon of obtaining more informative bounds by imposing extra assumptions is common in the partial identification literature, as explained by Tamer (2010) and illustrated by Kline & Tartari (2016).
As in Subsection 3.1, I assume that , , , , and are point identified, postponing the discussion about their identification to Sections 4 and 5.
Now, using the above corollary, I can combine the sharpness and the impossibility results of Subsection 3.1 in one single proposition. (Appendix A.7 contains a proof of this proposition, whose intuition is similar to the one provided for Theorem 12. The only difference is that, now, the function at must also satisfy equation (31).)
Proposition 15
Suppose that the functions , , , and are point identified at every pair . Under Assumptions 1-6, 8 and 9, the bounds and , given by Corollary 14, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Note that, in addition to all the restrictions imposed by Theorem 12, the candidate random variables must also satisfy an extra model restriction (equation (31)) associated with the Mean Dominance Assumption 9. Intuitively, Proposition 15 says that the data (equation (32)) and the model (equations (30) and (31)) do not generate enough restrictions to refute that the true target parameter is equal to the candidate target parameter (equation (29)).
Empirical Relevance of bounds for the MTE of Interest
Now, it is worth discussing the empirical relevance of partially identifying the MTE of interest. First, bounds for the MTE can illuminate the heterogeneity of the treatment effect, allowing the researcher to understand who would benefit from and who would lose with a specific treatment. This is important because common parameters (e.g., , , , ) can be positive even when most people lose with a policy if the few winners have very large gains. Moreover, knowing, even partially, the function can be useful to optimally design policies that provide incentives to agents to take some treatment. Second, I can use the bounds to partially identify any treatment effect that is described as a weighted integral of because
[TABLE]
where is a known or identifiable weighting function. Even though such bounds may not be sharp for any specific parameter, they are a general and off-the-shelf solution to many empirical problems. As a consequence of this trade-off, I recommend that the applied researcher use a specialized tool if he or she is interested in a parameter that already has specific bounds for it (e.g., by Lee (2009) and by Chen & Flores (2015)). However, I suggest that the applied researcher compute a weighted integral of pointwise sharp bounds for the MTE of interest if he or she is interested in parameters without specialized bounds (e.g., ATE, ATT and ATU in the case with imperfect compliance). In other words, facing a trade-off between empirical flexibility and sharpness, the partial identification tool proposed in this paper focuses on empirical flexibility while still ensuring pointwise sharpness of the bounds for the MTE of interest.
Tables 1 and 2 show some of the treatment effect parameters that can be partially identified using inequality (33). More examples are given by Heckman et al. (2006, Tables 1A and 1B) and Mogstad et al. (2018, Table 1).
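As a rough illustration of how inequality (33) can be used in practice, the sketch below integrates pointwise lower and upper bounds against a weighting function. It is only a minimal sketch: the function names, the linear bound functions and the unit weight are hypothetical and chosen for illustration, and the code assumes that the weighting function is non-negative on the unit interval.

```python
def integrate(f, a=0.0, b=1.0, n=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def weighted_bounds(lower, upper, weight, n=1000):
    """Bounds for a parameter written as the integral of MTE(u) * weight(u).

    Assumption: weight(u) >= 0 on [0, 1]. Where a weight is negative, the
    roles of the lower and upper MTE bounds would have to be swapped.
    """
    lb = integrate(lambda u: lower(u) * weight(u), n=n)
    ub = integrate(lambda u: upper(u) * weight(u), n=n)
    return lb, ub


# Hypothetical pointwise bounds (illustrative numbers, not estimates).
def mte_lower(u):
    return 0.2 + 0.5 * u


def mte_upper(u):
    return 1.0 + 0.5 * u


def unit_weight(u):
    return 1.0  # hypothetical weighting function


param_lb, param_ub = weighted_bounds(mte_lower, mte_upper, unit_weight)
print(param_lb, param_ub)
```

Other parameters in Tables 1 and 2 would be obtained by swapping in the corresponding weighting function, provided it is known or identified.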
Partial identification when the support of the propensity score is an interval
Here, I fix and impose that the support of the propensity score, defined by , is an interval (an interval support may be achieved by a continuous instrument or by the existence of independent covariates; see Carneiro et al., 2011). Then, under Assumptions 1-5, the MTR functions associated with any variable are point identified by the following expressions (Appendix A.8 contains a proof of this claim based on the Local Instrumental Variable (LIV) approach described by Heckman & Vytlacil (2005)):
[TABLE]
and
[TABLE]
for any .
Finally, the pointwise sharp bounds for are point identified by combining equations (34) and (35), the fact that , and Corollaries 11 or 14.
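The LIV logic behind this identification step can be illustrated numerically. The sketch below is not a reproduction of equations (34) and (35); it relies on the standard LIV identity that the treated MTR is the derivative of E[A·D | P = p] with respect to p, and that the untreated MTR is minus the derivative of E[A·(1−D) | P = p]. The toy MTR functions, the function names and the assumption that agents take the treatment when their latent heterogeneity is below the propensity score are all hypothetical choices made for illustration.

```python
def central_difference(f, p, h=1e-5):
    """Numerical derivative of f at p."""
    return (f(p + h) - f(p - h)) / (2.0 * h)


# Hypothetical "true" MTR functions of the latent heterogeneity u.
def true_m1(u):
    return 2.0 - u


def true_m0(u):
    return 1.0 + u


def mean_treated_part(p):
    """E[A * D | P = p]: integral of true_m1 over [0, p]."""
    return 2.0 * p - 0.5 * p ** 2


def mean_untreated_part(p):
    """E[A * (1 - D) | P = p]: integral of true_m0 over [p, 1]."""
    return (1.0 - p) + 0.5 * (1.0 - p ** 2)


def liv_m1(p):
    """Treated MTR recovered from the observable conditional mean."""
    return central_difference(mean_treated_part, p)


def liv_m0(p):
    """Untreated MTR recovered from the observable conditional mean."""
    return -central_difference(mean_untreated_part, p)


print(liv_m1(0.4), true_m1(0.4))
print(liv_m0(0.4), true_m0(0.4))
```

In an application, the two conditional means would be replaced by estimates, which is only feasible at values of p inside the support of the propensity score.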
Partial identification when the support of the propensity score is discrete
When the support of the propensity score is not an interval, I cannot point identify , , , , and without extra assumptions, implying that I cannot identify the bounds for given by Corollaries 11 or 14. There are two solutions for this lack of identification: I can non-parametrically bound those four objects (Mogstad et al. (2018)) or I can impose flexible parametric assumptions (Brinch et al. (2017)) to point identify them. While the first approach is discussed in Subsection 5.1, the second one is detailed in Subsection 5.2.
Non-parametric outer set around the MTE of interest
For any and , I can bound , , , , and using the machinery proposed by Mogstad et al. (2018). To do so, fix and and define the pair of functions and the set of admissible MTR functions . For example, in the case of a binary function, the admissible set would be and, in the case of the selection indicator, this set would be further restricted by Assumption 8 to
[TABLE]
Moreover, define the function as:
[TABLE]
and observe that . Furthermore, define to be a collection of known or identified measurable functions whose second moment is finite. For each IV-like specification , define also . According to Mogstad et al. (2018, Proposition 1), the function , defined as
[TABLE]
satisfies . As a result, must lie in the set of admissible functions that satisfy the restrictions imposed by the data through the IV-like specifications, where:
[TABLE]
Assuming that is convex and for every , Mogstad et al. (2018, Proposition 2) show that:
[TABLE]
Based on this result, I can also define bounds for the MTR functions as
[TABLE]
where
[TABLE]
As a consequence, I can combine Corollaries 11 and 14 and inequalities (36) and (37) to provide a non-parametrically identified outer set around , that contains the true target parameter by construction. However, the cost of non-parametric partial identification of , , , , and is losing the pointwise sharpness of the bounds around the target parameter .
Parametric identification of the bounds
The fully non-parametric approach explained in Subsection 5.1 may provide an uninformative outer set (e.g., equal to or when the support of the potential outcome is bounded). In such cases, parametric assumptions on the marginal treatment response functions may buy a lot of identifying power. Although restrictive in principle, parametric assumptions may be flexible enough to provide credible bounds for , as illustrated by Brinch et al. (2017).
I fix and assume that the support of the propensity score is discrete and given by for some . I could directly apply the identification strategy proposed by Brinch et al. (2017) by assuming that the MTR functions associated with and are polynomial functions of . However, this assumption is problematic for binary variables, such as the selection indicator . For this reason, I make a small modification to the procedure created by Brinch et al. (2017): for and , the MTR function is given by
[TABLE]
for any , where is a set of feasible parameters, is the number of parameters for each treatment group , is a vector of pseudo-true unknown parameters, and is a known function. For instance, in the case of a binary variable, a reasonable choice of is the Bernstein Polynomial with feasible set . In the case of the selection indicator, the feasible set would be further restricted by Assumption 8 to . I stress that the only difference between the Bernstein polynomial model and the simple polynomial model proposed by Brinch et al. (2017) is that it is easier to impose feasibility restrictions on the former model.
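To see why feasibility restrictions are easy to impose on a Bernstein polynomial, the following sketch evaluates a degree-3 Bernstein polynomial (four parameters) on a grid. It assumes, for illustration, that the feasible set for a binary variable restricts every coefficient to [0, 1]; the coefficient values and function names are hypothetical. Because the Bernstein basis functions are non-negative and sum to one, the resulting MTR function is a weighted average of the coefficients and therefore stays inside [0, 1].

```python
from math import comb


def bernstein_basis(u, k, degree):
    """k-th Bernstein basis polynomial of the given degree, evaluated at u."""
    return comb(degree, k) * u ** k * (1.0 - u) ** (degree - k)


def bernstein_mtr(u, theta):
    """MTR value at u for coefficients theta (degree = len(theta) - 1)."""
    degree = len(theta) - 1
    return sum(t * bernstein_basis(u, k, degree) for k, t in enumerate(theta))


# Hypothetical coefficients inside the assumed feasible set [0, 1]^4.
theta = [0.9, 0.7, 0.4, 0.1]
grid = [i / 100 for i in range(101)]
values = [bernstein_mtr(u, theta) for u in grid]

# The basis sums to one at every u, so the MTR is a convex combination.
basis_sums = [sum(bernstein_basis(u, k, 3) for k in range(4)) for u in grid]
print(min(values), max(values))
```

With an ordinary polynomial in powers of u, the same range restriction would require constraints that hold at every point of the unit interval rather than simple bounds on each coefficient.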
Back to the parametric model given by equation (38), I define the parameters as pseudo-true parameters in the sense that the parametric model in equation (38) is an approximation to the true data generating process via the moments for any and . Formally, I define
[TABLE]
Note that, to estimate parameters , I can simply use the sample analogue of equation (39), i.e., I only have to estimate a constrained OLS regression whose restrictions are given by the set . If the model restrictions imposed through the set of feasible parameters are valid and , then my parametric model collapses to the model proposed by Brinch et al. (2017) and, as shown in Appendix A.9, I find that, for any ,
[TABLE]
I can then combine Corollaries 11 and 14 and equations (38) and (39) to bound .
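The constrained OLS step can be sketched with a generic solver. The code below is not the estimation routine used in the paper: it is a plain coordinate-descent solver for least squares with box constraints, written under the assumption that the feasible set is a common interval for every coefficient. The function name, the toy design matrix and the omission of design weights are all illustrative choices.

```python
def box_constrained_ols(X, y, lower, upper, sweeps=200):
    """Least squares with every coefficient restricted to [lower, upper].

    Uses cyclic coordinate descent: each coefficient is set to its
    unconstrained one-dimensional minimizer and then clipped to the box.
    """
    n_obs, n_par = len(X), len(X[0])
    beta = [min(max(0.0, lower), upper)] * n_par
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(n_par))
             for i in range(n_obs)]
    for _ in range(sweeps):
        for j in range(n_par):
            col_ss = sum(X[i][j] ** 2 for i in range(n_obs))
            if col_ss == 0.0:
                continue  # column carries no information
            step = sum(X[i][j] * resid[i] for i in range(n_obs)) / col_ss
            new_bj = min(max(beta[j] + step, lower), upper)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n_obs):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy example: the unconstrained solution is (0.5, 1.7), so the second
# coefficient is pushed to the upper bound of the box.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [0.5, 1.7]
beta_hat = box_constrained_ols(X, y, lower=0.0, upper=1.0)
print(beta_hat)
```

In the toy example the constraint binds for the second coefficient, which is the same kind of situation as a binding feasibility constraint in a constrained MTR model.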
Empirical Application: Job Corps Training Program
I focus on analyzing the Marginal Treatment Effect of the Job Corps Training Program (JCTP) on wages for the always-employed subpopulation (). This program provides free education and vocational training to individuals who are legal residents of the U.S., are between the ages of 16 and 24 and come from a low-income household (Schochet et al. (2001) and Lee (2009)). Besides receiving education and vocational training, the trainees reside in the Job Corps center, that offers meals and a small cash allowance.
In the mid-1990s, the U.S. Department of Labor hired Mathematica Policy Research, Inc., to evaluate the JCTP through a randomized experiment. According to Chen & Flores (2015), eligible people who applied to JCTP for the first time between November 1994 and December 1995 (80,833 applicants) were randomly assigned into a treatment group and a control group. People in the control group (5,977) were embargoed from the program for 3 years, while those in the treatment group (74,856) were allowed to enroll in JC. However, in this randomized control trial, there was non-compliance (selection into treatment) because some individuals in the treated group decided not to participate in the program and some individuals in the control group were able to attend the JCTP even though they were officially embargoed.
To evaluate the JCTP, I start by describing the dataset, providing summary statistics and, most importantly, formally testing the assumptions that the potential treatment status is monotone on the instrument (equation (1)) and that the potential employment (sample selection status) is positively monotone on the treatment (Assumption 8) using the test elaborated by Machado et al. (2018). I then estimate and discuss the marginal treatment responses and effects on employment and labor earnings using the parametric tool developed by Brinch et al. (2017). Finally, I estimate and discuss the bounds for the MTE on wages without and with the mean dominance assumption (Assumption 9), given, respectively, by Corollaries 11 and 14.
Descriptive Statistics and the Monotonicity Assumptions
The publicly available National Job Corps Study (NJCS) sample contains 15,386 individuals — all 5,977 control group individuals and 9,409 randomly selected treatment group individuals. All of them were interviewed at random assignment and at 12, 30 and 48 months after random assignment. Following Lee (2009), I only keep individuals with non-missing values for weekly earnings and weekly hours worked for every week after randomization (9,145). Following Chen & Flores (2015), my instrument () is random treatment assignment and my treatment dummy () is an indicator variable that is equal to one if the individual was ever enrolled in the JCTP during the 208 weeks after random assignment. Since this variable has 51 missing values, the final sample size is 9,094 observations.
The dataset contains information about demographic covariates (sex, age, race, marriage, number of children, years of schooling, criminal behavior, personal income) and pre- and post-treatment labor market outcomes (employment and earnings). Following Chen & Flores (2015), hourly wages at week 208 are created by dividing weekly earnings by weekly hours worked at that week, implying that a missing wage is equivalent to zero weekly hours worked. I consider the person to be unemployed () when the wage is missing and to be employed () when the wage is non-missing. Differently from Lee (2009) and Chen & Flores (2015), who use log hourly wages as their main outcome variable, my outcome of interest () is the level of the hourly wage because Assumption 7.1 requires that the support has a finite lower bound. As a consequence, the observable outcome is defined as hourly labor earnings. Finally, I use the NJCS design weights in my empirical analysis because some subpopulations were randomized with different, but known, probabilities (Schochet et al. (2001)).
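The construction of the wage, employment and observable outcome variables described above can be sketched as follows. The function and field names are hypothetical, and the sketch assumes that the observable outcome equals the hourly wage for employed individuals and zero otherwise.

```python
def hourly_wage_record(weekly_earnings, weekly_hours):
    """Build the week-208 variables for one individual.

    A missing wage (None) corresponds to zero weekly hours worked, in
    which case the person is treated as unemployed.
    """
    if weekly_hours > 0:
        wage = weekly_earnings / weekly_hours
        return {"employed": 1, "wage": wage, "observed_outcome": wage}
    # Assumption: the observable outcome is zero when the wage is missing.
    return {"employed": 0, "wage": None, "observed_outcome": 0.0}


worker = hourly_wage_record(weekly_earnings=300.0, weekly_hours=40.0)
non_worker = hourly_wage_record(weekly_earnings=0.0, weekly_hours=0.0)
print(worker, non_worker)
```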
Considering the results found by Flores-Lagunes et al. (2010), who focus on explaining the negative but insignificant effects on employment and labor earnings for the Hispanic subpopulation, I separately analyze two subsamples from the NJCS sample: the Non-Hispanics subsample and the Hispanics subsample. Table 3 shows descriptive statistics for both subsamples. Note that, as expected, the pre-treatment covariates are, on average, very similar between the groups defined by the random treatment assignment. Consequently, both subsamples maintain the balance of baseline variables. However, when comparing Non-Hispanics and Hispanics, I find numerically small differences with respect to the variables female, never married, has children, ever arrested, has a job at baseline, and had a job.
Table 4 shows preliminary effects within the Non-Hispanic and the Hispanic subsamples. The first row shows that a large number of individuals did not comply with their treatment assignment. As is expected for any voluntary treatment, a large share of individuals (around 30% for both subsamples) decided not to take the treatment even though they were assigned to the treatment group. There are also some individuals (5% among Non-Hispanics and 3% among Hispanics) who attended the JCTP even though they were embargoed. Moreover, the instrument (treatment assignment) is clearly strong for both subsamples, suggesting that Assumption 2 is plausible in this context. When analyzing the treatment effects and similarly to the previous literature (e.g., Schochet et al. (2008), Flores-Lagunes et al. (2010) and Chen & Flores (2015)), I find that the JCTP has a positive and significant effect on Non-Hispanics and a negative but insignificant effect on Hispanics.
This last result, particularly with respect to the employment status, is important for my analysis. Similarly to Lee (2009) and Chen & Flores (2015), I assume that the effect of the treatment on employment (i.e., sample selection) is monotone and positive. However, a negative effect of JCTP on employment is evidence against this assumption as discussed by Flores-Lagunes et al. (2010) and Chen & Flores (2015). For this reason, I formally test Assumption 8. To do so, I implement the procedure developed by Machado et al. (2018), which simultaneously tests instrument exogeneity (Assumption 1), monotonicity of treatment take-up on treatment assignment (equation (1)) and monotonicity of employment on the treatment (equation (2)). Their procedure also uses this last test as a gate-keeper to test that the effect of the treatment on employment is positive (Assumption 8).
In a more detailed way, the test proposed by Machado et al. (2018) has three steps. In the first step, the null hypothesis is that the instrument is not exogenous, or treatment take-up is not monotone on treatment assignment, or employment is not monotone on treatment take-up. As a consequence, the alternative hypothesis is that Assumption 1 and equations (1) and (2) hold. In the second step, that is implemented only if the first step rejects its null hypothesis, the second null hypothesis is that the effect of the treatment on employment is non-positive. Consequently, its alternative hypothesis is that Assumptions 1 and 8 and equations (1) and (2) hold. Finally, in the third step, that is implemented only if the second step does not reject its null hypothesis, the third null hypothesis is that the effect of the treatment on employment is non-negative. Consequently, its alternative hypothesis is that, while Assumption 1 and equations (1) and (2) are valid, Assumption 8 holds in the opposite direction (see Assumption C.1).
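The gate-keeping structure of the three steps can be summarized in a small decision function. This is only a sketch of the sequencing described above, not of the test statistics themselves; the function name and the returned labels are hypothetical.

```python
def sequential_test_conclusion(reject_step1, reject_step2=False,
                               reject_step3=False):
    """Map the rejection decisions of the three steps to a conclusion.

    Step 2 is only consulted if step 1 rejects; step 3 is only consulted
    if step 2 does not reject.
    """
    if not reject_step1:
        return "no evidence for exogeneity and monotonicity"
    if reject_step2:
        return "positive effect on employment"
    if reject_step3:
        return "negative effect on employment"
    return "sign of the effect on employment undetermined"


# Steps 1 and 2 reject: Assumptions 1 and 8 and equations (1) and (2)
# are supported by the data.
print(sequential_test_conclusion(True, True))
# Step 1 rejects, steps 2 and 3 do not: monotonicity is supported, but the
# direction of the effect on employment is not determined.
print(sequential_test_conclusion(True, False, False))
```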
Table 5 shows the results of the test described above. Within the Non-Hispanics subsample, steps 1 and 2 reject their null hypotheses at the 1%-significance level, implying that Assumptions 1 and 8 and equations (1) and (2) are plausible given the data. Consequently, it is reasonable to use Corollary 11 to bound the MTE of the JCTP on wages within the Non-Hispanics subsample. For the Hispanics subsample, step 1 rejects its null hypothesis at the 1%-significance level, while neither step 2 nor step 3 rejects its null hypothesis at the 10%-significance level. As a consequence, Assumption 1 and equations (1) and (2) are plausible given the data, but it seems that there is no effect of the treatment on employment, i.e., for all individuals. With no differential sample selection for the Hispanic population, point identification of the MTE of interest is trivial as discussed immediately after Proposition 10. For this reason, I focus my empirical analysis on the Non-Hispanic subsample.
MTR and MTE on Employment and Labor Earnings: Non-Hispanics subpopulation
As a preliminary step to estimate the bounds for the MTE of the JCTP on hourly wages within the Non-Hispanic subsample, I need to estimate the MTR functions on employment and hourly labor earnings, i.e., I need to estimate the functions , , , and . To do so, I use the procedure described in Subsection 5.2, which adapts the method developed by Brinch et al. (2017) to a constrained framework. Specifically, I model the MTR functions of and using Bernstein polynomials with four parameters, i.e., for any and with feasible sets and . Appendix A.10 connects the OLS model (42) below to the minimization problem (39) when the instrument is binary and there are no covariates. It also provides the explicit formula for the bounds in Corollaries 11 and 14 using the parametric model described in Subsection 5.2. Appendix H implements a Monte Carlo Simulation that analyzes the coverage rate of confidence intervals around the MTE bounds that are based on the OLS model (42). To estimate , I run the following constrained OLS model:
[TABLE]
where is the error term, , , , and the constraints on are given by .
Table 6 reports the point-estimates and 90%-confidence intervals of the parametric models for the MTR functions on employment and hourly labor earnings. Note that the feasibility constraint is binding even though Assumption 8 is plausible according to the test proposed by Machado et al. (2018). Moreover, for the upper bound of the 90%-confidence interval, the feasibility constraint is also binding.
It is easier to understand and interpret those estimates using Figure 1. The solid lines are the point-estimates of the MTR and MTE functions based on the parameters reported in Table 6. The dotted lines are pointwise 90%-confidence intervals around the estimated functions based on 5,000 bootstrap repetitions. Blue lines are associated with treated potential outcomes, while red lines are associated with untreated potential outcomes. In Subfigure 1a, I find that, although the employment probability for the agents who are most likely to attend the JCTP is similar between treated and untreated individuals, the employment probability for the agents who are less likely to attend the JCTP is much higher for treated individuals than for untreated ones. As a consequence, the MTE on employment within the Non-Hispanic subsample (Subfigure 1b) is increasing in the latent heterogeneity. Similarly, in Subfigure 1c, I find that, although expected hourly labor earnings for the agents who are most likely to attend the JCTP are similar between treated and untreated individuals, expected hourly labor earnings for the agents who are less likely to attend the JCTP are much higher for treated individuals than for untreated ones. As a consequence, the MTE on hourly labor earnings within the Non-Hispanic subsample (Subfigure 1d) is increasing in the latent heterogeneity. I highlight that the shape of my estimated MTE functions is in line with the results by Chen et al. (2017), whose estimated upper bounds also suggest that the ATE on those variables is greater than the ATT.
Bounds for the MTE on Wages: Non-Hispanic subpopulation
To partially identify the MTE of the JCTP on wages within the Non-Hispanic subsample, I can combine the functions estimated in Subsection 6.2 with Corollaries 11 and 14. While the first corollary imposes only assumptions that are valid by the experimental design (Assumption 1), technical (Assumptions 3-7) or testable (Assumptions 2 and 8, and equation (1)), Corollary 14 additionally uses the Mean Dominance Assumption 9. This last assumption imposes that the marginal treatment response function of wages when treated for the always-employed population is greater than the same object for the employed-only-when-treated population, implying a positive correlation between potential employment and potential wages, which is supported by standard models of labor supply. (Chen & Flores (2015) discuss the connection between the Mean Dominance Assumption 9 and the Labor Economics literature in a deeper way.)
Another issue when estimating bounds for a parameter of interest is that there are two ways to construct confidence intervals. The conservative method finds the -confidence intervals around the upper and lower bounds and then uses their uppermost and lowermost bounds to construct a confidence interval that contains the identified region with probability . Since the parameter of interest has to be inside the identified region, this confidence interval contains the parameter of interest with probability at least . An alternative method is proposed by Imbens & Manski (2004), who directly construct a -confidence interval that contains the parameter of interest. Since they take into account that the parameter of interest has to be inside the identified region by construction, their confidence interval is tighter than the one produced by the conservative method.
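The difference between the two intervals can be sketched under a normal approximation. This is an assumption made for illustration only: the empirical analysis in this paper relies on bootstrap repetitions, and the bound estimates and standard errors below are hypothetical numbers. The Imbens & Manski (2004) critical value c solves Φ(c + Δ / max(se_lower, se_upper)) − Φ(−c) = confidence level, where Δ is the estimated width of the identified region.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conservative_ci(lb, ub, se_lb, se_ub, level=0.90):
    """Union of two-sided intervals around the lower and upper bounds."""
    z = STD_NORMAL.inv_cdf(1.0 - (1.0 - level) / 2.0)
    return lb - z * se_lb, ub + z * se_ub


def imbens_manski_critical_value(lb, ub, se_lb, se_ub, level=0.90):
    """Solve for the critical value by bisection (level must exceed 0.5)."""
    scale = max(ub - lb, 0.0) / max(se_lb, se_ub)

    def coverage(c):
        return STD_NORMAL.cdf(c + scale) - STD_NORMAL.cdf(-c)

    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coverage(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def imbens_manski_ci(lb, ub, se_lb, se_ub, level=0.90):
    """Interval that covers the parameter (not the whole identified set)."""
    c = imbens_manski_critical_value(lb, ub, se_lb, se_ub, level)
    return lb - c * se_lb, ub + c * se_ub


# Hypothetical bound estimates and standard errors.
cons = conservative_ci(0.5, 1.5, 0.1, 0.1)
im = imbens_manski_ci(0.5, 1.5, 0.1, 0.1)
print(cons, im)
```

When the identified region is wide relative to the standard errors, the critical value approaches the one-sided normal quantile; when the region collapses to a point, it equals the usual two-sided quantile.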
Figure 2 shows the parametric bounds of the MTE on wages using Corollary 11 (Subfigure 2a) and using Corollary 14 (Subfigure 2b). The solid lines are the point-estimates of the parametric bounds of the MTE on wages, while the dotted lines are pointwise conservative 90%-confidence intervals around the identified region based on 5,000 bootstrap repetitions and the dashed lines are pointwise 90%-confidence intervals of the parameter of interest (Imbens & Manski (2004)), also based on 5,000 bootstrap repetitions.
As a way to understand the magnitude of the effects, I compare the estimated bounds against the average observed hourly wage of the Non-Hispanics assigned to the control group, 7.72. Note that the lower bounds that do not use the mean dominance assumption (Subfigure 2a) are implausibly negative. Even for the agents who are the most likely to attend the JCTP, the lower bound of the MTE^{OO} (-6.51) implies that the JCTP would drive their hourly wages almost to zero. This implausibly negative lower bound is based on the worst-case scenario that unrealistically imposes that the treated potential wage for the always-employed subpopulation is equal to zero.
By imposing the Mean Dominance Assumption 9, I rule out this extreme case by assuming that there is positive selection into employment. As a consequence, I can increase the lower bound from equation (9) to equation (21), narrowing the bounds of the MTE on wages (Subfigure 2b). Under this extra assumption, the MTE on wages is significant at the 10%-significance level for latent heterogeneity values between 0.34 and 0.68 when I use the conservative confidence interval and between 0.35 and 0.73 when I use the confidence interval based on Imbens & Manski (2004). Most interestingly, the point-estimate of the lower bound of the MTE on wages is decreasing in the likelihood of attending the JCTP.
To better understand the magnitude of those effects and compare my results with the previous literature, I summarize the bounds for the function using four key parameters (, , and ) that are described in Tables 1 and 2 as integrals of the function. Table 7 reports those bounds in brackets, the 90%-conservative confidence intervals of the identified region in parentheses and the 90%-confidence intervals of the parameter of interest (Imbens & Manski (2004)) in braces. As expected, the bounds without the mean dominance assumption are wide and uninformative, while, when imposing Assumption 9, all parameters but the are significant at 10% according to both types of confidence intervals.
I stress that my estimates represent an effect between 7.51% and 24.74% of the average observed hourly wage of the Non-Hispanics assigned to the control group, which are comparable to the bounds of the parameter derived by Chen & Flores (2015) — approximately between 7.7% and 17.5% under a similar set of assumptions. The finding that their bounds are tighter than mine for the is not surprising because their method leverages all the available information to specifically identify the while my tool bounds the function and then flexibly bounds the other treatment effects for the always-employed population.
As a consequence of this flexibility, I can partially identify other treatment effects that may be policy-relevant. For example, the is bounded between 7.90% and 29.53% of the average observed hourly wage of the Non-Hispanics assigned to the control group. Most interestingly, the and the are, respectively, bounded between 4.27% and 12.82%, and 9.20% and 38.86%, suggesting that the agents who do not attend the JCTP might be the ones who would benefit the most from it. This result is even stronger when I analyze the confidence intervals around the and the : while the first treatment effect is not significantly different from zero, the second parameter is significantly different from zero. To conclude, I highlight that, even though the upper bound of the treatment effects on wages may be unrealistically large, the magnitude of the lower bounds is similar to the results found by Chen et al. (2017) and is reasonable when compared to ITT effects of 16.70% on earnings per week and of 9.87% on hours per week that are shown in Table 4.
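The percentages above are dollar bounds divided by the control-group mean wage of 7.72. As a worked check, the abstract reports that the Average Treatment Effect on the Treated lies between $0.33 and $0.99, which reproduces the range of 4.27% to 12.82% reported above. The variable names in the sketch are illustrative.

```python
control_mean_wage = 7.72            # average hourly wage, control group
att_bounds_dollars = (0.33, 0.99)   # ATT bounds reported in the abstract

# Express each dollar bound as a percentage of the control-group mean.
att_bounds_percent = tuple(
    round(100.0 * bound / control_mean_wage, 2) for bound in att_bounds_dollars
)
print(att_bounds_percent)
```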
Conclusion
My main theoretical contribution provides pointwise sharp bounds for the MTE of interest within the always-observed subpopulation by imposing a monotonicity assumption that the treatment has a positive impact on sample selection for every agent. Those bounds are tightened by imposing an extra mean dominance assumption that the potential outcome when treated within the always-observed subpopulation is greater than or equal to the same parameter within the observed-only-when-treated subpopulation. Both bounds can be estimated using the LIV approach if the instrument is continuous, using a non-parametric outer set based on the method developed by Mogstad et al. (2018), or using a parametric model based on the strategy proposed by Brinch et al. (2017). Such bounds are useful to analyze many empirical problems that include endogenous self-selection into treatment and sample selection.
My main empirical findings suggest that the marginal treatment effect of the Job Corps Training Program (JCTP) on employment, hourly labor earnings and hourly wages increases with the latent heterogeneity variable within the Non-Hispanic group. More specifically, while MTEs for the agents who are the most likely to attend the JCTP are very small, the MTEs for the agents who are the least likely to attend the JCTP are considerably large. Economically, this result implies that the agents who are more likely to benefit from the JCTP are not attending it due to some unobserved constraint. A similar result is found by Chen et al. (2017), whose empirical evidence suggests that the effects of the JCTP on employment and labor earnings for never-takers are significantly positive. They argue that those agents are not enrolling at the JCTP due to family constraints (lack of childcare services), incomplete information on JCTP’s benefits, overconfidence or personal preferences for non-enrollment. A more complete analysis of why agents who would benefit from attending the JCTP are not doing so is beyond the scope of this paper, but is an important question for future research because it may help policy makers to better target the JCTP to the population who would benefit the most from this program.
Appendix A Proofs of the main results
Proof of Equation (4)
Note that
[TABLE]
Proof of Equation (5)
First, observe that
[TABLE]
Note also that:
[TABLE]
implying equation (5) after some rearrangement.
Proof of Proposition 10
Note that
[TABLE]
by the definition of and . Observe also that
[TABLE]
by equation (A.4) and the definition of and , implying, by equation (5), that
[TABLE]
under assumption 7.1,
[TABLE]
under assumption 7.2, and
[TABLE]
under Assumption 7.3 (sub-case (a) or (b)). Combining equations (A.6)-(A.9), it is easy to show that Proposition 10 holds.
Proof of Theorem 12
First, I prove Theorem 12 under Assumption 7.3 (sub-cases (a) and (b)). At the end of this section, I prove Theorem 12 under assumptions 7.1 and 7.2.
A.4.1 Proof under Assumption 7.3 (sub-cases (a) and (b))
Fix , and arbitrarily. For brevity, define and .
Note that
[TABLE]
and that
[TABLE]
The strategy of this proof consists of defining candidate random variables through their joint cumulative distribution function and then checking that equations (15), (16) and (17) are satisfied. I fix and define in twelve steps:
- Step 1.
For , .
- Step 2.
From now on, consider . Since
[TABLE]
it suffices to define . Moreover, I impose
[TABLE]
by writing
[TABLE]
implying that it is sufficient to define .
- Step 3.
For , I define .
- Step 4.
From now on, consider . Since
[TABLE]
it suffices to define and .
- Step 5.
I define .
- Step 6.
For any , I define .
- Step 7.
For any , I define .
- Step 8.
From now on, consider . Since
[TABLE]
it is sufficient to define and .
- Step 9.
I define
[TABLE]
- Step 10.
I write , implying that I can separately define and .
- Step 11.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because .
- Step 12.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because of equations (A.10) and (A.11).
Having defined the joint cumulative distribution function , note that equations (A.10) and (A.11), and steps 7-12 ensure that equation (16) holds.
Now, I show, in three steps, that equation (15) holds.
- Step 13.
Observe that
[TABLE]
- Step 14.
Similarly to the last step, notice that
[TABLE]
- Step 15.
Note that
[TABLE]
ensuring that equation (15) holds.
Finally, I show, in two steps, that equation (17) holds.
- Step 16.
Fix arbitrarily and observe that equation (17) can be simplified to:
[TABLE]
- Step 17.
Notice that
[TABLE]
implying equation (17) according to equation (A.14).
I can then conclude that Theorem 12 is true.
As a remark, the above constructive proof defines random variables that match other important moments of the true data generating process besides the ones imposed by Theorem 12.
- Remark 1.
Note that
[TABLE]
and, similarly, that
[TABLE]
- Remark 2.
Analogously to equation (A.12), I find that
[TABLE]
- Remark 3.
Combining equations (A.5), (A.12) and (A.15)-(A.17), I have that
[TABLE]
- Remark 4.
Similarly to step 17, I can show that implying that and for any .
A.4.2 Proof under Assumptions 7.1 and 7.2
I now prove Theorem 12 under Assumptions 7.1 and 7.2. In particular, I focus on the case and (Assumption 7.1) because it is more common in empirical applications. The case and (Assumption 7.2) is symmetric.
The proof under Assumption 7.1 is equal to the proof under Assumption 7.3(a). The only difference is that
[TABLE]
and that
[TABLE]
Proof of Proposition 13
This proof is essentially the same as the proof of Theorem 12 under Assumption 7.3(a) (Appendix A.4.1). Fix , and arbitrarily. For brevity, define and . Note that and .
I define the random variables using the joint cumulative distribution function described by steps 1-12 in Appendix A.4.1 for the case of convex support . Note that equation (19) is trivially true when . Moreover, equations (18) and (20) are valid by the argument described in steps 13-17 in Appendix A.4.1.
I can then conclude that Proposition 13 is true.
Comparing Corollaries 11 and 14
In order to compare Corollaries 11 and 14, I first prove that the second corollary provides lower bounds that are weakly larger than the lower bounds provided by the first corollary.
Fix and arbitrarily and note that
[TABLE]
implying that . Consequently, observe that
[TABLE]
The argument above shows that Corollary 14 provides bounds that are weakly tighter than the ones provided by Corollary 11. They will be strictly tighter if . Moreover, the improvement generated by the Mean Dominance Assumption 9 is proportional to and because
[TABLE]
Proof of Proposition 15
This proof is essentially the same as the proofs of Theorem 12 and Proposition 13 (Appendices A.4 and A.5). Fix , and arbitrarily. For brevity, define and . The only difference from the previous proofs is that, now,
[TABLE]
and that
[TABLE]
implying that the model restriction (31) holds.
Proof of Equations (34) and (35)
I first prove that equation (34) holds. For any , observe that
[TABLE]
implying that
[TABLE]
Rearranging the last expression, I can derive equation (34):
[TABLE]
Equation (35) is derived in an analogous way using and its derivative with respect to the propensity score.
Proof of Equations (40) and (41)
I first prove that equation (40) holds. For any , observe that
[TABLE]
Equation (41) is derived in an analogous way using .
Parametric Bounds for the
A.10.1 Connecting OLS Model (42) to the Minimization Problem (39)
Note that, for any ,
[TABLE]
where and , and
[TABLE]
where and .
When I combine equations (39), (A.21) and (A.22), I find the OLS model given by equation (42). Moreover, by solving the linear system given by , , and , I find that , , , .
A.10.2 Explicit Formulas for the Bounds in Corollaries 11 and 14
When the marginal treatment response functions are given by the parametric model described in Subsection 6.2 and the outcome of interest is bounded below by zero (e.g., hourly wages), Corollary 11 implies that, for any and ,
[TABLE]
and
[TABLE]
In the same context, Corollary 14 implies that
[TABLE]
and
[TABLE]
Appendix B Bounds for the MTR within the Observed-only-when-treated subpopulation
Here, I use the same notation of Section 3 and I am interested in the following target parameter: , which is equal to according to equation (A.4). Following the same steps of the proof of Proposition 10, I can show that:
Corollary B.1
Suppose that the , , and are point identified.
Under assumptions 1-6, 7.1 and 8, the bounds for are given by
[TABLE]
Under assumptions 1-6, 7.2 and 8, the bounds for are given by
[TABLE]
Under assumptions 1-6, 7.3 (sub-case (a) or (b)) and 8, the bounds for are given by
[TABLE]
Following the same proof of Theorem 12 (see Remark 2 at the end of Appendix A.4.1), I can also show that:
Proposition B.2
Suppose that the functions , , and are point identified at every pair . Under assumptions 1-6, 7 (sub-cases 1, 2, 3(a) or 3(b)) and 8, the bounds and , given by Corollary B.1, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Finally, following the same proof of Proposition 13, I can also show that:
Proposition B.3
Suppose that the functions , , and are point identified at every pair . Impose assumptions 1-6 and 8. If , then, for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Appendix C Negative Treatment Effect on the Selection Indicator
Even when sample selection is monotone (equation (2)), Assumption 8 may be invalid in some empirical applications. In particular, it might be the case that the following assumption holds:
Assumption C.1
Treatment has a negative effect on the sample selection indicator for all individuals, i.e., for any .
I stress that this assumption is testable according to Machado et al. (2018).
With straightforward modifications to the proofs of Corollary 11, Theorem 12 and Proposition 13 (see the proofs of Propositions D.3 and D.4), I can show that the target parameter in Section 3 can be bounded, that its bounds are sharp and that it is impossible to derive bounds for the target parameter with only assumptions 1-6 and C.1. First, I state a result that is analogous to Corollary 11.
Corollary C.2
Fix and arbitrarily. Suppose that the , , and are point identified.
Under Assumptions 1-6, 7.1 and C.1, the bounds for are given by
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.2 and C.1, the bounds for are given by
[TABLE]
and
[TABLE]
Under Assumptions 1-6, 7.3 (sub-case (a) or (b)) and C.1, the bounds for are given by
[TABLE]
and
[TABLE]
Second, I state a result that is analogous to Theorem 12.
Proposition C.3
Suppose that the functions , , and are point identified at every pair . Under Assumptions 1-6, 7 (sub-cases 1, 2, 3(a) or 3(b)) and C.1, the bounds and , given by Corollary C.2, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Finally, I state a result that is analogous to Proposition 13.
Proposition C.4
Suppose that the functions , , and are point identified at every pair . Impose Assumptions 1-6 and C.1. If , then, for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Appendix D Monotone Sample Selection
Depending on the results of the test proposed by Machado et al. (2018), a researcher may want to be agnostic about the direction of the monotone selection problem and impose only equation (2), while ruling out uninteresting cases. In this situation, it is reasonable to assume:
Assumption D.1
Treatment has a monotone effect on the sample selection indicator for all individuals, i.e., either (i) for any or (ii) for any .
Note that Assumption D.1 only strengthens equation (2) by ruling out the theoretically uninteresting cases mentioned after Assumption 8.
By combining Corollaries 11 and C.2, I find that:
Corollary D.2
Fix and arbitrarily. Suppose that the , , and are point identified. Under Assumptions 1-6, 7 and D.1, the bounds for are given by
[TABLE]
Moreover, these bounds are also pointwise sharp (the proofs of Propositions D.3 and D.4 are located at the end of Appendix D):
Proposition D.3
Suppose that the functions , , and are point identified at every pair . Under Assumptions 1-6, 7 (sub-cases 1, 2, 3(a) or 3(b)) and D.1, the bounds and , given by Corollary D.2, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Finally, I state an impossibility result that is analogous to Proposition 13.
Proposition D.4
Suppose that the functions , , and are point identified at every pair . Impose assumptions 1-6 and D.1. If , then, for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Proof of Proposition D.3. I only prove Proposition D.3 under Assumption 7.3 (sub-cases (a) and (b)). The proofs of Proposition D.3 under Assumptions 7.1 and 7.2 are trivial modifications of the proof presented below.
Fix , and arbitrarily. For brevity, define
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
Note that
[TABLE]
and that
[TABLE]
The strategy of this proof consists of defining candidate random variables through their joint cumulative distribution function and then checking that equations (D.2), (D.3) and (D.4) are satisfied. I fix and define in twelve steps:
- Step 1.
For , .
- Step 2.
From now on, consider . Since
[TABLE]
it suffices to define . Moreover, I impose
[TABLE]
by writing
[TABLE]
implying that it is sufficient to define .
- Step 3.
For , I define .
- Step 4.
From now on, consider . Since
[TABLE]
it suffices to define and .
- Step 5.
I define .
- Step 6.
For any , I define .
- Step 7.
For any , I define .
- Step 8.
From now on, assume that . Since
[TABLE]
it is sufficient to define and .
- Step 9.
I define
[TABLE]
- Step 10.
I write , implying that I can separately define and .
- Step 11.
When and is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because .
When and is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because of equations (D.8) and (D.9).
- Step 12.
When and is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because of equations (A.10) and (A.11).
When and is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because .
Having defined the joint cumulative distribution function , note that equations (D.8) and (D.9), the facts and , and steps 7-12 ensure that equation (D.3) holds.
Now, I show, in three steps, that equation (D.2) holds.
- Step 13.
Observe that
[TABLE]
- Step 14.
Notice that
[TABLE]
- Step 15.
Note that Steps 13 and 14 imply that
[TABLE]
ensuring that equation (D.2) holds.
Finally, to show that equation (D.4) holds, it suffices to follow steps 16 and 17 in Appendix A.4.1.
I can then conclude that Proposition D.3 is true.
Proof of Proposition D.4. This proof is essentially the same as the proof of Proposition D.3 under Assumption 7.3(a). Fix , and arbitrarily. For brevity, define
[TABLE]
and
[TABLE]
Note that and .
I define the random variables using the joint cumulative distribution function described by steps 1-12 in the last proof for the case of convex support . Note that equation (D.6) is trivially true when . Moreover, equations (D.5) and (D.7) are valid by the argument described in the last proof.
I can then conclude that Proposition D.4 is true.
Appendix E Uninformative Bounds with Non-monotone Sample Selection
In the main text and in Appendices C and D, I impose some monotonicity condition on the sample selection problem through equation (2). However, in some empirical applications, this assumption may be invalid. For example, in the short run, a job training program may move some individuals from unemployment to employment by increasing their human capital or from employment to unemployment by decreasing their labor market experience. Since this is a frequent feature in empirical economics, it is important to understand what can be discovered about the marginal treatment effect when sample selection is not monotone. To do so, I drop equation (2) and impose equation (1), Assumptions 1-6, and a small generalization of Assumption 7:
Assumption E.1
I assume that and are known, and that
1. , and , or
2. , and is an interval, or
3. , and is an interval, or
4. , and
(a) is an interval or
(b) and .
I also impose mild regularity conditions to ensure that all objects are well-defined:
Assumption E.2
For any and ,
[TABLE]
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
Observe that conditions (E.4) and (E.5) are implied by a non-degenerate conditional distribution for each potential outcome of interest. Most importantly, the above assumptions are sufficient to construct bounds for the (Horowitz & Manski (2000)) and for the (Chen & Flores 2015, section 2.4) that are shorter than the entire support of the treatment effect.
I now show that, unlike the and the , the bounds for the on the outcome of interest (equation (3)) without equation (2) are uninformative, i.e., the bounds without monotone sample selection are equal to . Formally, I have that:
Proposition E.3
Suppose that the functions , , and are point identified at every pair . Impose equation (1) and assumptions 1-6 and E.1-E.2. Then, for any , and , there exist random variables such that
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , and .
Proof of Proposition E.3. I only prove Proposition E.3 under Assumption E.1.4 (sub-cases (a) or (b)) because this is the most demanding case and because the other cases are trivial extensions of this one.
Fix , and arbitrarily. For brevity, define such that ,
[TABLE]
[TABLE]
Note that, by construction,
[TABLE]
The strategy of this proof consists of defining candidate random variables through their joint cumulative distribution function and then checking that equations (E.6), (E.7) and (E.8) are satisfied. I fix and define in twelve steps:
- Step 1.
For , .
- Step 2.
From now on, consider . Since
[TABLE]
it suffices to define . Moreover, I impose
[TABLE]
by writing
[TABLE]
implying that it is sufficient to define .
- Step 3.
For , I define .
- Step 4.
From now on, consider . Since
[TABLE]
it suffices to define and .
- Step 5.
I define .
- Step 6.
For any , I define .
- Step 7.
For any , I define .
- Step 8.
From now on, consider . Since
[TABLE]
it is sufficient to define and .
- Step 9.
I define by writing
[TABLE]
[TABLE]
[TABLE]
[TABLE]
- Step 10.
I write , implying that I can separately define and .
- Step 11.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because and .
- Step 12.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because and .
Having defined the joint cumulative distribution function , note that steps 7-12 ensure that equation (E.7) holds.
Now, observe that equation (E.6) holds because steps 11 and 12 ensure that and .
Finally, equation (E.8) holds according to the same argument described at the end of appendix A.4.1.
I can then conclude that Proposition E.3 is true.
Appendix F MTE bounds under a Mean Dominance Assumption
Here, I modify the Mean Dominance Assumption 9 by changing the direction of the inequality, i.e., I assume that:
Assumption F.1
The potential outcome when treated within the always-observed subpopulation is less than or equal to the same parameter within the observed-only-when-treated subpopulation:
[TABLE]
for any and .
Note that assumption F.1 implies that . As a consequence, by following the same steps of the proof of Corollary 14, I can derive:
Corollary F.2
Fix and arbitrarily. Suppose that the , , and are point identified.
Under assumptions 1-6, 7.1, 8 and F.1, must satisfy
[TABLE]
and
[TABLE]
Under assumptions 1-6, 7.2, 8 and F.1, must satisfy
[TABLE]
and
[TABLE]
Under assumptions 1-6, 7.3 (sub-case (a) or (b)), 8 and F.1, must satisfy
[TABLE]
and
[TABLE]
When and assumptions 1-6, 8 and F.1 hold, must satisfy
[TABLE]
and
[TABLE]
The bounds in corollary F.2 can be identified using the strategies that were described in Sections 4 and 5. Furthermore, I can derive a result similar to Proposition 15:
Proposition F.3
Suppose that the functions , , , and are point identified at every pair . Under assumptions 1-6, 8 and F.1, the bounds and , given by corollary F.2, are pointwise sharp, i.e., for any , and , there exist random variables such that
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
The proof of Proposition F.3 is symmetric to the proof of Proposition 15 (Appendix A.7).
Appendix G Sharpness and Impossibility Results with Smoothness Restrictions
In the main text, I imposed no smoothness condition on the joint distribution of . Here, I impose the following smoothness condition:
Assumption G.1
The conditional cumulative distribution functions are continuous functions of the value of U.
As a consequence of this new assumption, Theorem 12 and Proposition 13 have to be modified to accommodate infinitesimal violations of the data restriction and to ensure that the extra model restrictions imposed by assumption G.1 are also satisfied.
Proposition G.2
Suppose that the functions , , and are point identified at every pair . Under Assumptions 1-6, 7 (sub-cases 1, 2, 3(a) or 3(b)), 8 and G.1, the bounds and , given by Corollary 11 are infinitesimally pointwise sharp, i.e., for any , , and , there exist random variables such that
[TABLE]
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
Proposition G.3
Suppose that the functions , , and are point identified at every pair . Impose Assumptions 1-6, 8 and G.1. If , then, for any , , and , there exist random variables such that
[TABLE]
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
for any , where , , , , and .
The proofs of Propositions G.2 and G.3 are below. They are small modifications of the previous proofs.
Proof of Proposition G.2. I only prove Proposition G.2 under Assumption 7.3 (sub-cases (a) and (b)). The proofs of Proposition G.2 under Assumptions 7.1 and 7.2 are trivial modifications of the proof presented below.
Fix any , any , any and any such that . For brevity, define , and .
Note that
[TABLE]
and that
[TABLE]
The strategy of this proof consists of defining candidate random variables through their joint cumulative distribution function and then checking that conditions (G.1)-(G.5) are satisfied. I fix and define in fourteen steps:
- Step 1.
For , .
- Step 2.
From now on, consider . Since
[TABLE]
it suffices to define . Moreover, I impose
[TABLE]
by writing
[TABLE]
implying that it is sufficient to define .
- Step 3.
For , I define .
- Step 4.
From now on, consider . Since
[TABLE]
it suffices to define and .
- Step 5.
I define .
- Step 6.
For any , I define .
- Step 7.
For any , I define .
- Step 8.
From now on, consider . Since
[TABLE]
it is sufficient to define and .
- Step 9.
I define
[TABLE]
- Step 10.
For any , I define
[TABLE]
which are valid cumulative distribution functions because a convex combination of cumulative distribution functions is a cumulative distribution function.
For any , I define
[TABLE]
which are valid cumulative distribution functions because a convex combination of cumulative distribution functions is a cumulative distribution function.
Note that is a continuous function of the value of , i.e., it satisfies restriction (G.3).
- Step 11.
I write , implying that I can separately define and .
- Step 12.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because .
- Step 13.
When is a bounded interval (sub-case (a) in Assumption 7.3), I define
[TABLE]
When and (sub-case (b) in Assumption 7.3), I define
[TABLE]
which are valid cumulative distribution functions because of equations (G.11) and (G.12).
- Step 14.
For any , I define
[TABLE]
which are valid cumulative distribution functions because a convex combination of cumulative distribution functions is a cumulative distribution function.
For any , I define
[TABLE]
which are valid cumulative distribution functions because a convex combination of cumulative distribution functions is a cumulative distribution function.
Note that is a continuous function of the value of , i.e., it satisfies restriction (G.4).
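Steps 10 and 14 rely on the fact that a convex combination of cumulative distribution functions is itself a cumulative distribution function. A short verification is sketched below; the symbols $F_1$, $F_2$, $G$ and $\lambda$ are generic illustrative names and are not the paper's notation.

```latex
% Illustrative notation only: F_1, F_2 are arbitrary CDFs and \lambda \in [0,1].
\[
G(y) \;=\; \lambda F_1(y) + (1-\lambda) F_2(y), \qquad \lambda \in [0,1].
\]
% (i) Monotonicity: for y \le y', both F_1 and F_2 are non-decreasing and the
%     weights are non-negative, so
\[
G(y') - G(y) \;=\; \lambda \bigl[F_1(y') - F_1(y)\bigr]
               + (1-\lambda)\bigl[F_2(y') - F_2(y)\bigr] \;\ge\; 0 .
\]
% (ii) Right-continuity: a finite linear combination of right-continuous
%      functions is right-continuous.
% (iii) Limits: the weights sum to one, so
\[
\lim_{y \to -\infty} G(y) = \lambda \cdot 0 + (1-\lambda)\cdot 0 = 0,
\qquad
\lim_{y \to +\infty} G(y) = \lambda \cdot 1 + (1-\lambda)\cdot 1 = 1 .
\]
```

If the weight $\lambda$ varies continuously with the conditioning value, the mixture varies continuously with it as well whenever the component functions do, which is the kind of property that the continuity restrictions in this appendix require.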
Having defined the joint cumulative distribution function , note that equations (G.11) and (G.12), and steps 7-14 ensure that equation (G.2) holds.
Now, I show, in three steps, that equation (G.1) holds.
- Step 15.
Observe that
[TABLE]
- Step 16.
Notice that
[TABLE]
- Step 17.
Note that
[TABLE]
ensuring that equation (G.1) holds.
Finally, I show, in four steps, that equation (G.5) holds.
- Step 18.
Fix arbitrarily and observe that expression (G.5) can be simplified to:
[TABLE]
- Step 19.
Notice that
[TABLE]
- Step 20.
Following the same procedure of step 19, I have that:
[TABLE]
- Step 21.
Combining steps 19 and 20, I find that
[TABLE]
implying equation (G.5) according to equation (G.15).
I can then conclude that Proposition G.2 is true.
Proof of Proposition G.3. This proof is essentially the same as the proof of Proposition G.2 under Assumption 7.3(a). Fix any , any , any and any such that . For brevity, define , and . Note that and .
I define the random variables using the joint cumulative distribution function described by steps 1-14 in the proof of Proposition G.2 for the case of convex support . Note that equation (G.7) is trivially true when . Moreover, equations (G.6) and (G.10) are valid by the argument described in steps 15-21 in the previous proof.
I can then conclude that Proposition G.3 is true.
Appendix H Monte Carlo Simulations
My empirical analysis uses two new tools in order to partially identify the marginal treatment effects on wages for the always-employed population (): the sharp bounds (Section 3) and the restricted version of the parametric estimation strategy proposed by Brinch et al. (2017) (Subsection 5.2). Given the novelty of these methods, it is useful to implement a Monte Carlo simulation in order to check whether the above methods work reasonably well in finite samples. In particular, I design six data-generating processes (DGPs) that capture important features of the Job Corps Training Program (JCTP) dataset and, using 1,000 simulations, estimate the coverage rate of the confidence intervals used to analyze the wage effect of the JCTP (Section 6.3). The first three DGPs satisfy the linearity assumptions imposed by the parametric estimation method, while the last three DGPs have non-linear marginal treatment response functions for employment and hourly labor earnings. The latter are useful to understand how my partial identification strategy behaves under model mis-specification.
In Subsection H.1, I describe each one of the six DGPs used in this Monte Carlo exercise, while, in subsection H.2, I describe the results from my simulations.
Data Generating Processes
All six data-generating processes have 7,531 observations, the same number as in the Non-Hispanic subsample of the JCTP. The dummy variable indicates treatment assignment and is equal to with probability , the same probability of a Non-Hispanic person being assigned to the treatment in my empirical application. To create the dummy variable that indicates treatment take-up, I use a random variable and the propensity score function (see Equation (1)) as and , the same values of Table 4. Although potential employment statuses $S_{0}$ and $S_{1}$ and potential wages $Y_{0}^{*}$ and $Y_{1}^{*}$ follow different distributions in each DGP, employment and wages are always independent after conditioning on the latent heterogeneity in this Monte Carlo study, i.e., $\left(S_{0},S_{1}\right) \perp\!\!\!\perp \left(Y_{0}^{*},Y_{1}^{*}\right) \mid U$ for any DGP. I impose this restrictive condition so that I can easily write the marginal treatment response (MTR) function of hourly labor earnings as the product between the functions of employment and wages, i.e., for any and . Moreover, the Mean Dominance Assumption 9 holds with equality in all DGPs. Finally, there are no covariates in this simulation study since they are not used in my empirical application.
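The product structure follows directly from the conditional independence condition. The short derivation below is a sketch under one stated assumption that is not spelled out in this appendix: potential hourly labor earnings are taken to be $S_{d} \cdot Y_{d}^{*}$, i.e., the potential wage when employed and zero otherwise.

```latex
% Assumption for illustration: potential hourly labor earnings equal S_d * Y_d^*.
% For any treatment status d \in \{0,1\} and any value u of the latent heterogeneity:
\[
\mathbb{E}\left[ S_{d}\, Y_{d}^{*} \,\middle|\, U = u \right]
\;=\;
\mathbb{E}\left[ S_{d} \,\middle|\, U = u \right]
\cdot
\mathbb{E}\left[ Y_{d}^{*} \,\middle|\, U = u \right],
\]
% because (S_0, S_1) \perp\!\!\!\perp (Y_0^*, Y_1^*) \mid U implies that the
% conditional expectation of the product factorizes.
```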
H.1.1 Design 1
Potential employment statuses are generated following equation (2) with , $V \perp\!\!\!\perp U$, and , where is equal to the employment probability of a Non-Hispanic person conditional on treatment assignment in the JCTP sample. Consequently, the functions for employment are constant.
Potential wages are generated by and , where , is the average observed hourly wage of the Non-Hispanics assigned to the control group in the JCTP sample, and is the estimated lower bound on the (Table 7). Consequently, the functions for hourly wages are constant.
Since the functions for employment and hourly wages are constant, the function for hourly labor earnings is also constant.
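A minimal Python sketch of how one Design 1 sample could be simulated is given below. It is not the paper's code. The threshold-crossing forms used for treatment take-up and employment are assumed readings of equations (1) and (2), the additive normal wage noise is an assumption, and every numeric default except the sample size of 7,531 is a hypothetical placeholder rather than a value taken from the paper's tables.

```python
import numpy as np


def simulate_design_1(n=7531, seed=0,
                      p_assign=0.6,           # hypothetical P(Z = 1)
                      pscore=(0.05, 0.70),    # hypothetical P(D = 1 | Z = 0), P(D = 1 | Z = 1)
                      emp_prob=(0.55, 0.60),  # hypothetical P(S_0 = 1), P(S_1 = 1)
                      wage_mean=7.7,          # hypothetical mean untreated wage
                      wage_effect=0.5,        # hypothetical constant wage effect
                      wage_sd=1.0):           # hypothetical wage noise scale
    """Draw one sample with constant MTR functions for employment and wages."""
    rng = np.random.default_rng(seed)

    z = rng.random(n) < p_assign              # randomized treatment assignment
    u = rng.random(n)                         # latent heterogeneity U ~ Uniform(0, 1)
    d = u <= np.where(z, pscore[1], pscore[0])  # take-up: assumed form of equation (1)

    v = rng.random(n)                         # drawn independently of U
    s0 = v <= emp_prob[0]                     # potential employment, untreated
    s1 = v <= emp_prob[1]                     # potential employment, treated (s1 >= s0)

    noise = wage_sd * rng.standard_normal(n)  # wage noise, independent of U and V
    y0_star = wage_mean + noise               # potential wage, untreated
    y1_star = wage_mean + wage_effect + noise # potential wage, treated

    s = np.where(d, s1, s0)                   # realized employment status
    y_star = np.where(d, y1_star, y0_star)    # realized latent wage
    y = np.where(s, y_star, np.nan)           # wage is observed only when employed

    return {"z": z, "u": u, "d": d, "s0": s0, "s1": s1, "s": s, "y": y}
```

Because the same draw of `v` is compared with two ordered thresholds, treatment can only move an individual into employment, which mirrors the positive monotone selection imposed in the main text. Because wages are drawn independently of `v`, mean wages coincide across employment types, so mean dominance holds with equality in this sketch.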
H.1.2 Design 2
Potential employment statuses are generated based on Design 1.
Potential untreated wage is generated based on Design 1, while potential treated wage is generated by . Consequently, the function for treated hourly wages is linear.
Since the functions for employment are constant and the function for treated hourly wages is linear, the function for treated hourly labor earnings is linear.
H.1.3 Design 3
Potential employment statuses are generated to ensure that and that the true functions are equal to the estimated function in the JCTP study (Table 6). Consequently, the functions for employment are linear.
Potential wages are generated based on Design 1.
Since the functions for employment are linear and the functions for hourly wages are constant, the functions for hourly labor earnings are linear.
H.1.4 Design 4
Potential employment statuses are generated based on Design 3.
Potential wages are generated based on Design 2.
Since the functions for employment are linear and the function for treated hourly wages is linear, the function for treated hourly labor earnings is quadratic.
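The quadratic shape can be checked with one line of algebra. The coefficients $a_{S}$, $b_{S}$, $a_{Y}$ and $b_{Y}$ below are generic illustrative names for the intercepts and slopes of the linear employment and wage functions; they are not the paper's notation.

```latex
% Illustrative notation: a_S + b_S u is a linear function for employment,
% a_Y + b_Y u is a linear function for treated hourly wages.
\[
\left(a_{S} + b_{S}\,u\right)\left(a_{Y} + b_{Y}\,u\right)
\;=\;
a_{S}a_{Y} + \left(a_{S}b_{Y} + b_{S}a_{Y}\right)u + b_{S}b_{Y}\,u^{2}.
\]
% The product is quadratic in u whenever b_S b_Y \neq 0. When wages are
% constant (b_Y = 0), as in Design 3, the product reduces to a linear function.
```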
H.1.5 Design 5
Potential employment statuses are generated following equation (2) with , and , where the parameters of the Beta distribution and the values for any are chosen so that the true functions on employment match the estimated functions on employment (Table 6) when the latent heterogeneity variable is equal to the propensity score values. Note that the true functions for employment are non-linear.
Potential wages are generated based on Design 1.
Since the functions for employment are non-linear, the functions for hourly labor earnings are non-linear.
H.1.6 Design 6
Potential employment statuses are generated based on Design 5.
Potential wages are generated based on Design 2.
Since the functions for employment are non-linear, the functions for hourly labor earnings are non-linear.
Monte Carlo Results
The focus of this subsection is whether the two types of confidence intervals used in the empirical application (Subsection 6.3) contain the true marginal treatment effect on wages for the always-employed population. To analyze this question, I report the pointwise coverage rate using 1,000 Monte Carlo simulations: while Figure H.1 reports the pointwise coverage rate of Bootstrap 90%-Confidence Intervals for each data-generating process, Figure H.2 reports the pointwise coverage rate of 90%-Confidence Intervals based on Imbens & Manski (2004) for each data-generating process. The solid lines are associated with bounds that do not impose the Mean Dominance Assumption 9 (Corollary 11), while the dashed lines are associated with bounds that impose the Mean Dominance Assumption 9 (Corollary 14). Since the results for the Bootstrap 90%-Confidence Intervals are very similar to the results for the 90%-Confidence Intervals based on Imbens & Manski (2004), I focus on the latter. Moreover, since the bounds that impose the Mean Dominance Assumption 9 are tighter than the ones that do not impose this assumption, I only discuss the results associated with Corollary 14.
For Designs 1 and 2 (which satisfy the linearity assumptions of the parametric estimation procedure detailed in Subsection 5.2), the coverage rate for the confidence interval proposed by Imbens & Manski (2004) is above the nominal confidence level. This finding is not surprising in light of Proposition 1 by Stoye (2009), who shows that such confidence intervals have an asymptotic coverage rate that is at least the nominal confidence level.
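For reference, the interval of Imbens & Manski (2004) widens the estimated bounds by a critical value c that solves Φ(c + Δ/max(se_l, se_u)) − Φ(−c) = level, where Δ is the estimated width of the identified set and se_l, se_u are the standard errors of the bound estimators. The sketch below solves this equation by bisection using only the Python standard library. The function names are hypothetical, and clamping a negative estimated width to zero is an assumption of this sketch rather than part of the original procedure.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def im_critical_value(delta_hat, se_lower, se_upper, level=0.90):
    """Solve Phi(c + delta_hat / max(se)) - Phi(-c) = level for c.

    se_lower and se_upper are standard errors (sigma / sqrt(n)), so
    delta_hat / max(se) equals sqrt(n) * delta_hat / max(sigma).
    """
    # Assumption of this sketch: a negative estimated width is treated as zero.
    shift = max(delta_hat, 0.0) / max(se_lower, se_upper)

    def excess(c):
        return _STD_NORMAL.cdf(c + shift) - _STD_NORMAL.cdf(-c) - level

    lo = 0.0
    # The two-sided critical value is an upper bound on the solution.
    hi = _STD_NORMAL.inv_cdf((1.0 + level) / 2.0)
    for _ in range(100):  # bisection; excess is increasing in c
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def im_confidence_interval(lower_hat, upper_hat, se_lower, se_upper, level=0.90):
    """Confidence interval for the parameter given estimated bounds."""
    c = im_critical_value(upper_hat - lower_hat, se_lower, se_upper, level)
    return lower_hat - c * se_lower, upper_hat + c * se_upper
```

The critical value moves from the two-sided normal quantile when the estimated width is zero to the one-sided quantile when the width is large relative to the standard errors.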
For Design 3, I find a surprising negative result. Even though the functions are linear for this DGP, the coverage rate is below the nominal confidence level for many values of the latent heterogeneity. An even more surprising but positive result is the coverage rate for Design 4. Although the function for treated hourly labor earnings is quadratic for this DGP, the coverage rate is above the nominal confidence level for most values of the latent heterogeneity.
Finally, for Designs 5 and 6, I find that the 90%-Confidence Intervals based on Imbens & Manski (2004) severely under-cover the true function for most values of the latent heterogeneity. This negative result is not surprising because the functions of those DGPs are not linear.
References
- Ahn, H. & Powell, J. L. (1993), ‘Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism’, Journal of Econometrics 58, pp. 3–29.
- Altonji, J. (1993), ‘The Demand for and Return to Education When Education Outcomes are Uncertain’, Journal of Labor Economics 11 (1), pp. 48–83.
- Angelucci, M., Karlan, D. & Zinman, J. (2015), ‘Microcredit Impacts: Evidence from a Randomized Microcredit Program Placement Experiment by Compartamos Banco’, American Economic Journal: Applied Economics 7 (1), pp. 151–182.
- Angrist, J., Bettinger, E. & Kremer, M. (2006), ‘Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia’, The American Economic Review 96 (3), pp. 847–862. http://www.jstor.org/stable/30034075
- Angrist, J., Lang, D. & Oreopoulos, P. (2009), ‘Incentives and Services for College Achievement: Evidence from a Randomized Trial’, American Economic Journal: Applied Economics 1 (1), pp. 1–28.
- Behaghel, L., Crepon, B., Gurgand, M. & Barbanchon, T. L. (2015), ‘Please Call Again: Correcting Nonresponse Bias in Treatment Effect Models’, The Review of Economics and Statistics 97 (5), pp. 1070–1080.
- Bhattacharya, J., Shaikh, A. M. & Vytlacil, E. (2008), ‘Treatment Effect Bounds under Monotonicity Assumptions: An Application to Swan-Ganz Catheterization’, The American Economic Review: Papers and Proceedings 98 (2), pp. 351–356.
