Semiparametric estimation of heterogeneous treatment effects under the nonignorable assignment condition

Keisuke Takahata; Takahiro Hoshino

arXiv:1902.09978·econ.EM·February 27, 2019

TL;DR

This paper introduces a semiparametric two-stage least squares estimator for heterogeneous treatment effects, addressing the ill-posed nature of the integral equations involved by using orthogonal series approximation to ensure stability.

Contribution

The paper develops a novel semiparametric estimator for HTE that stabilizes solutions to ill-posed integral equations using orthogonal series, improving estimation under nonignorable assignment.

Findings

1. Estimator performs well in simulation experiments.
2. Addresses ill-posedness in integral equations for HTE.
3. Provides a stable solution to a challenging estimation problem.

Abstract

We propose a semiparametric two-stage least squares estimator for heterogeneous treatment effects (HTE). The HTE is the solution to an integral equation that belongs to the class of Fredholm integral equations of the first kind, which are known to be ill-posed. Naive semiparametric or nonparametric methods do not provide stable solutions to such problems. We therefore approximate the function of interest by an orthogonal series under a constraint that makes the inverse mapping of the integral continuous and thereby eliminates the ill-posedness. We illustrate the performance of the proposed estimator through simulation experiments.

Tables

Table 1. Estimation of ATE (true: 0.900)

| $B_{\gamma}$ | mean | s.d. |
| --- | --- | --- |
| 10 | 0.888 | 0.0243 |
| 15 | 0.884 | 0.0260 |
| 25 | 0.881 | 0.0298 |
| 50 | 0.877 | 0.0397 |

Equations

$$\mathrm{HTE}(y_{0})=\mathbb{E}[y_{1}-y_{0}\mid y_{0}],$$

$$\mathbb{E}[y_{1}-y_{0}\mid y_{0}]=\mathbb{E}[y_{1}\mid y_{0}]-y_{0}=\mathbb{E}_{x\mid y_{0}}\left[\mathbb{E}[y_{1}\mid y_{0},x]\right]-y_{0},$$

$$\mathbb{E}[y_{1}-y_{0}]=\mathbb{E}_{y_{0},x}\left[\mathbb{E}[y_{1}\mid y_{0},x]\right]-\mathbb{E}[y_{0}].$$

$$\begin{aligned}\mathbb{E}[y_{1}\mid x,z=1]&=\int\mathbb{E}[y_{1}\mid y_{0},x,z=1]\,p(y_{0}\mid x,z=1)\,dy_{0}\\&=\int\mathbb{E}[y_{1}\mid y_{0},x]\,p(y_{0}\mid x,z=1)\,dy_{0},\end{aligned}$$

$$\phi(y_{0},x)\simeq\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{1}}(y_{0})q_{j_{2}}(x).$$

$$g(k_{0}+k_{y_{0}}(y_{0})+k_{x}(x))=\frac{1}{1+\exp(-(k_{0}+k_{y_{0}}(y_{0})+k_{x}(x)))},$$

$$\begin{aligned}p(y_{0}\mid x,z=1)&=\frac{\exp(k_{0}+k_{x}(x))\,p(z=0)}{p(x,z=1)}\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\\&=c(x)\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0),\end{aligned}$$

$$\begin{aligned}\mathbb{E}[y_{1}\mid x,z=1]&\simeq\int\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{1}}(y_{0})q_{j_{2}}(x)\,c(x)\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\,dy_{0}\\&=c(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{2}}(x)\int q_{j_{1}}(y_{0})\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\,dy_{0}\\&=c(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}s_{j_{1}}(x)q_{j_{2}}(x),\end{aligned}$$

$$\hat{p}(y_{0},x\mid z=0)=\frac{1}{N_{0}}\sum_{i:z_{i}=0}\frac{1}{h_{y_{0}}h_{x}}K\left(\frac{y_{0}-y_{i0}}{h_{y_{0}}}\right)K\left(\frac{x-x_{i}}{h_{x}}\right).$$

$$\hat{s}_{j_{1}}(x)=\frac{1}{N_{0}}\sum_{i:z_{i}=0}\frac{1}{\hat{h}_{y_{0}}\hat{h}_{x}}K\left(\frac{x-x_{i}}{\hat{h}_{x}}\right)\hat{t}_{j_{1}}(y_{i0})$$

$$\hat{p}(x\mid z=1)=\frac{1}{N_{1}}\sum_{i:z_{i}=1}\frac{1}{\hat{w}_{x}}K\left(\frac{x-x_{i}}{\hat{w}_{x}}\right).$$

$$\mathbb{E}[y_{1}\mid x,z=1]=\hat{c}(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}\hat{s}_{j_{1}}(x)q_{j_{2}}(x),$$

$$\hat{\gamma}_{j_{1}j_{2}}=\operatorname*{arg\,min}_{\gamma}\sum_{i:z_{i}=1}\Bigl(y_{i1}-\hat{c}(x_{i})\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}\hat{s}_{j_{1}}(x_{i})q_{j_{2}}(x_{i})\Bigr)^{2}\quad\text{s.t. }\gamma^{\prime}\Lambda_{J}\gamma\leq B_{\gamma},$$

$$\hat{E}[y_{1}\mid y_{0}]=\int\hat{\phi}(y_{0},x)\,\hat{p}(x\mid y_{0})\,dx.$$

$$p(x\mid y_{0})=\frac{p(y_{0},x)}{p(y_{0})}=\frac{p(y_{0},x\mid z=0)\,p(z=0)}{p(z=0\mid y_{0},x)\,p(y_{0})}.$$

$$m(x,y_{0})=\left(x-\mathbb{E}[x],\;y_{0}-\mathbb{E}[y_{0}],\;y_{0}^{2}-\mathbb{E}[y_{0}^{2}]\right)^{\prime}.$$

$$\begin{aligned}\mathbb{E}\left[\frac{m(x,y_{0})}{p(z=0\mid y_{0},x)}\,\middle|\,z=0\right]&=\frac{1}{p(z=0)}\int m(x,y_{0})\,p(y_{0},x)\,dy_{0}\,dx\\&=\frac{1}{p(z=0)}\mathbb{E}[m(x,y_{0})]=0.\end{aligned}$$

$$\begin{aligned}\frac{1}{N_{0}}\sum_{i:z_{i}=0}m(x_{i},y_{i0})\left(1+\exp(k_{0}+k_{x}(x_{i})+k_{y_{0}}(y_{i0}))\right)&=0,\\\sum_{i:z_{i}=0}\left(1+\exp(k_{0}+k_{x}(x_{i})+k_{y_{0}}(y_{i0}))\right)&=N,\end{aligned}$$

$$x\sim N(0,1),\qquad\begin{pmatrix}y_{0}\\y_{1}\end{pmatrix}\Bigm|x\sim N\left(\begin{pmatrix}\mu_{0}(x)\\\mu_{1}(x)\end{pmatrix},\begin{pmatrix}\sigma_{0}^{2}&\rho\sigma_{0}\sigma_{1}\\\rho\sigma_{0}\sigma_{1}&\sigma_{1}^{2}\end{pmatrix}\right)$$

$$q_{0}(v)=1,\quad q_{1}(v)=v,\quad q_{2}(v)=\frac{1}{2}(3v^{2}-1),\quad q_{3}(v)=\frac{1}{2}(5v^{3}-3v).$$

Taxonomy

Topics: Statistical Methods in Clinical Trials

Full text

Semiparametric estimation of heterogeneous treatment effects under the nonignorable assignment condition

Keisuke Takahata 1),2) and Takahiro Hoshino 1),2)

Keywords: semiparametric estimation; integral equation; heterogeneity in treatment effects

1) Department of Economics, Keio University, Tokyo, Japan; 2) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

1. Introduction

In causal inference, treatment effects such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT) have been of primary interest (Rubin, 1974). As their names suggest, these parameters are treatment effects averaged over a population of interest. However, a treatment effect may differ among units depending on their covariates and outcomes. Such heterogeneity of treatment effects has been studied intensively in recent years. For example, Wager and Athey (2018) proposed a random-forest-based method for finding subgroups within which the treatment effect is similar. Understanding the heterogeneity of treatment effects supports not only a more detailed analysis of the population of interest but also more efficient policy-making when an intervention is costly.

While most studies have been concerned with heterogeneity over fully observed covariates, Takahata and Hoshino (2018) (henceforth TH) studied heterogeneity over the untreated potential outcome, which is defined as

$$\mathrm{HTE}(y_{0})=\mathbb{E}[y_{1}-y_{0}\mid y_{0}],\tag{1}$$

where $y_{1}$ and $y_{0}$ are the outcomes under the treatment and control conditions, respectively. Following TH, we call (1) the heterogeneous treatment effect (HTE), which is also the quantity of interest in this paper. To estimate the HTE, we have to deal with nonignorable missingness, because we need to estimate $p(y_{1}|y_{0})$ but $y_{1}$ and $y_{0}$ are never observed simultaneously. Identification is known to be nontrivial in nonignorable missing-data models (e.g., Miao et al. (2016)). TH provided a sufficient condition for identification of the HTE using information on the marginal distribution of $y_{0}$, $p(y_{0})$. Although the identification condition is rather general, estimating the HTE is difficult because it requires solving an integral equation. This equation is a Fredholm integral equation of the first kind, which is known to be an ill-posed problem, and an appropriate regularization method is generally needed to obtain a stable solution. TH avoided this problem by using parametric Bayesian modeling, but a more flexible approach is desirable for wider applicability.

In this paper, we propose a semiparametric two-stage least squares (2SLS) estimator for the HTE. Our approach relies on the quadratic programming method of Newey and Powell (2003), who considered estimation of a nonparametric instrumental variable model. The function of interest is approximated by a finite series, so that the integral equation reduces to a constrained least squares problem in which the regressors are replaced by their expectations. To overcome the instability of the solution caused by the ill-posedness, bounds are placed on the series coefficients so that the inverse mapping of the integral is continuous. Numerical experiments show that the proposed method estimates the HTE correctly.

2. Model setup

We follow the same setup as TH. The HTE (eq. (1)) is rewritten as

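$$\mathbb{E}[y_{1}-y_{0}\mid y_{0}]=\mathbb{E}[y_{1}\mid y_{0}]-y_{0}=\mathbb{E}_{x\mid y_{0}}\left[\mathbb{E}[y_{1}\mid y_{0},x]\right]-y_{0},$$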

where $x\in\mathbb{R}^{d}$ is a $d$-dimensional covariate vector. This formula shows that, to identify the HTE, it is sufficient to identify $p(x|y_{0})$ and $p(y_{1}|y_{0},x)$. Let $z\in\left\{0,1\right\}$ be the binary assignment indicator, which equals 1 when a unit is assigned to the treatment condition. If $z=1$, only $y_{1}$ is observed and $y_{0}$ is missing, and vice versa. TH showed that the following two assumptions play a primary role in identification:

**(A1):** $p(z|y_{1},y_{0},x)=p(z|y_{0},x)$ (weak ignorability);

**(A2):** $p(y_{0})$ is known.

Assumption (A1) is called weak ignorability. Intuitively, it implies that we can identify the HTE by observing the difference between two groups whose assignment probability depends on $y_{0}$. Therefore, our approach is not applicable in a situation where strong ignorability, $p(z|y_{1},y_{0},x)=p(z|x)$, holds. Assumption (A2) is needed to identify $p(z|y_{0},x)$ (Hirano et al., 2001). In addition to these conditions, several constraints on the functional form and parameter space of $p(z|y_{0},x)$ are needed; for a more detailed discussion of the identification condition, see TH. In what follows, we suppose that all the conditions of Theorem 2 in TH are satisfied. Note that, in this setup, identification of the ATE is trivial because

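$$\mathbb{E}[y_{1}-y_{0}]=\mathbb{E}_{y_{0},x}\left[\mathbb{E}[y_{1}\mid y_{0},x]\right]-\mathbb{E}[y_{0}].$$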

Consider the integral equation

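$$\begin{aligned}\mathbb{E}[y_{1}\mid x,z=1]&=\int\mathbb{E}[y_{1}\mid y_{0},x,z=1]\,p(y_{0}\mid x,z=1)\,dy_{0}\\&=\int\mathbb{E}[y_{1}\mid y_{0},x]\,p(y_{0}\mid x,z=1)\,dy_{0},\end{aligned}\tag{2}$$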

where the second equality holds by weak ignorability. Under the identification condition, it can be shown that the solution of eq. (2) for $\mathbb{E}[y_{1}\mid y_{0},x]$ is unique, that is, $\mathbb{E}[y_{1}\mid y_{0},x]$ is identified. Our goal is then to obtain an actual solution of eq. (2). However, if we estimate $\mathbb{E}[y_{1}\mid y_{0},x]$ nonparametrically, the solution is unstable because the inverse mapping of the integral is discontinuous. We therefore need an appropriate regularization method to overcome the ill-posedness of eq. (2). We address this problem in the next section.
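The instability described above can be illustrated numerically. The following is a minimal sketch, not the paper's equation: it discretizes a generic first-kind integral equation with an assumed Gaussian smoothing kernel, an assumed target function `f_true`, and an assumed noise level, and it uses Tikhonov (ridge) regularization purely as a simple stand-in for the constraint introduced in Section 3.

```python
import numpy as np

# Illustrative setup (all choices are assumptions, not taken from the paper).
n = 50
t = np.linspace(-1.0, 1.0, n)
dt = t[1] - t[0]
bandwidth = 0.2  # assumed width of the smoothing kernel

# Discretized integral operator: (K f)(s_i) ~ sum_j k(s_i, t_j) f(t_j) dt
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
K *= dt / (bandwidth * np.sqrt(2.0 * np.pi))

f_true = t ** 2                                  # assumed unknown function
rng = np.random.default_rng(0)
b = K @ f_true + 1e-4 * rng.standard_normal(n)   # slightly noisy left-hand side

cond_number = np.linalg.cond(K)

# Naive inversion: small singular values of K amplify the tiny noise.
f_naive = np.linalg.lstsq(K, b, rcond=None)[0]

# Tikhonov (ridge) regularization with an assumed penalty weight.
lam = 1e-6
f_ridge = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ b)

err_naive = np.linalg.norm(f_naive - f_true)
err_ridge = np.linalg.norm(f_ridge - f_true)
print(f"cond(K) = {cond_number:.2e}")
print(f"naive error = {err_naive:.2e}, ridge error = {err_ridge:.2e}")
```

The discretized operator is extremely ill-conditioned, so the unregularized solution is dominated by amplified noise while the regularized one stays close to the target.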

3. Estimation

In this section we propose a two-stage least squares estimator (2SLSE) for $\mathbb{E}[y_{1}\mid y_{0},x]$ based on the method of Newey and Powell (2003). The strategy is to (i) approximate $\mathbb{E}[y_{1}\mid y_{0},x]$ by a finite number of orthogonal basis functions, (ii) take their expectation with respect to $p(y_{0}|x,z=1)$, and (iii) perform least squares estimation under a constraint that makes the inverse mapping of the integral continuous and eliminates the ill-posedness of eq. (2). For simplicity, we suppose $x\in\mathbb{R}$ in the rest of the paper.

3.1. Estimation of $\mathbb{E}[y_{1}\mid y_{0},x]$

We consider approximating $\phi(y_{0},x)\equiv\mathbb{E}[y_{1}\mid y_{0},x]$ with a finite number of orthogonal basis functions, $\{q_{j}\}_{j=0}^{J}$, as

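$$\phi(y_{0},x)\simeq\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{1}}(y_{0})q_{j_{2}}(x).\tag{3}$$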
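The tensor-product series in eq. (3) can be sketched in a few lines. The example below assumes the Legendre basis that the simulation section uses and evaluates it by Bonnet's recursion; the function names and the coefficient matrix `gamma` are hypothetical and chosen only for illustration, and the arguments are assumed to be already rescaled to $[-1,1]$.

```python
def legendre(j, v):
    """Legendre polynomial q_j(v), computed with Bonnet's recursion."""
    if j == 0:
        return 1.0
    prev, curr = 1.0, v
    for n in range(1, j):
        prev, curr = curr, ((2 * n + 1) * v * curr - n * prev) / (n + 1)
    return curr


def phi_series(gamma, y0, x):
    """Evaluate sum_{j1, j2} gamma[j1][j2] * q_{j1}(y0) * q_{j2}(x)."""
    J = len(gamma) - 1
    qy = [legendre(j, y0) for j in range(J + 1)]
    qx = [legendre(j, x) for j in range(J + 1)]
    return sum(gamma[j1][j2] * qy[j1] * qx[j2]
               for j1 in range(J + 1) for j2 in range(J + 1))


# Hypothetical coefficients for J = 1:
# phi(y0, x) = 0.5 - x + 2*y0 + 0.25*y0*x
gamma = [[0.5, -1.0],
         [2.0, 0.25]]
value = phi_series(gamma, 0.2, -0.4)
print(value)
```

With $J=3$ the same function takes a $4\times 4$ coefficient matrix, so the number of free coefficients is $(J+1)^{2}=16$.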

We specify $p(z=1|y_{0},x)$ by the logistic regression such that it satisfies the identification condition from TH:

$$g(k_{0}+k_{y_{0}}(y_{0})+k_{x}(x))=\frac{1}{1+\exp(-(k_{0}+k_{y_{0}}(y_{0})+k_{x}(x)))},\tag{4}$$

where $k_{y_{0}}(y_{0})$ and $k_{x}(x)$ enter additively. Expand $p(y_{0}|x,z=1)$ as

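$$\begin{aligned}p(y_{0}\mid x,z=1)&=\frac{\exp(k_{0}+k_{x}(x))\,p(z=0)}{p(x,z=1)}\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\\&=c(x)\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0),\end{aligned}\tag{5}$$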

where $c(x)=\exp(k_{0}+k_{x}(x))p(z=0)/p(x,z=1)$ . Plugging eq. (3) and eq. (5) into eq. (2) yields

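$$\begin{aligned}\mathbb{E}[y_{1}\mid x,z=1]&\simeq\int\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{1}}(y_{0})q_{j_{2}}(x)\,c(x)\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\,dy_{0}\\&=c(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}q_{j_{2}}(x)\int q_{j_{1}}(y_{0})\exp(k_{y_{0}}(y_{0}))\,p(y_{0},x\mid z=0)\,dy_{0}\\&=c(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}s_{j_{1}}(x)q_{j_{2}}(x),\end{aligned}\tag{6}$$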

where $s_{j_{1}}(x)=\int q_{j_{1}}(y_{0})\exp(k_{y_{0}}(y_{0}))p(y_{0},x|z=0)dy_{0}$ . The estimation of the missing mechanism will be discussed later.
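For readers who want the intermediate step behind the expansion of $p(y_{0}|x,z=1)$ in eq. (5), the following worked derivation uses only Bayes' rule and the logistic form in eq. (4); it introduces no assumptions beyond those already stated.

```latex
\begin{align*}
p(y_0 \mid x, z=1)
  &= \frac{p(z=1 \mid y_0, x)\, p(y_0, x)}{p(x, z=1)},
  \qquad
  p(y_0, x) = \frac{p(y_0, x \mid z=0)\, p(z=0)}{p(z=0 \mid y_0, x)}, \\
p(y_0 \mid x, z=1)
  &= \frac{p(z=1 \mid y_0, x)}{p(z=0 \mid y_0, x)}
     \cdot \frac{p(z=0)}{p(x, z=1)}\, p(y_0, x \mid z=0), \\
\frac{p(z=1 \mid y_0, x)}{p(z=0 \mid y_0, x)}
  &= \exp\bigl(k_0 + k_{y_0}(y_0) + k_x(x)\bigr)
   = \exp\bigl(k_0 + k_x(x)\bigr)\exp\bigl(k_{y_0}(y_0)\bigr).
\end{align*}
```

Collecting the factors that do not depend on $y_{0}$ gives $c(x)=\exp(k_{0}+k_{x}(x))p(z=0)/p(x,z=1)$, which is why the additive form of the logistic index matters: it lets $c(x)$ move outside the integral over $y_{0}$.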

Here, we consider estimating $p(y_{0},x|z=0)$ by the kernel density estimator,

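$$\hat{p}(y_{0},x\mid z=0)=\frac{1}{N_{0}}\sum_{i:z_{i}=0}\frac{1}{h_{y_{0}}h_{x}}K\left(\frac{y_{0}-y_{i0}}{h_{y_{0}}}\right)K\left(\frac{x-x_{i}}{h_{x}}\right),$$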

where $h_{y_{0}}$ and $h_{x}$ are the bandwidths and $N_{0}$ is the sample size of the control group. In this case, $s_{j_{1}}(x)$ can be estimated by

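$$\hat{s}_{j_{1}}(x)=\frac{1}{N_{0}}\sum_{i:z_{i}=0}\frac{1}{\hat{h}_{y_{0}}\hat{h}_{x}}K\left(\frac{x-x_{i}}{\hat{h}_{x}}\right)\hat{t}_{j_{1}}(y_{i0}),\tag{7}$$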

where $\hat{t}_{j_{1}}(y_{i0})=\int q_{j_{1}}(y_{0})\exp(\hat{k}_{y_{0}}(y_{0}))K\left(\frac{y_{0}-y_{i0}}{\hat{h}_{y_{0}}}\right)dy_{0}$. Similarly, we obtain an estimator of $c(x)$, denoted $\hat{c}(x)$, by using the kernel density estimator

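$$\hat{p}(x\mid z=1)=\frac{1}{N_{1}}\sum_{i:z_{i}=1}\frac{1}{\hat{w}_{x}}K\left(\frac{x-x_{i}}{\hat{w}_{x}}\right).\tag{8}$$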
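The two kernel density estimators used in this subsection can be sketched as follows. The paper does not state which kernel $K$ is used, so the Gaussian kernel here is an assumption, as are the function names. The bandwidth helper uses one common form of Scott's rule, $h=\hat{\sigma}\,n^{-1/(d+4)}$; constants differ across references.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scott_bandwidth(values, dim):
    """Scott-type rule of thumb: sample sd times n^(-1 / (dim + 4))."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd * n ** (-1.0 / (dim + 4))


def kde_joint(y0, x, y0_ctrl, x_ctrl, h_y0, h_x):
    """Product-kernel estimate of p(y0, x | z = 0) from control units."""
    total = sum(gaussian_kernel((y0 - yi) / h_y0) * gaussian_kernel((x - xi) / h_x)
                for yi, xi in zip(y0_ctrl, x_ctrl))
    return total / (len(y0_ctrl) * h_y0 * h_x)


def kde_x(x, x_treat, w_x):
    """Kernel estimate of p(x | z = 1) from treated units."""
    total = sum(gaussian_kernel((x - xi) / w_x) for xi in x_treat)
    return total / (len(x_treat) * w_x)


# Tiny hand-checkable example: one observation at the origin, unit bandwidths.
joint_at_origin = kde_joint(0.0, 0.0, [0.0], [0.0], 1.0, 1.0)
marginal_at_origin = kde_x(0.0, [0.0], 1.0)
print(joint_at_origin, marginal_at_origin)
```

With a single observation and unit bandwidths the estimates reduce to the kernel itself, which gives a quick sanity check of the normalizing constants.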

By inserting eq. (7) and eq. (8) into eq. (6), we obtain

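$$\mathbb{E}[y_{1}\mid x,z=1]=\hat{c}(x)\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}\hat{s}_{j_{1}}(x)q_{j_{2}}(x),$$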

where $N_{1}$ is the sample size of the treatment group. The least squares estimator of $\gamma_{j_{1}j_{2}}$ is therefore obtained by solving the following quadratic program:

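$$\hat{\gamma}_{j_{1}j_{2}}=\operatorname*{arg\,min}_{\gamma}\sum_{i:z_{i}=1}\Bigl(y_{i1}-\hat{c}(x_{i})\sum_{j_{1}=0}^{J}\sum_{j_{2}=0}^{J}\gamma_{j_{1}j_{2}}\hat{s}_{j_{1}}(x_{i})q_{j_{2}}(x_{i})\Bigr)^{2}\quad\text{s.t. }\gamma^{\prime}\Lambda_{J}\gamma\leq B_{\gamma},$$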

where $\gamma=(\gamma_{00},\gamma_{01},\dots,\gamma_{j_{1}j_{2}},\dots,\gamma_{JJ})^{\prime}$ and $B_{\gamma}$ is a positive constant. $\Lambda_{J}$ is the matrix that represents the Sobolev norm on the basis functions $\left\{q_{j_{1}}(\cdot)q_{j_{2}}(\cdot)\right\}_{(j_{1},j_{2})}\,(j_{1}=0,\dots,J,\,j_{2}=0,\dots,J)$, so that the constraint imposes compactness on both the true and the estimated $\phi(y_{0},x)$. This compactness eliminates the ill-posedness of the inverse problem (eq. (2)) because the inverse operator of the integral becomes a continuous mapping (Newey and Powell, 2003).
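One way to solve a least squares problem with a single quadratic constraint of this form is sketched below. This is an illustration under stated assumptions, not the authors' implementation: it assumes $\Lambda_{J}$ is positive definite and finds the Lagrange multiplier by bisection, and the design matrix `X`, response `y`, and identity `Lam` are hypothetical stand-ins for the generated regressors $\hat{c}(x_{i})\hat{s}_{j_{1}}(x_{i})q_{j_{2}}(x_{i})$, the treated outcomes $y_{i1}$, and $\Lambda_{J}$.

```python
import numpy as np


def constrained_least_squares(X, y, Lam, bound, iters=200):
    """Minimize ||y - X g||^2 subject to g' Lam g <= bound (Lam positive definite)."""
    def solve(mult):
        # Stationarity condition of the Lagrangian: (X'X + mult * Lam) g = X'y
        return np.linalg.solve(X.T @ X + mult * Lam, X.T @ y)

    def norm_sq(g):
        return float(g @ Lam @ g)

    g = np.linalg.lstsq(X, y, rcond=None)[0]
    if norm_sq(g) <= bound:
        return g  # constraint inactive: plain least squares

    lo, hi = 0.0, 1.0
    while norm_sq(solve(hi)) > bound:   # bracket the multiplier
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):              # bisection: norm_sq decreases in mult
        mid = 0.5 * (lo + hi)
        if norm_sq(solve(mid)) > bound:
            lo = mid
        else:
            hi = mid
    return solve(hi)


# Hypothetical data standing in for the generated regressors and outcomes.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(40)
Lam = np.eye(4)                         # placeholder for Lambda_J
gamma_hat = constrained_least_squares(X, y, Lam, bound=1.0)
print(gamma_hat @ Lam @ gamma_hat)
```

A general-purpose quadratic programming solver would work equally well; the bisection form is shown only because it makes the role of $B_{\gamma}$ explicit: a smaller bound corresponds to a larger multiplier and hence stronger shrinkage.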

Finally, by integrating out $x$ in $\hat{\phi}$ , we obtain $\hat{E}[y_{1}|y_{0}]$ :

$$\hat{E}[y_{1}\mid y_{0}]=\int\hat{\phi}(y_{0},x)\,\hat{p}(x\mid y_{0})\,dx.$$

Note that we can calculate $\hat{p}(x|y_{0})$ by plugging corresponding estimators into the following formula:

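$$p(x\mid y_{0})=\frac{p(y_{0},x)}{p(y_{0})}=\frac{p(y_{0},x\mid z=0)\,p(z=0)}{p(z=0\mid y_{0},x)\,p(y_{0})}.$$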

3.2. Estimation of the missing mechanism

We estimate the missing mechanism (eq. (4)) following Nevo (2003), who proposed a generalized method of moments (GMM) estimator for a nonignorable missing-data model. Suppose that an auxiliary moment condition, $\mathbb{E}[m(x,y_{0})]=0$, is available. Under assumption (A2), we can calculate moments of $y_{0}$ of any order, but for simplicity we set the dimension of the moment condition equal to the total number of parameters in $k_{y_{0}}(\cdot)$ and $k_{x}(\cdot)$. For example, if $k_{x}(x)=\beta_{0}x$ and $k_{y_{0}}(y_{0})=\beta_{1}y_{0}+\beta_{2}y_{0}^{2}$, then we may set the moment function as

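$$m(x,y_{0})=\left(x-\mathbb{E}[x],\;y_{0}-\mathbb{E}[y_{0}],\;y_{0}^{2}-\mathbb{E}[y_{0}^{2}]\right)^{\prime}.$$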

Note that

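$$\begin{aligned}\mathbb{E}\left[\frac{m(x,y_{0})}{p(z=0\mid y_{0},x)}\,\middle|\,z=0\right]&=\frac{1}{p(z=0)}\int m(x,y_{0})\,p(y_{0},x)\,dy_{0}\,dx\\&=\frac{1}{p(z=0)}\mathbb{E}[m(x,y_{0})]=0.\end{aligned}$$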

Therefore, the solution of the following system of equations,

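$$\begin{aligned}\frac{1}{N_{0}}\sum_{i:z_{i}=0}m(x_{i},y_{i0})\left(1+\exp(k_{0}+k_{x}(x_{i})+k_{y_{0}}(y_{i0}))\right)&=0,\\\sum_{i:z_{i}=0}\left(1+\exp(k_{0}+k_{x}(x_{i})+k_{y_{0}}(y_{i0}))\right)&=N,\end{aligned}$$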

is an unbiased estimator for $(k_{0},\beta_{0},\beta_{1},\beta_{2})$, where $N$ is the total sample size. The second equation normalizes the weights, which follows from $\frac{1}{N}\sum_{i:z_{i}=0}1/p(z_{i}=0|y_{i0},x_{i})\xrightarrow{p}1$. For detailed identification conditions, see Nevo (2003).
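The system of estimating equations can be written as a residual function whose root is the estimate. The sketch below only evaluates the residuals for the example specification $k_{x}(x)=\beta_{0}x$, $k_{y_{0}}(y_{0})=\beta_{1}y_{0}+\beta_{2}y_{0}^{2}$; the function names are hypothetical, the population means are assumed to be supplied by the user, and a numerical root finder (not shown) would be needed to solve for the parameters.

```python
import math


def inverse_control_probability(k0, b0, b1, b2, x, y0):
    """1 / p(z = 0 | y0, x) = 1 + exp(k0 + b0*x + b1*y0 + b2*y0^2) under eq. (4)."""
    return 1.0 + math.exp(k0 + b0 * x + b1 * y0 + b2 * y0 ** 2)


def estimating_equations(params, x_ctrl, y0_ctrl, n_total,
                         mean_x, mean_y0, mean_y0_sq):
    """Residuals of the three weighted moments and the weight normalization."""
    k0, b0, b1, b2 = params
    n0 = len(x_ctrl)
    sums = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for x, y0 in zip(x_ctrl, y0_ctrl):
        w = inverse_control_probability(k0, b0, b1, b2, x, y0)
        moments = (x - mean_x, y0 - mean_y0, y0 ** 2 - mean_y0_sq)
        for j, m in enumerate(moments):
            sums[j] += m * w
        weight_sum += w
    # First three entries: weighted sample moments; last: sum of weights minus N.
    return [s / n0 for s in sums] + [weight_sum - n_total]


# Hand-checkable toy input: two control units out of N = 4, all parameters zero,
# so every weight equals 2 and all four residuals vanish.
residuals = estimating_equations((0.0, 0.0, 0.0, 0.0),
                                 [1.0, -1.0], [0.5, -0.5],
                                 4, 0.0, 0.0, 0.25)
print(residuals)
```

The weight $1+\exp(\cdot)$ is simply the inverse of $p(z=0|y_{0},x)=1-g(\cdot)$, which is why only control units enter the sums.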

4. Simulation

We conduct simulations to examine the performance of the estimator shown in the previous section. Each data set is generated from

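$$x\sim N(0,1),\qquad\begin{pmatrix}y_{0}\\y_{1}\end{pmatrix}\Bigm|x\sim N\left(\begin{pmatrix}\mu_{0}(x)\\\mu_{1}(x)\end{pmatrix},\begin{pmatrix}\sigma_{0}^{2}&\rho\sigma_{0}\sigma_{1}\\\rho\sigma_{0}\sigma_{1}&\sigma_{1}^{2}\end{pmatrix}\right),$$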

where $\sigma_{0}=1/5$, $\sigma_{1}=1/2$, $\rho=1/2$, $\mu_{0}(x)=-3x/5-1/10$, $\mu_{1}(x)=-(x-1)^{2}/10+1$, and the sample size is $N=3000$ (Figures 1 and 2). In this study, we use Legendre polynomials up to the third order ($J=3$) as the basis functions in eq. (3):

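$$q_{0}(v)=1,\quad q_{1}(v)=v,\quad q_{2}(v)=\frac{1}{2}(3v^{2}-1),\quad q_{3}(v)=\frac{1}{2}(5v^{3}-3v).$$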

We then apply a linear transformation to each variable so that it lies in $[-1,1]$, because Legendre polynomials are orthogonal on this interval. Hereafter, the variables denote the transformed values; after estimation, the inverse transformation is applied to calculate the ATE. We specify the functions in the missing mechanism (eq. (4)) as $k_{x}(x)=\beta_{0}x$ and $k_{y_{0}}(y_{0})=\beta_{1}y_{0}+\beta_{2}y_{0}^{2}$, and $z_{i}$ is drawn from $\mathrm{B}(1,p_{i})$, where $p_{i}=g(k_{0}+k_{y_{0}}(y_{i0})+k_{x}(x_{i}))$. We set $k_{0}=-3/2$, $\beta_{0}=-2$, $\beta_{1}=-2$, $\beta_{2}=1$, so that the mean probability of being assigned to the treatment group is about $30\%$. For units with $z_{i}=1$, only $y_{i1}$ and $x_{i}$ are used for estimation; conversely, for units with $z_{i}=0$, only $y_{i0}$ and $x_{i}$ are used. Bandwidths for the kernel density estimators are chosen by Scott’s normal reference rule of thumb. We first estimate the missing mechanism and then estimate $\mathbb{E}[y_{1}\mid y_{0},x]$ given that estimate.
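The data-generating process described above can be reproduced with the standard library alone. The following sketch is an assumption-laden illustration: it applies the missing mechanism to the untransformed variables, uses an arbitrary seed, and the function and variable names are hypothetical. The estimation steps themselves are not included.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n=3000, seed=0):
    """Draw (x, y0, y1, z) from the simulation design; returns a list of tuples."""
    rng = random.Random(seed)
    sigma0, sigma1, rho = 0.2, 0.5, 0.5
    k0, beta0, beta1, beta2 = -1.5, -2.0, -2.0, 1.0
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated errors: corr(y0, y1 | x) = rho
        y0 = -3.0 * x / 5.0 - 0.1 + sigma0 * e0
        y1 = (-(x - 1.0) ** 2 / 10.0 + 1.0
              + sigma1 * (rho * e0 + math.sqrt(1.0 - rho ** 2) * e1))
        p = sigmoid(k0 + beta0 * x + beta1 * y0 + beta2 * y0 ** 2)
        z = 1 if rng.random() < p else 0
        rows.append((x, y0, y1, z))
    return rows


rows = simulate()
treated_share = sum(r[3] for r in rows) / len(rows)
mean_effect = sum(r[2] - r[1] for r in rows) / len(rows)  # uses both potential outcomes
print(f"treated share = {treated_share:.3f}, mean of y1 - y0 = {mean_effect:.3f}")
```

In an actual analysis only $y_{i1}$ would be kept for treated units and only $y_{i0}$ for control units; both are retained here so that the simulated average effect can be compared with the theoretical ATE.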

Figure 3 shows the estimates of $\mathbb{E}[y_{1}\mid y_{0}]$. The horizontal axis is $y_{0}$ and the vertical axis is $y_{1}$. The dashed line shows the theoretical value of $\mathbb{E}[y_{1}\mid y_{0}]$. The solid line and the gray region show the mean of the estimator and the 90% confidence interval over 1000 replications, respectively. As the figure shows, the performance of the estimator is substantially influenced by $B_{\gamma}$: when $B_{\gamma}$ is small, the variance of the estimator is also small, but the confidence interval may fail to cover the true curve and the mean of the estimator may deviate from it. This problem is particularly serious near the edges, where only a small number of samples are observed. Nevertheless, it is notable that $\mathbb{E}[y_{1}\mid y_{0}]$ can be estimated to some extent even though no pair $(y_{i1},y_{i0})$ is observed and the distributions overlap substantially (see Figure 2).

Table 1 shows the results of estimating the ATE by integrating out $y_{0}$ in $\hat{E}[y_{1}|y_{0}]$. The theoretical value of the ATE in our setting is 0.900. As in the estimation of $\mathbb{E}[y_{1}\mid y_{0}]$, the variance becomes larger as $B_{\gamma}$ becomes larger. On the other hand, the mean of the ATE estimator is closer to the true value when $B_{\gamma}$ is small. Although we recognize its importance, how to determine $B_{\gamma}$ is beyond the scope of this paper.
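The theoretical ATE of 0.900 follows directly from the simulation design. The short calculation below uses only $x\sim N(0,1)$ and the stated mean functions $\mu_{0}(x)$ and $\mu_{1}(x)$.

```latex
\begin{align*}
\mathbb{E}[y_1] &= \mathbb{E}[\mu_1(x)]
   = 1 - \frac{\mathbb{E}[(x-1)^2]}{10}
   = 1 - \frac{\operatorname{Var}(x) + (\mathbb{E}[x]-1)^2}{10}
   = 1 - \frac{1+1}{10} = 0.8, \\
\mathbb{E}[y_0] &= \mathbb{E}[\mu_0(x)]
   = -\frac{3\,\mathbb{E}[x]}{5} - \frac{1}{10} = -0.1, \\
\mathbb{E}[y_1 - y_0] &= 0.8 - (-0.1) = 0.9 .
\end{align*}
```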

5. Concluding remarks

We proposed a semiparametric two-stage least squares estimator for the HTE and examined its properties through a simple simulation study, showing that $\mathbb{E}[y_{1}\mid y_{0}]$ can be estimated even though no pair $(y_{1},y_{0})$ is observed. As mentioned in Section 4, the performance of the estimator is substantially influenced by the constraint parameter, which has to be tuned. In addition, although we used Legendre polynomials up to the third order in the simulation, the order needs to be chosen to reflect the characteristics of the target population. Although there is literature on this issue (e.g., Horowitz (2014)), a decisive method has not been developed. More importantly, our approach would not work well in a multivariate setting because it relies on the kernel density estimator and therefore suffers from the curse of dimensionality. Similarly, the approximation by orthogonal basis functions in eq. (3) would be problematic because the number of parameters grows at rate $J^{d}$ (the approximation order raised to the power of the number of covariate dimensions). A nonparametric Bayesian approach may help with this issue. We plan to address this in future work.

Bibliography

Hirano, Keisuke, Guido W. Imbens, Geert Ridder, and Donald B. Rubin (2001) ‘Combining Panel Data Sets with Attrition and Refreshment Samples.’ Econometrica 69(6), 1645–1659
Horowitz, Joel L. (2014) ‘Adaptive nonparametric instrumental variables estimation: Empirical choice of the regularization parameter.’ Journal of Econometrics 180(2), 158–173
Miao, Wang, Peng Ding, and Zhi Geng (2016) ‘Identifiability of Normal and Normal Mixture Models with Nonignorable Missing Data.’ Journal of the American Statistical Association 111(516), 1673–1683
Nevo, Aviv (2003) ‘Using Weights to Adjust for Sample Selection When Auxiliary Information Is Available.’ Journal of Business & Economic Statistics 21(1), 43–52
Newey, Whitney K., and James L. Powell (2003) ‘Instrumental Variable Estimation of Nonparametric Models.’ Econometrica 71(5), 1565–1578
Rubin, Donald B. (1974) ‘Estimating causal effects of treatments in randomized and nonrandomized studies.’ Journal of Educational Psychology 66(5), 688–701
Takahata, Keisuke, and Takahiro Hoshino (2018) ‘Identification and Estimation of Heterogeneous Treatment Effects under Non-compliance or Non-ignorable assignment.’ arXiv:1808.03750 [stat]
