Semiparametric estimation of heterogeneous treatment effects under the nonignorable assignment condition
Keisuke Takahata, Takahiro Hoshino

TL;DR
This paper introduces a semiparametric two-stage least squares estimator for heterogeneous treatment effects, addressing the ill-posed nature of the integral equations involved by using orthogonal series approximation to ensure stability.
Contribution
The paper develops a novel semiparametric estimator for HTE that stabilizes solutions to ill-posed integral equations using orthogonal series, improving estimation under nonignorable assignment.
Findings
Estimator performs well in simulation experiments.
Addresses ill-posedness in integral equations for HTE.
Provides a stable solution to a challenging estimation problem.
Abstract
We propose a semiparametric two-stage least squares estimator for the heterogeneous treatment effects (HTE). The HTE is the solution to a certain integral equation belonging to the class of Fredholm integral equations of the first kind, which is known to be an ill-posed problem. Naive semi/nonparametric methods do not provide a stable solution to such problems. We therefore propose to approximate the function of interest by an orthogonal series under a constraint that makes the inverse mapping of the integral operator continuous and eliminates the ill-posedness. We illustrate the performance of the proposed estimator through simulation experiments.
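Table 1: Mean and standard deviation (s.d.) of the ATE estimates for each value of the constraint bound (theoretical ATE: 0.900).

| Constraint bound | Mean | S.d. |
|---|---|---|
| 10 | 0.888 | 0.0243 |
| 15 | 0.884 | 0.0260 |
| 25 | 0.881 | 0.0298 |
| 50 | 0.877 | 0.0397 |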
Keisuke Takahata 1),2) and Takahiro Hoshino 1),2)
Keywords: semiparametric estimation; integral equation; heterogeneity in treatment effects
1) Department of Economics, Keio University, Tokyo, Japan; 2) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
1. Introduction
In causal inference, treatment effects such as the average treatment effect (ATE) or the average treatment effect on the treated (ATT) have been of primary interest in the literature (Rubin, 1974). As their names suggest, these parameters are treatment effects averaged over a population of interest. However, a treatment effect may differ among units depending on the covariates and outcomes. Such heterogeneity of treatment effects has been intensively studied in recent years. For example, Wager and Athey (2018) proposed a random-forest-based method for finding subgroups in which the treatment effect is similar. Understanding the heterogeneity of treatment effects supports not only a more detailed analysis of a population of interest but also more efficient policy-making when an intervention is costly.
While most studies have considered heterogeneity over fully observed covariates, Takahata and Hoshino (2018) (henceforth TH) studied the heterogeneity over the untreated potential outcome, which is defined as
$$E[y_1 - y_0 \mid y_0], \tag{1}$$
where $y_1$ and $y_0$ are the outcomes under the treatment and control conditions, respectively. Following TH, we call (1) the heterogeneous treatment effect (HTE), which is also the quantity of interest in this paper. To estimate the HTE, we have to deal with non-ignorable missingness because we need to estimate $E[y_1 \mid y_0]$, but $y_1$ and $y_0$ are never observed simultaneously. It is known that identification is not trivial in non-ignorable missing models (e.g., Miao et al. (2016)). TH provided a sufficient condition for the identification of the HTE using information on the marginal distribution of $y_0$, $p(y_0)$. Although the identification condition is rather general, estimation of the HTE is difficult because it requires solving an integral equation: a Fredholm integral equation of the first kind, which is known to be an ill-posed problem. In general, appropriate regularization methods are needed to obtain a stable solution to such equations. In TH, this problem was avoided by using parametric Bayesian modeling, but for wider applicability, a more flexible approach is desirable.
In this paper, we propose a semiparametric two-stage least squares (2SLS) estimator for the HTE. Our approach relies on the quadratic programming method proposed by Newey and Powell (2003), who considered the estimation of a nonparametric instrumental variable model. The function of interest is approximated by a finite series, and the integral equation then reduces to a constrained least squares problem in which the regressors are replaced by their expectations. To overcome the instability of the solution due to the ill-posedness, bounds are placed on the coefficients of the series so that the inverse mapping of the integral operator becomes continuous. The numerical experiments show that the proposed method correctly estimates the HTE.
2. Model setup
We follow the same setup as TH. The HTE (eq. (1)) is rewritten as
$$E[y_1 - y_0 \mid y_0] = \int E[y_1 \mid y_0, x]\, p(x \mid y_0)\, dx - y_0,$$
where $x$ is a $d$-dimensional covariate. From this formula we observe that, for the identification of the HTE, it is sufficient to identify $E[y_1 \mid y_0, x]$ and $p(x \mid y_0)$. Let $z$ be the binary indicator that is equal to 1 when a unit is assigned to the treatment condition. If $z = 1$, then only $y_1$ is observed and $y_0$ is missing, and vice versa. TH showed that the following two assumptions play a primary role in the identification:
**(A1):** $y_1 \perp z \mid y_0, x$ (weak ignorability);
**(A2):** $p(y_0)$ is known.
Assumption (A1) is called weak ignorability. Intuitively, this assumption implies that we can identify the HTE by observing the difference between two groups in which the assignment probability depends on $y_0$. Therefore, in a situation where strong ignorability, $(y_1, y_0) \perp z \mid x$, is satisfied, our approach is not applicable. Assumption (A2) is needed to identify the missing mechanism (Hirano et al., 2001). In addition to these conditions, several constraints on the functional form and parameter space of the missing mechanism are needed; for a more detailed discussion of the identification condition, see TH. In what follows, we suppose that all the conditions of Theorem 2 in TH are satisfied. Note that, in this setup, the identification of the ATE is trivial because
[TABLE]
Consider the integral equation
[TABLE]
where the second equality holds by weak ignorability. Under the identification condition, it can be proved that the solution to eq. (2) for $E[y_1 \mid y_0, x]$ is unique, that is, $E[y_1 \mid y_0, x]$ is identified. Our goal is then to obtain an actual solution to eq. (2). However, if we employ a nonparametric method for estimating $E[y_1 \mid y_0, x]$, the solution is unstable because the inverse of the integral operator is discontinuous. We therefore need an appropriate regularization method to overcome the ill-posedness of eq. (2). We address this problem in the next section.
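The instability of a naive solution can be seen numerically in a generic Fredholm integral equation of the first kind. The following is a minimal sketch, not the model of this paper: the Gaussian kernel, the grid, the noise level, and the penalty `alpha` are illustrative assumptions, and Tikhonov regularization is used only as a generic stabilizer, whereas the estimator in Section 3 relies on a compactness constraint.

```python
import numpy as np

# Discretize (K f)(s) = ∫_0^1 k(s, t) f(t) dt with a Gaussian kernel (illustrative choice).
n, width = 50, 0.1
t = (np.arange(n) + 0.5) / n                       # midpoint grid on [0, 1]
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / width) ** 2) / n

rng = np.random.default_rng(0)
f_true = np.sin(2.0 * np.pi * t)                   # hypothetical function of interest
g_noisy = K @ f_true + 1e-6 * rng.standard_normal(n)   # tiny error in the left-hand side

# Naive inversion: least squares without any regularization.
f_naive = np.linalg.lstsq(K, g_noisy, rcond=None)[0]

# Tikhonov-regularized inversion (generic stabilizer, assumed for illustration).
alpha = 1e-8
f_reg = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ g_noisy)

err_naive = np.max(np.abs(f_naive - f_true))
err_reg = np.max(np.abs(f_reg - f_true))
print(f"condition number of K: {np.linalg.cond(K):.2e}")
print(f"max error, naive: {err_naive:.2e}; regularized: {err_reg:.2e}")
```

A perturbation of order $10^{-6}$ in the left-hand side is enough to ruin the unregularized solution, because the discretized operator is nearly singular; this is the discontinuity of the inverse mapping described above.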
3. Estimation
In this section we propose a two-stage least squares estimator (2SLSE) for $E[y_1 \mid y_0, x]$ based on the method of Newey and Powell (2003). The strategy is to (i) approximate $E[y_1 \mid y_0, x]$ by a finite number of orthogonal basis functions, (ii) take their expectations with respect to the conditional distribution of $y_0$ in eq. (2), and (iii) perform least squares estimation under a constraint that makes the inverse mapping of the integral operator continuous and eliminates the ill-posedness of eq. (2). For simplicity, we suppose $d = 1$ in the rest of the paper.
3.1. Estimation of $E[y_1 \mid y_0, x]$
We approximate $E[y_1 \mid y_0, x]$ with a finite number of orthogonal basis functions as
[TABLE]
We specify the missing mechanism by a logistic regression that satisfies the identification condition of TH:
[TABLE]
where the additivity holds between and . Expand as
[TABLE]
where . Plugging eq. (3) and eq. (5) into eq. (2) yields
[TABLE]
where . The estimation of the missing mechanism will be discussed later.
Here, we consider estimating by the kernel density estimator,
[TABLE]
where and are the bandwidths and is the sample size of the control group. In this case, can be estimated by
[TABLE]
where . Similarly, we obtain an estimator for , , by the kernel density estimator,
[TABLE]
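As an illustration of the kernel density step, the following is a minimal sketch of a product-Gaussian kernel density estimator with bandwidths from Scott's normal reference rule (the rule used in Section 4). The Gaussian kernel, the function names `scott_bandwidths` and `product_gaussian_kde`, and the test sample are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def scott_bandwidths(sample):
    """Scott's normal reference rule: h_j = sigma_j * n ** (-1 / (d + 4))."""
    n, d = sample.shape
    return sample.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))

def product_gaussian_kde(points, sample, bandwidths):
    """Evaluate a product-Gaussian kernel density estimate at each row of `points`."""
    # u has shape (n_points, n_sample, d): scaled differences per coordinate
    u = (points[:, None, :] - sample[None, :, :]) / bandwidths
    kernels = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * bandwidths)
    # multiply over coordinates, then average over the sample
    return kernels.prod(axis=2).mean(axis=1)

# Hypothetical two-dimensional sample standing in for one group's (outcome, covariate) pairs.
rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 2))
h = scott_bandwidths(sample)
density_at_origin = product_gaussian_kde(np.zeros((1, 2)), sample, h)[0]
print(h, density_at_origin)   # the true density at the origin is 1 / (2 * pi)
```

Conditional densities and conditional expectations of basis functions can then be formed as ratios of such estimates, which is where the choice of bandwidth enters the first stage.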
By inserting eq. (7) and eq. (8) into eq. (6), we obtain
[TABLE]
where is the sample size of the treatment group. Therefore, the least square estimator for is obtained by the following quadratic problem:
[TABLE]
[TABLE]
where the bound in the constraint is a positive constant. The constrained quantity is the Sobolev norm of the series approximation, which restricts both the true and the estimated function to a compact set. This compactness eliminates the ill-posedness of the inverse problem (eq. (2)) because the inverse of the integral operator becomes a continuous mapping (Newey and Powell, 2003).
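The constrained least squares step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Sobolev-type norm is taken to include the function and its first derivative on $[-1, 1]$, the regressor matrix `A` is a stand-in for the first-stage conditional expectations of the basis functions, and the names `sobolev_gram` and `constrained_ls` are hypothetical. The constraint $\beta^\top \Lambda \beta \le B$ (with $B > 0$) is handled through its Lagrange multiplier, found by bisection.

```python
import numpy as np
from numpy.polynomial import legendre as L

def sobolev_gram(order, n_nodes=32):
    """Lam[j, k] = ∫_{-1}^{1} (P_j P_k + P_j' P_k') dt for Legendre P_0..P_order."""
    t, w = L.leggauss(n_nodes)                     # Gauss-Legendre nodes and weights
    P = L.legvander(t, order)                      # P[:, j] = P_j(t)
    unit = np.eye(order + 1)
    dP = np.column_stack([L.legval(t, L.legder(unit[j])) for j in range(order + 1)])
    return (P * w[:, None]).T @ P + (dP * w[:, None]).T @ dP

def constrained_ls(A, b, Lam, bound, tol=1e-10, max_iter=200):
    """Minimize ||b - A beta||^2 subject to beta' Lam beta <= bound (bound > 0)."""
    def ridge(lmbda):
        return np.linalg.solve(A.T @ A + lmbda * Lam, A.T @ b)

    def norm2(beta):
        return beta @ Lam @ beta

    beta = np.linalg.lstsq(A, b, rcond=None)[0]
    if norm2(beta) <= bound:
        return beta                                # constraint is inactive
    lo, hi = 0.0, 1.0
    while norm2(ridge(hi)) > bound:                # bracket the multiplier
        hi *= 10.0
    for _ in range(max_iter):                      # the norm decreases as the multiplier grows
        mid = 0.5 * (lo + hi)
        if norm2(ridge(mid)) > bound:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return ridge(hi)

# Hypothetical example: cubic Legendre series observed with noise.
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 200)
A = L.legvander(t, 3)                              # stand-in for first-stage regressors
beta_true = np.array([0.5, 1.0, -0.3, 0.2])        # hypothetical coefficients
b = A @ beta_true + 0.1 * rng.standard_normal(t.size)
Lam = sobolev_gram(3)
beta_loose = constrained_ls(A, b, Lam, bound=1e6)  # constraint inactive
beta_tight = constrained_ls(A, b, Lam, bound=1.0)  # constraint binds
```

With a loose bound the result coincides with ordinary least squares, while a tight bound shrinks the coefficients onto the boundary of the constraint set; this mirrors the bias-variance trade-off in the choice of the bound discussed in Section 4.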
Finally, by integrating out $x$ in $E[y_1 \mid y_0, x]$, we obtain $E[y_1 \mid y_0]$:
[TABLE]
Note that we can calculate $p(x \mid y_0)$ by plugging the corresponding estimators into the following formula:
[TABLE]
3.2. Estimation of the missing mechanism
We estimate the missing mechanism (eq. (4)) following Nevo (2003), who proposed a generalized method of moments (GMM) estimator for a nonignorable missing model. Suppose that the auxiliary moment condition, , is available. From assumption (A2), we can calculate moments of $y_0$ of any order, but for simplicity, the dimension of the moment condition is set equal to the total number of parameters in the missing mechanism. For example, if and , then we may set the moment function as
[TABLE]
Note that
[TABLE]
Therefore, the solution of the following system of equations,
[TABLE]
is an unbiased estimator for . is the total sample size. The second equation implies the normalization of the weights, which is due to . For detailed identification conditions, see Nevo (2003).
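The weighting idea can be illustrated in a deliberately simplified setting. The sketch below assumes no covariate, a logistic mechanism $P(z = 1 \mid y_0) = \mathrm{expit}(a + b\,y_0)$, and a single auxiliary moment (the known mean of $y_0$); the function name `fit_logistic_selection` and all numerical values are hypothetical. Control units are weighted by $1 / P(z = 0 \mid y_0) = 1 + e^{a + b y_0}$, and the two equations are the normalization of the weights and the match of the weighted mean to the known mean.

```python
import numpy as np

def fit_logistic_selection(y_obs, n_total, mean_y):
    """Solve the two weighting equations for P(z = 1 | y0) = expit(a + b * y0).

    y_obs   : y0 values observed in the control group (z = 0)
    n_total : total sample size (treated + control)
    mean_y  : known population mean of y0 (auxiliary information)
    """
    p0 = y_obs.size / n_total                      # share of control units

    def tilted_mean(b):
        s = b * y_obs
        w = np.exp(s - s.max())                    # stabilized exponential tilt
        return np.sum(w * y_obs) / np.sum(w)

    def moment_gap(b):
        # (1/N) * sum over controls of (1 + e^{a + b y}) * y - mean_y,
        # with e^a eliminated through the normalization equation
        return p0 * y_obs.mean() + (1.0 - p0) * tilted_mean(b) - mean_y

    lo, hi = -10.0, 10.0                           # moment_gap is increasing in b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = np.log(1.0 - p0) - np.log(np.sum(np.exp(b * y_obs)) / n_total)
    return a, b

# Hypothetical data: y0 is observed only when z = 0.
rng = np.random.default_rng(0)
N, a_true, b_true = 20000, -0.3, 1.0
y0 = rng.standard_normal(N)
z = rng.random(N) < 1.0 / (1.0 + np.exp(-(a_true + b_true * y0)))
a_hat, b_hat = fit_logistic_selection(y0[~z], N, mean_y=0.0)
weights = 1.0 + np.exp(a_hat + b_hat * y0[~z])    # inverse of P(z = 0 | y0)
print(a_hat, b_hat, weights.sum() / N)
```

Because the system is just-identified, the two moment equations hold exactly at the solution; with covariates and more parameters, the same equations would be solved jointly by a numerical root finder.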
4. Simulation
We conduct simulations to examine the performance of the estimator shown in the previous section. Each data set is generated from
[TABLE]
where , , , , and the sample size is (Figure 1, 2). In this study, we use Legendre polynomials up to the third-order () as the basis functions in eq. (3):
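$$P_0(t) = 1, \quad P_1(t) = t, \quad P_2(t) = \tfrac{1}{2}(3t^2 - 1), \quad P_3(t) = \tfrac{1}{2}(5t^3 - 3t).$$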
Then, we apply a linear transformation so that each variable lies in $[-1, 1]$, because Legendre polynomials are orthogonal on this interval (hereafter, the variables denote the transformed values; after estimation, the inverse transformation is applied to calculate the ATE). We specify the functions in the missing mechanism (eq. (4)) as and and is drawn from where . We set , , , and the mean of the probability of being assigned to the treatment group becomes about . For the units where $z = 1$, only $y_1$ and $x$ are used for estimation, and conversely, $y_0$ and $x$ are used where $z = 0$. Bandwidths for the kernel density estimators are chosen by Scott's normal reference rule of thumb. For estimation, we first estimate the missing mechanism, and then estimate $E[y_1 \mid y_0, x]$ given the former estimate.
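The rescaling and the basis evaluation can be sketched as follows. The choice of the sample minimum and maximum as the endpoints of the linear map, the helper names `to_unit_interval` and `from_unit_interval`, and the raw values are assumptions for illustration; the text only states that an appropriate linear transformation is used.

```python
import numpy as np
from numpy.polynomial import legendre as L

def to_unit_interval(v, lower, upper):
    """Linearly map values from [lower, upper] onto [-1, 1]."""
    return 2.0 * (v - lower) / (upper - lower) - 1.0

def from_unit_interval(u, lower, upper):
    """Inverse of `to_unit_interval`."""
    return lower + 0.5 * (u + 1.0) * (upper - lower)

order = 3
y = np.array([0.2, 1.5, 3.1, 4.0])                 # hypothetical raw values
u = to_unit_interval(y, y.min(), y.max())
basis = L.legvander(u, order)                      # columns: P_0(u), ..., P_3(u)

# Orthogonality on [-1, 1]: ∫ P_j P_k du = 2 / (2k + 1) if j == k, else 0.
nodes, weights = L.leggauss(8)                     # exact for polynomials up to degree 15
P = L.legvander(nodes, order)
gram = (P * weights[:, None]).T @ P
print(np.round(gram, 12))
```

The Gram matrix is diagonal only on $[-1, 1]$, which is why the variables are rescaled before the series is fitted and mapped back afterwards.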
Figure 3 shows the estimation results. The horizontal axis is $y_0$ and the vertical axis is the estimated quantity. The dashed line shows the theoretical value. The solid line and the gray region show the mean of the estimator and the 90% confidence interval from 1000 replications, respectively. As the figure shows, the performance of the estimator is substantially influenced by the constraint bound; if the bound is set to a small value, the variance of the estimator becomes small, but the confidence interval may fail to include the true curve and the mean of the estimator may deviate from it. This problem is particularly serious at the edges, where only a small number of samples are observed. However, it is notable that the target function can be estimated to some extent even though no pair $(y_1, y_0)$ is observed and there are large overlaps in the distributions (see Figure 2).
Table 1 shows the result of estimating the ATE by integrating out $y_0$ in the estimated HTE. The theoretical value of the ATE in our setting is 0.900. As in the estimation shown in Figure 3, the standard deviation grows with the constraint bound (from 0.0243 at a bound of 10 to 0.0397 at a bound of 50). On the other hand, the mean of the ATE estimator is closer to the true value when the bound is small (0.888 at 10 versus 0.877 at 50). Although we recognize its importance, how to determine the bound is beyond the scope of this paper.
5. Concluding remarks
We proposed a semiparametric two-stage least squares estimator for the HTE and examined its properties through a simple simulation study, showing that the HTE can be estimated even though no pair $(y_1, y_0)$ is observed. As mentioned in Section 4, the performance of the estimator is influenced substantially by the constraint parameter, which has to be tuned. In addition, although we use Legendre polynomials up to the third order in the simulation, the order needs to be determined according to the characteristics of the target population. Although there is literature on this issue (e.g., Horowitz (2014)), a decisive method has not been developed. More importantly, our approach would not work well in a multivariate case because it uses the kernel density estimator; that is, our method suffers from the curse of dimensionality. Similarly, an approximation using orthogonal basis functions as in eq. (3) would be problematic because the number of parameters grows at the rate of the approximation order raised to the power of the covariate dimension. Considering this issue, a nonparametric Bayesian approach may be helpful. We plan to address this in future work.
References
- Hirano, Keisuke, Guido W. Imbens, Geert Ridder, and Donald B. Rubin (2001) ‘Combining Panel Data Sets with Attrition and Refreshment Samples.’ Econometrica 69(6), 1645–1659
- Horowitz, Joel L. (2014) ‘Adaptive nonparametric instrumental variables estimation: Empirical choice of the regularization parameter.’ Journal of Econometrics 180(2), 158–173
- Miao, Wang, Peng Ding, and Zhi Geng (2016) ‘Identifiability of Normal and Normal Mixture Models with Nonignorable Missing Data.’ Journal of the American Statistical Association 111(516), 1673–1683
- Nevo, Aviv (2003) ‘Using Weights to Adjust for Sample Selection When Auxiliary Information Is Available.’ Journal of Business & Economic Statistics 21(1), 43–52
- Newey, Whitney K., and James L. Powell (2003) ‘Instrumental Variable Estimation of Nonparametric Models.’ Econometrica 71(5), 1565–1578
- Rubin, Donald B. (1974) ‘Estimating causal effects of treatments in randomized and nonrandomized studies.’ Journal of Educational Psychology 66(5), 688–701
- Takahata, Keisuke, and Takahiro Hoshino (2018) ‘Identification and Estimation of Heterogeneous Treatment Effects under Non-compliance or Non-ignorable assignment.’ arXiv:1808.03750 [stat]
