Exploring the heterogeneous impacts of Indonesia’s conditional cash transfer scheme (PKH) on maternal health care utilisation using instrumental causal forests
Vishalie Shah, Julia Hatamyar, Taufik Hidayat, Noemi Kreif

TL;DR
This study uses a new machine learning method to show how Indonesia's cash transfer program affects maternal health care differently depending on local resources and other factors.
Contribution
The novel use of instrumental causal forests reveals treatment effect heterogeneity in a conditional cash transfer scheme.
Findings
Mothers in areas with more healthcare workers benefited more from the program in terms of good assisted delivery.
The impact on post-natal visits in 2013 showed the largest heterogeneity among outcomes.
Household poverty and survey wave were significant factors in treatment effect differences.
Abstract
This paper uses instrumental causal forests, a novel machine learning method, to explore the treatment effect heterogeneity of Indonesia’s conditional cash transfer scheme on maternal health care utilisation. Using randomised programme assignment as an instrument for enrollment in the scheme, we estimate conditional local average treatment effects for four key outcomes: good assisted delivery, delivery in a health care facility, pre-natal visits, and post-natal visits. We find significant treatment effect heterogeneity by supply-side characteristics, even though supply-side readiness was taken into account during programme development. Mothers in areas with more doctors, nurses, and delivery assistants were more likely to benefit from the programme, in terms of increased rates of good assisted delivery outcome. We also find large differences in benefits according to indicators of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9- —http://dx.doi.org/10.13039/501100000272National Institute for Health and Care Research
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGlobal Maternal and Child Health · Poverty, Education, and Child Welfare · Advanced Causal Inference Techniques
Introduction
In recent decades, conditional cash transfer (CCT) programmes have become a popular policy tool in many low- and middle-income countries for alleviating short-term poverty via cash injections, while also improving the long-term trajectory of vulnerable families via investments in human capital (Parker and Todd 2017). Regular cash payments are made to households in exchange for compliance with certain behaviours, such as school attendance for children, or attendance at health check-ups for new mothers, among others. Numerous evaluations of CCTs, mainly based on randomised experiments (for example, PROGRESA in Mexico and PRAF in Honduras), have demonstrated the ability of these interventions to make substantial improvements on education, consumption and health outcomes, particularly in the short-term (Fiszbein and Schady 2009; Millán et al. 2019; Bastagli et al. 2019; Owusu-Addo et al. 2018; García and Saavedra 2017; Lagarde et al. 2007; Kabeer and Waddington 2015).
The majority of the CCT evaluation literature to date has focused on average effects, with fewer studies formally analysing whether effects differ for population subgroups defined by observable characteristics. Anti-poverty programmes are expected to impact households differently depending on their ability to convert the cash injections into desirable outcomes, which is highly dependent on their own attributes (Ravallion 2005; Cooper et al. 2020). For example, urban mothers may already have easier access to preventive health care facilities to satisfy the health requirements for pre- and post-natal check-ups, compared to those in rural regions. The cash injection could assist rural households in addressing some of the financial barriers in accessing health care, such as transport costs. Understanding this type of heterogeneity in programme impacts can help to inform better policy targeting that, among other objectives, identifies households that are expected to benefit the most, and protects those that are expected to benefit the least (Cooper et al. 2020). Of those studies that do in fact explore subgroup effects for pre-specified populations, many find evidence of heterogeneity that is consistent with the broader literature suggesting that CCT effectiveness on health outcomes is modified through various social determinants of health, such as education, wealth and the urban-rural distinction (Owusu-Addo et al. 2018; Bastagli et al. 2019).
In this paper, we contribute to the growing evidence base on the heterogeneous effects of CCT programmes by evaluating Indonesia’s Program Keluarga Harapan (Family Hope Programme, or PKH) using a unique data set from a large-scale randomised experiment that was implemented in 2007 alongside a baseline survey and two follow-up surveys in 2009 and 2013. We are interested in exploring how enrolment into PKH has influenced maternal health care utilisation in the short-term (2009) and the longer-term (2013) by performing separate analyses for both time periods. Existing evaluations of PKH have focused on estimating overall average effects, finding notable improvements in various utilisation outcomes, such as the probability of having a facility delivery, or that the delivery is assisted by trained professionals (Kusuma et al. 2016; Cahyadi et al. 2020). Few studies have acknowledged that PKH impacts may be heterogeneous, and preliminary subgroup analyses that stratify treatment effects by pre-selected “effect modifiers” (e.g. gender, employment sector, parental education levels), have shown this to be the case (Kusuma et al. 2016; Alatas 2011). However, traditional approaches to heterogeneous treatment effect estimation (e.g. estimating treatment effects on effect modifier strata, or performing interactions between the treatment variable with effect modifiers in a linear regression) have their own limitations, including potentially arbitrary subgroup analyses and issues of multiple hypotheses testing. Recent developments in machine learning (ML)-based estimators of treatment effect heterogeneity use flexible modelling strategies that can capture complex interactions between theoretically-motivated covariates without imposing restrictive functional form assumptions. Generalised random forests, developed by Athey et al. (2019), have become a popular tree-based ML tool for estimating causal effects, including the conditional average treatment effect (CATE) function, which captures heterogeneity in treatment effects by flexibly modelling interactions among pre-selected effect modifiers informed by theory and prior empirical evidence.1
We rely on the random assignment of PKH to inform our empirical strategy. While the programme was randomised, actual enrolment was not random, due to non-compliance and targeted assignment within randomised populations. Using random assignment as an instrument for treatment status is a well-established approach in the literature on experimental evaluations with non-compliance (Angrist et al. 1993, 2006), as it addresses selection bias because the randomisation protocol is exogenous to potential outcomes. Following Triyana (2016) and Cahyadi et al. (2020), we address these potential observed and unobserved differences between enrolled and not enrolled groups using an instrumental variable analysis, where we instrument PKH enrolment with the original randomisation mechanism itself. While these previous studies focus on estimating average programme effects, we extend their analysis by estimating and characterising heterogeneous effects—revealing how impacts differ across population subgroups and identifying key drivers of this variation. We use instrumental causal forests; a variant of generalised random forests that allows for the presence of unmeasured confounding if there is a valid instrument available (Athey et al. 2019). This method targets the estimation of the so-called conditional local average treatment effect, characterising how treatment effects vary according to observed characteristics of compliers, in our case mothers who complied with the randomisation protocol. We summarise treatment effect heterogeneity using three approaches: (1) we find the best linear predictors of heterogeneity; (2) we assess how the most and least affected population groups differ in terms of observable characteristics, and (3) we estimate optimal policy trees of selected depth and describe which characteristics are chosen as the most important decision criteria for treatment allocation (Chernozhukov et al. 2018b; Semenova and Chernozhukov 2021; Knaus et al. 2021; Kennedy 2020; Athey and Wager 2019).
This paper has three main contributions. First, we add to the growing collection of CCT evaluation studies that look beyond average impacts and capture heterogenous impacts according to observable differences in covariates. A novel contribution is our use of data-driven methods, in particular tree-based causal ML, to estimate and make inferences on the heterogeneous impacts of a CCT intervention. Unlike previous heterogeneity evidence which tends to consistently report greater effects mostly among the poorer population, we find larger increases in facility delivery and post-natal visits for better-off households. Our findings could help to support those from existing heterogeneity analyses by identifying potentially new population subgroups that have not been specified in advance, such as groups defined by combinations of supply-side readiness, household poverty indicators, or geographic and temporal factors. By using data-driven methods, we allow for the discovery of interaction effects and subgroup patterns that may be overlooked in traditional stratified analyses. Second, to our knowledge, this is the first paper to evaluate a large-scale policy intervention using instrumental forests. Several published papers have used causal forests2 without incorporating an instrumental variable analysis to address endogeneity concerns (Kreif et al. 2022; Bertrand et al. 2017; Davis and Heller 2017; O’Neill and Weeks 2018; Hoffman and Mast 2019; Athey and Wager 2019). Third, by applying these methods to Indonesia’s national CCT programme, we aim to generate insights that can inform future programme design and targeting -particularly in relation to supply-side readiness and household-level characteristics that may shape programme effectiveness.
The PKH programme
Background and design
PKH was launched by the Government of Indonesia in 2007 as the country’s first CCT programme targeted to households. It was designed in response to increasing concerns around the country’s consistently poor human development outcomes (i.e. high mortality rates for new mothers and children under-5 and low enrolment rates for primary and secondary schools) compared to neighbouring countries, despite experiencing sustained economic growth. Prior to the implementation of PKH, an unconditional cash transfer programme (Bantuan Langsung Tunai, or BLT) was trialed but failed to achieve the desired outcomes due to ineffective targeting of the poor and a lack of conditions on the transfers to incentivise poverty-reducing behaviours (World Bank 2012). In comparison, PKH provides quarterly cash transfers to extremely poor households with pregnant women and/or children, with the objective of improving lagging health and education outcomes (Alatas 2011). The cash payments, ranging between 600,000 and 2,200,000 rupiah per quarter (approximately 60–330 US dollars, depending on household composition) were made to women in the household, who were informed at the start of the programme that in order to continue receiving payments, they must fulfil certain obligations. For example, pregnant or lactating women are required to make four pre-natal care visits and two post-natal visits, take iron tablets during pregnancy, and have an assisted delivery with a trained professional.3 The average duration of household enrolment into PKH is between 2 and 4 years, in which time the programme aims to achieve improvements in welfare and human development indicators.
The first phase of the PKH experiment was introduced in six provinces (West Java, East Java, North Sulawesi, Gorontalo, East Nusa Tenggara, and Jakarta). The selection of these provinces was based on willingness to participate and diversity representation of poverty levels, urban–rural characteristics, and remoteness. The richest 20 per cent of districts within each province were excluded. Then, 736 subdistricts that met minimum supply-side readiness criteria, corresponding to a population of about 36 million people, were identified. From this eligible pool, 438 subdistricts were randomly assigned to the treatment group, with the remaining subdistricts constituting the control group. Approximately 700,000 extremely poor households within these treated subdistricts were then enrolled as PKH participants via proxy means testing.
The World Bank collected data via a baseline survey in the months prior to launch, and two follow-up surveys in 2009 and 2013. Out of the 736 sampled subdistricts, 360 were randomly chosen for data collection (corresponding to approximately 14,000 households), which included beneficiary and non-beneficiary households in 180 treated subdistricts, and eligible households in 180 control subdistricts. To establish the stratification, the PKH sample considered urban and rural classification. The sampling frame was created by randomly selecting eight villages within each subdistrict, and then selecting one rural ward (dusun) within each village, or one urban precinct (kelurahan) within cities. Four households were randomly selected within each dusun/kelurahan, in a way that ensured two households included a pregnant or lactating mother or a married woman who was pregnant within the last 2 years, and the other two included children aged 6–15. The same households participated in the follow-up surveys which also used the original baseline questionnaire and respondent lists. The expansion of the programme post-2007 did not affect the composition of the control group to a large extent since new subdistricts, outside of the original sample, were prioritised for treatment. However, the value of the cash transfer fell from 14% of monthly household consumption in 2007 to 7% by 2013.
Related literature
Existing evidence on the impacts of CCTs on health care utilisation is vast. Early impact evaluations of pioneering programmes implemented in Latin America and the Caribbean have generated substantial evidence on their effectiveness in increasing the utilisation of preventive health care services among the poor, and in some cases, improving health outcomes (Lagarde et al. 2007; Glassman et al. 2007; Ranganathan and Lagarde 2012). For example, there were substantial increases in pre-natal care visits of 8% and 19% in Mexico (Progresa) and Honduras (Programe de Asignacion Familiar, PRAF (Barber and Gertler 2009; Morris et al. 2004). Looking beyond average effects, Cooper et al. (2020) conducted a review into the existing literature reporting heterogeneity in programme impacts across population subgroups defined according to sex, socioeconomic status, region and education. Of the 56 reviewed studies, 40 reported subgroup effects presented either as stratum-specific effects or as interactions between effect modifiers and the intervention. Using evidence from India (Janani Suraksha Yojana, JSY) and Mexico (Oportunidades), they found that positive programme effects on health care utilisation were generally larger among women that are younger (aged 15–24), more disadvantaged, less educated, rurally-based, and from regions where the CCT scheme was more rigorously implemented.
In Indonesia, impact evaluations of PKH support these earlier findings that the cash incentives translate to greater health care demand. Cahyadi et al. (2020) find dramatic short- and longer-term effects of PKH on various behaviours (even after correcting for multiple hypothesis testing): an increase in the average number of post-natal visits (0.8) in 2009; and increases in the probability of having a facility delivery (17%) and a delivery assisted by a doctor or midwife (23%) in 2013. The authors, however, do not explore varying impacts across the population. Kusuma et al. (2016) similarly find encouraging effects on utilisation in 2009, including an increase in the proportion of women who had \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 4$$\end{document} pre-natal visits (4%), \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge $$\end{document} 2 post-natal visits (5%), and a facility delivery (7%). They explore whether utilisation effects vary for pregnant women that are indicated as high-risk, finding that proportions of pre-natal visits and facility delivery decrease as risk increases. Alatas (2011) also finds substantial increases in the likelihood of beneficiary mothers completing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 4$$\end{document} pre-natal visits (13%) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 2$$\end{document} post-natal visits (21%). They additionally reported subgroup effects, finding that PKH effects on newborn-related health care utilisation are greater among urban and non-agriculturally based households where health care facilities are more accessible and available, and among female-led households. Finally, they report that mothers with some formal education are more likely to have a facility delivery and make post-natal visits, whereas those with no education are more likely to have an assisted delivery.
Data
We construct a dataset of married women aged 16–49 who had pregnancies or deliveries within the 2 years prior to the 2009 and 2013 follow-up surveys. For the outcomes, we construct four binary variables related to health care utilisation that indicate whether the woman attended at least four pre-natal check-ups, the delivery took place at a medical facility, the delivery was assisted by a trained professional, and the woman attended at least two post-natal check-ups. We decide to discretise the continuous outcomes for the number of pre- and post-natal visits since PKH requires a specified minimum number of visits to be met in order to make the cash transfer. Figure 1 displays the distribution of the number of health visits made by control and treated populations both pre- and post-experiment.Fig. 1. Distribution of pre- and post-natal health visits, by enrolment status and year. Note: The main analysis in this paper covers survey waves from 2009 and 2013, but here we present pre-experiment outcomes from 2007 as a basis of comparison
We remove observations with incomplete data on the outcomes and the vector of covariates. This exclusion of observations could potentially introduce bias if the non-response to surveys is not random. However, as shown in Table 3, the characteristics of the randomly assigned treatment and control groups in our final analytical sample remain well-balanced, with all standardised mean differences (SMDs) below the 0.1 threshold. This suggests that sample attrition did not create systematic differences between the treatment and control groups. The final sample sizes for the main analysis are 2065 for 2009 and 1989 for 2013.
Table 1 presents the proportion of observations that comply with the randomisation protocol. In 2009, around half of those randomised to be in the programme were actually enrolled, with a non-negligible share of households—10% in 2009 14% by 2013—in control subdistricts enrolled in PKH. The presence of such ‘always-takers’ reflects non-compliance with the original randomisation protocol and the broader expansion of the programme beyond the initial experimental design. As PKH scaled nationally, some control areas received treatment due to administrative adjustments or targeting refinements.
Using the principal strata framework (Angrist and Imbens 1995; Angrist et al. 1996; Angrist 2004), we estimate that compliers made up 39.8% of the sample in 2009 and 34.4% in 2013. Always-takers accounted for 9.6% (2009) and 13.7% (2013), while never-takers accounted for 50.6% (2009) and 51.9% (2013).4 These shares indicate that compliers form a sizeable fraction of the population, suggesting that the LATE is informative for policy because it pertains to the marginal group induced into enrolment by program assignment (Angrist 2004).Table 1. Contingency table showing the association between the randomisation protocol and enrolment status2009(N = 2065)Randomisation (Z)ControlTreatedEnrolment (D)Not enrolled929 (90%)525 (51%)Enrolled99 (10%)512 (49%)2013(N = 1989)Randomisation (Z)ControlTreatedEnrolment (D)Not enrolled884 (86%)501 (52%)Enrolled140 (14%)464 (48%)Randomisation protocol (Z) indicates whether the mother lives in a treatment or control subdistrict. Enrolment status indicates whether the mother actually received PKH (“enrolled”) or not (“not enrolled”)
Methods
Estimation of treatment effects
We are interested in separately estimating the causal effects of being enrolled into the PKH programme (compared to not being enrolled) in 2009 and 2013 on various outcomes relating to maternal health care utilisation—the number of pre-natal visits, the number of post-natal visits, the probability of an assisted delivery by a skilled midwife or doctor, and the probability of delivery at a health facility. We perform these analyses separately, using a common notation Y for all outcomes, and D for the binary indicator of PKH enrolment. Although we use the general notation Y for outcomes, in our main analysis Y refers to binary indicators for each outcome, as described in Sect. 2.3.
Under the potential outcomes framework for causal inference, we denote the potential outcome that would be observed if individual i was enrolled into programme d by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i(d)$$\end{document} . For motivation, we begin by defining two standard estimands: (1) the average treatment effect (ATE), which takes the expectation of the individual treatment effects across the population, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau =E[Y_i(1)-Y_i(0)]$$\end{document} ; and (2) the conditional average treatment effect (CATE), which evaluates the ATE for individuals with the same covariate profile \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i=x$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)=E[Y_i(1)-Y_i(0)|X_i=x]$$\end{document} . However, due to non-compliance in programme enrolment, the treatment \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} is endogenous. We therefore adopt an instrumental variable (IV) approach, where the binary indicator \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_i$$\end{document} -denoting whether the household is located in an initial PKH subdistrict–serves as an instrument for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} . This setup identifies the Local Average Treatment Effect (LATE) and its conditional analogue, the Conditional Local Average Treatment Effect (CLATE), which represent the causal effect of treatment for compliers, conditional on covariates.
Following Athey et al. (2019), we define the relationship between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} using a structural model, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i=m(X_i)+\tau (X_i)D_i+\varepsilon _i$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m(X_i)$$\end{document} is a nuisance function whose shape is unspecified, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon _i$$\end{document} is an error term. This model generalises the standard linear regression approach traditionally used for causal inference by allowing for flexible, data-driven estimation of both baseline outcome variation and heterogeneous treatment effects. The main distributional assumption is that the error term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon _i$$\end{document} is mean-independent of the treatment assignment conditional on covariates and instrument. Since PKH was targeted to households (and not randomly assigned) and there was some reported non-compliance, we cannot proceed with the assumption that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon _i$$\end{document} is independent of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} . This means that a regression of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i$$\end{document} on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} will not yield a consistent estimate of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)$$\end{document} . We, therefore, introduce an instrumental variable (IV) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_i$$\end{document} , which is a binary indicator for whether the household is located in an initial PKH subdistrict, and represents the study randomisation mechanism. If \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_i$$\end{document} has a causal effect on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} conditionally on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i=x$$\end{document} (the “relevance” assumption), and affects \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i$$\end{document} only through \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} conditionally on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} (the “exclusion restriction”), then \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)$$\end{document} can be identified as follows:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \tau (x) = \frac{\textrm{Cov}[Y,Z|X_i=x]}{\textrm{Cov}[D,Z|X_i=x]}, \end{aligned}$$\end{document}where the numerator is the average intention-to-treat effect for a subgroup defined by the covariate profile x, interpreted as the conditional effect of being given the opportunity to enrol into PKH, and the denominator is the share of compliers in the subgroup with covariate profile x. We can use heterogeneous treatment effect estimation methods that use the identification in (1) to estimate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)$$\end{document} as the conditional local average treatment effect (CLATE), by solving an estimation equation of the form:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} E[\psi _{\tau (x),m(x)}(Y_i,D_i,Z_i)|X_i=x]=0 \;\text {for all} \; x\in \mathcal {X}, \end{aligned}$$\end{document}where,
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \psi _{\tau (x),m(x)}= \left( \begin{array}{lr} Z_i(Y_i-D_i\tau (x)-m(x))\\ Y_i-D_i\tau (x)-m(x)\\ \end{array}\right) . \end{aligned}$$\end{document}Equality (2) encodes the identification assumptions required for consistent estimation: the first row in (3) corresponds to the condition that the instrument is uncorrelated with the error term, and the second row corrects to the condition that the error term has mean zero.
In the generalised random forest framework, an instrumental forest is defined to solve a weighted sample analogue of (2), where each observation i is assigned a weight \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _i(x)$$\end{document} that reflects its similarity to the target covariate profile x:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sum _{i=1}^N \alpha _i(x) , \psi _{\tau (x),m(x)}(Y_i,D_i,Z_i) = 0, \end{aligned}$$\end{document}where N denotes the sample size. The weights \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _i(x)$$\end{document} are determined by the structure of the instrumental causal forests—described below—and represent the influence of each observation on the estimation of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)$$\end{document} at x. This weighting scheme defines local neighbourhoods where treatment effects are similar, allowing for flexibly estimating heterogeneity in treatment effects.
In practice, the instrumental causal forest is constructed using local two-stage least squares (2SLS) estimation. Compared to conventional 2SLS, this method accommodates complex interactions and nonlinearities in covariates. The approach also allows for the relationship between the instrument Z and treatment uptake D to be heterogenous across individuals, because the instrumental variable estimation is conducted locally among units with similar covariates. Consequently, the resulting CLATEs capture heterogeneity in both treatment effects and compliance behaviour, remaining local to the relevant complier subpopulations.5
Specifically, the local 2SLS regressions are performed within a small neighbourhood of observations—a leaf—using the residualised versions of the outcomes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Y_i - m(X_i)$$\end{document} , the treatment assignment \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i - e(X_i)$$\end{document} (where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e(x) = P[D|X_i]$$\end{document} is the treatment propensity), and the instrument \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_i - g(X_i)$$\end{document} (where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g(x) = P[Z|X_i]$$\end{document} is the instrument propensity), treatment effect estimates obtained by solving the local versions of the moment equations defined above.67 So-called instrumental (causal) trees (Athey et al. 2019) are formed by recursively partitioning the data into leaves in a way that maximises the within-leaf heterogeneity in treatment effects. The trees are constructed using a sample splitting technique referred to as “honesty”, to avoid overfitting (Athey et al. 2019). This procedure is repeated across many bootstrapped samples to limit noise arising from individual trees, with the tree ensemble representing the instrumental forest.
Individual treatment effects \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(X_i)$$\end{document} are estimated by evaluating \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(x)$$\end{document} at each covariate profile \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} . The treatment effects \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(X_i)$$\end{document} can also be aggregated over the entire population to provide an estimate of the local average treatment effect (LATE), by plugging in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(X_i)$$\end{document} into a variant of the augmented inverse probability of treatment weighted estimator (also known as the doubly robust estimator), formed by taking the average of so-called doubly robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} :
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \hat{\tau } = \frac{1}{N}\sum ^N_{i=1}\hat{\Gamma }_i,\quad \hat{\Gamma }_i = \hat{\tau }(X_i) + \frac{\left( \frac{Z_i-\hat{g}(X_i)}{\hat{g}(X_i)(1-\hat{g}(X_i))}\right) }{\delta (X_i)}(Y_i-\hat{m}(X_i)-(D_i-\hat{e}(X_i)\hat{\tau }(X_i)). \end{aligned}$$\end{document}Construction of this particular doubly robust score requires estimates of m(x), e(x) and g(x), which are separately estimated via regression forests. It also requires a so-called compliance score \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta (X)=E[D|X,Z=1]-E[D|X,Z=0]$$\end{document} , which is an estimate of the causal effect of Z on D that is estimated via an auxiliary causal forest.8
Inference on treatment effect heterogeneity
Once we have obtained estimates of the individual CLATEs and double robust scores for each individual, we can use these estimates to examine drivers of treatment effect heterogeneity in a data-driven way. One way to assess treatment effect heterogeneity is to perform a linear regression of the doubly robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} on X to compare the relative contribution of covariates in predicting the CLATEs (Semenova and Chernozhukov 2021; Chernozhukov et al. 2018a). The resulting coefficients from the linear model are referred to as the best linear predictors (BLP) of CLATEs. These coefficients represent partial correlations conditional on a linear index of the other variables and should not be interpreted as causal effects. If a coefficient of the BLP for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} is positive and significant, we interpret as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} having a significant positive linear impact on the treatment effect heterogeneity, holding all other variables constant.9
Another way that Chernozhukov et al. (2018) suggest assessing treatment effect heterogeneity is through use of a “Classification Analysis” (CLANs). This involves partitioning data into quartiles according to the estimated double-robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} , in effect ranking the observations from low to high estimated treatment effects. For each effect modifier of interest, we regress the variable on the indicator of being in the most affected group, using ordinary least squares (OLS). This analysis is then repeated for the indicator of being in the least affected group. For each effect modifier, we test whether the difference between the two estimated coefficients is statistically significant. If the difference is significantly positive, then those individuals with characteristic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} experienced greater levels of the treatment effect. In contrast to the BLP analysis, this can be interpreted as a univariate analysis—as effect modifiers are analysed one by one, without controlling for the others—and can provide further evidence of treatment effect heterogeneity with respect to specific covariates expressed as binary indicators.
After examining the BLP and CLANs, we compute a variable importance measure. Variable importance (VI) in the context of causal forests quantifies the relative contribution of each covariate to the heterogeneity in treatment effects. This approach assesses how frequently a variable is used to split the data during the learning of the causal forest’s decision trees. Specifically, it is derived from the frequency with which each covariate is selected for splitting, weighted by the reduction in variance of the estimated conditional average treatment effects (CATEs) that these splits induce. This highlights covariates that are most influential in explaining heterogeneity in treatment responses during the model estimation process itself. We interpret these results with caution: there is emerging evidence that VI metrics obtained from causal forests and other causal machine learning algorithms can be biased if one of the (observed) confounding variables is also strongly involved in the treatment effect heterogeneity (Hines et al. 2022; Bénard and Josse 2023).10 However, as a qualitative approach, we can use the VI output as a comparison with the previous estimate-based BLP and CLAN results. We rank the top 10 variables according to their VI measure.
Finally, we turn to a method commonly used to learn optimal treatment allocation rules in an interpretable way: policy trees. The policy tree learning algorithm described by Athey and Wager (2021) performs exhaustive search over all possible trees using the estimated \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} , choosing as the final treatment rule the tree which maximises the overall treatment effect (Athey and Wager 2021). In other words, we estimate the optimal policy \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\pi }(X_i)$$\end{document} which maximises a value function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{A_n}(\pi )$$\end{document} :
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \hat{\pi }_n = \mathop {\textrm{argmax}}\left\{ \hat{A}_n(\pi ) : \pi \in \Pi _n\right\} ,\quad \hat{A}_n = \frac{1}{N}\sum ^N_{i=1}(2\pi (X_i)-1)\hat{\Gamma }_i, \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Pi _n$$\end{document} denotes the class of binary decision rules for sample size N. We are interested in whether those characteristics that show significant relationship to treatment effect heterogeneity using BLPs and CLANs are also most commonly used by the policy tree algorithm to assign treatment under the optimal policy \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\pi }$$\end{document} . We report which variables are chosen as the most important decision criteria when assigning the optimal treatment regime. We consider depth-two and depth-three policy trees to ensure interpretability.11Table 2List of selected variables in XVariableUsed in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{m}(x)$$\end{document} Enrolled into subsidised insuranceYesEnrolled into other (non-subsidised) insuranceYesLives in JavaYesLives in an urban areaYesAge (16–29; 30–39; 40–49)YesMother is educated (at elementary level)YesMother is employedYesHead of household is educated (at elementary level)YesHead of household is employed in agriculture sectorYesHead of household is employed in service sectorYesHousehold spends above average on alcohol and tobaccoYesNumber of practising doctors in village per capita (q1–q3)NoNumber of practising nurses in village per capita (q1–q3)NoNumber of practising midwives in village per capita (q1–q3)NoNumber of practising traditional birthing attendants in village per capita (q1–q3)NoLn(size of household) (q1–q3)YesLog(household non-food expenditure per capita) (q1–q3)YesHousehold has no clean waterYesHousehold has no own latrineYesHousehold has no septic tankYesHousehold has no electricityYesVillage chief indicates “lack of healthcare facilities" a top 3 concernNoVillage chief indicates “lack of medical equipment" a top 3 concernNoVillage chief indicates “low healthcare awareness" a top 3 concernNoContinuous variables have been discretised into terciles (q1–q3) to ensure overlap and create policy-relevant thresholds
The analytical steps taken in this paper are described as follows:
- Train an instrumental forest with 2000 trees in the ensemble, tuning key hyperparameters (mtry, sample.fraction, honesty.fraction, and alpha) using cross-validation12 while maintaining standard defaults for other parameters:
- Nuisance parameters—m(x), e(x) and g(x)—are estimated using separate regression forests, where the propensity score e(x) is estimated without including supply side variables in X, since these variables are used to determine eligibility and sample construction (see Sect. 2.3).
- The entire covariate vector X is used for the recursive partitioning—see Table 2 for a list of covariates.
- Predict \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(X_i)$$\end{document} by evaluating the trained instrumental forest for each observation’s covariate profile:
- Predictions are made “out-of-bag”, meaning that only the trees that did not use observation i during the training process are used in the prediction.13
- Construct doubly robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\Gamma }_i$$\end{document} to compute the LATE \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} .
- Assess the treatment effect heterogeneity captured by the forest outputs:
- Plot a histogram of the estimated CLATEs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(X_i)$$\end{document} .
- Perform a linear regression of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\Gamma }_i$$\end{document} on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} to find the best linear predictors of the CLATEs, and plot the estimated regression coefficients.
- Test the difference between the most and least effected individuals for each variable (CLANs), and plot the estimated differences, with confidence intervals.
- Obtain the variable importance metric (VI) and rank the top ten most important variables used in the process of learning of the instrumental forest.
- Learn and plot a depth-two and depth-three policy tree to examine which covariates are chosen as most important splitting criteria. We implement these steps for each outcome and year separately.Table 3. Summary statistics2009 (N =2065)2013 (N = 1989)(N = 2065)(N = 1989)RandomisationEnrolmentRandomisationEnrolmentControlTreatedSMDNot enrolledEnrolledSMDControlTreatedSMDNot enrolledEnrolledSMDOutcomesPre-natal visits \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 4$$\end{document} 0.6440.7000.0560.6770.661−0.0160.7710.8380.0670.8050.801−0.004Good assisted delivery0.4600.5170.0570.4940.475−0.0200.7210.7680.0470.7550.719−0.036Facility delivery0.7780.8460.0670.8030.8350.0320.8170.8380.0210.8380.805−0.033Post-natal visits \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 2$$\end{document} 0.2860.3660.0800.3090.3680.0590.4120.4410.0290.4360.404−0.032Mother’s characteristicsAge:16–290.4080.4670.0590.4570.390−0.0680.5370.5780.0410.5700.528−0.042Age:30–390.4880.436−0.0520.4520.4860.0340.3680.333−0.0360.3510.3510.000Age:40–490.1040.097−0.0070.0910.1240.0340.0950.089−0.0060.0790.1210.041Educated (elementary)0.8020.8280.0270.8180.809−0.0090.8610.8760.0140.8790.843−0.037Employed0.2150.200−0.0150.1990.2260.0260.2370.187−0.0510.2090.2200.011Subsidized insurance0.7290.7470.0190.7100.8040.0930.7380.7590.0200.7160.8210.105Other insurance0.0460.0540.0080.0530.043−0.0100.0500.0620.0120.0590.048−0.011Head of household’s characteristicsEducated(elementary)0.6960.7280.0330.7100.7150.0050.7750.7790.0040.7910.747−0.044Works in agriculture0.5910.6280.0360.5850.6680.0820.5340.508−0.0260.5100.5480.038Works in service sector0.1640.157−0.0070.1620.159−0.0030.1110.089−0.0220.1080.083−0.026Household characteristicsLives in Java0.6110.6210.0100.6620.506−0.1570.6810.656−0.0250.7250.540−0.185Urban location0.1290.115−0.0150.1170.1340.0170.1200.119−0.0010.1150.1310.016Num. HH members (ln)1.8131.793−0.0611.7961.8210.0751.8171.8180.0031.8041.8500.132Non-food exp (PC)10.91810.9330.02410.96510.832−0.20411.52111.5690.07211.58111.459−0.180Alcohol/tobac exp (PC)7.6387.8710.0627.7417.7900.0138.3868.7050.0828.3878.8920.135No clean water0.8690.8910.0220.8810.877−0.0040.8920.875−0.0170.8860.877−0.008No latrine0.4830.5050.0220.4600.5760.1160.4560.452−0.0040.4180.5360.118No septic tank0.6550.6610.0060.6390.7020.0630.5820.5900.0080.5650.6340.069No electricity0.2260.202−0.0240.1840.2830.0990.1120.1150.0030.0900.1670.077**Health care supply (number of practising workers in village per capita*1000)Doctors0.3210.301−0.0250.3010.3340.0430.2660.259−0.0110.2390.3180.123Nurses0.5910.585−0.0050.5640.6440.0580.5160.5190.0020.4840.5930.106Midwives0.5230.518−0.0070.5070.5530.0680.4960.5100.0250.4940.5240.053Trad. birth attendants1.0301.0610.0190.9351.3090.2090.9651.0340.0440.8901.2470.217Village characteristics (village head identified as top 3 concern)**Lack healthcare facilities0.3120.299−0.0130.3090.298−0.0110.2710.2930.0220.2660.3180.051Lack medical equipment0.1700.2110.0410.1750.2290.0540.1590.1930.0340.1700.1890.019Low health awareness0.1320.095−0.0370.1150.111−0.0040.1100.101−0.0100.1030.1130.010All columns apart from SMD report covariate means. SMD = standardised mean difference. q1 = highest quantity. Observations with complete data included only
Results
Table 3 presents summary statistics for our sample populations in 2009 and 2013. We report covariate means according by randomisation status Z (we refer to these as treated and control samples), defined as those living in subdistricts that are assigned versus not assigned to PKH. We also report covariate means for sample populations based on their actual enrolment status in the PKH, D (referring to these as enrolled and not enrolled samples). Overall, the table highlights that the target population is largely rural-based (approximately 90%) with household heads who work in agriculture, and that the majority (50–60%) live on the island of Java and lack household utilities such as running water.
As expected, the average characteristics for treated and control samples are similar given the random assignment of PKH to subdistricts, with none of the reported SMDs being greater than 0.1. When comparing enrolled and not enrolled mothers, we find some large differences, highlighting the socioeconomic disadvantages faced by enrolled mothers. For example, in both time periods, enrolled mothers are less likely to live in Java, have larger households, spend more on non-food items, and have a greater supply of traditional birth attendants in the village, compared to non-enrolled mothers. Between 2009 and 2013, we can observe an increase in non-compliance (only 77% of mothers living in subdistricts that are assigned to PKH were actually enrolled in 2013, compared to 84% in 2009 (see Table 1), leading to a further increase in the imbalance between enrolled and not enrolled mothers. This increased imbalance is most notable for supply-side variables, somewhat reducing the relative disadvantage of the enrolled group due to improved access to healthcare providers: we find that enrolled women live in villages that are more likely to have a greater supply of doctors and nurses per capita compared to non-enrolled women.
This descriptive information also allows us to contrast the characteristics of the compliers to the randomised population. The compliers—those who actually enrol into PKH—are more likely to be older (aged 30–49) and have subsidised health insurance. They are also more likely to be urban-based but less likely to live in Java, and they tend to have slightly larger households compared to the randomised population.
Local average treatment effects
Figure 2 displays histograms of the estimated CLATEs ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(x)$$\end{document} from the tuned instrumental forest for each outcome and year).14 Looking firstly at local average effects, the programme has positive significant short- and longer-term impacts on the probability that the mother has a good assisted delivery (LATE = 0.14 (SE = 0.07) in 2009, LATE = 0.17 (SE = 0.07) in 2013). The beneficial average impacts on the probability that the mother meets the required threshold for pre- and post-natal visits are only significant in 2009 (pre-natal LATE = 0.17 (SE = 0.06), post-natal LATE = 0.21 (SE = 0.07)), suggesting that the programme has more immediate rather than sustained effects on health care visits.15 Lastly, we find no effect on the probability that the mother has a facility delivery in both time periods. To further validate our findings, we present numeric values for LATE estimates obtained from both the causal forest and a traditional two-stage least squares (2SLS) in Appendix Table 5. Estimates are largely consistent across the two models, with the exception that the 2SLS results are statistically significant for all outcomes except the pre-natal and post-natal visits in 2013.Fig. 2. Histograms of estimated CLATEs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\tau }(x)$$\end{document} from the instrumental forest, by outcome and year. Note: Dashed lines denote the ATE point estimates and 95% confidence intervals (via the Augmented Inverse Probability of Treatment Weighted (AIPTW) estimator). Red solid line at zero
Looking beyond average effects, the histogram in Fig. 2 provides evidence of treatment effect heterogeneity, but the extent and pattern of heterogeneity varies by outcome and year. For some outcomes, such as post-natal visits in 2013, the CLATEs span a wide range (from approximately −0.2 to 0.4), indicating substantial variation in individual treatment effects and suggesting that some compliers are less likely to attend post-natal check-ups after receiving the cash transfer. In contrast, for outcomes like good assisted delivery, the distributions of CLATEs are narrower, indicating less heterogeneity and fewer compliers displaying adverse behaviour in response to the programme. Not all outcomes show CLATE distributions that are symmetric or span both negative and positive values. For example, the distributions for good assisted delivery, facility delivery, and pre-natal visits are more concentrated and may not extend far into negative values, reflecting more consistent positive programme impacts for most compliers.
Best linear predictors of treatment effects
Figures 3 and 8 present the estimated coefficients from the best linear predictor analysis, which linearly regresses the doubly robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\Gamma }_i$$\end{document} on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} for both binary and continuous outcomes. For the good assisted delivery outcome, the analysis shows that in 2009, having the head of household employed in the agriculture sector is associated with lower treatment effects, suggesting that the programme was less effective for these households. For agricultural households, this reduced effectiveness may reflect opportunity costs or different healthcare preferences in rural areas. In 2013, however, households lacking a septic tank or electricity, as well as those in villages where the chief identified “lack of healthcare facilities” as a primary concern, experienced higher treatment effects. This pattern indicates that PKH was particularly beneficial for the poorest households and those with limited access to health facilities.Fig. 3. Estimated coefficients (and 95% CIs) from the best linear predictor analysis of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_i$$\end{document} . Note: Instrumental forest estimate of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau (x)$$\end{document} using the instrument \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_i$$\end{document} as treatment \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} . HoH = head-of-household. HH = household. PCE = per capita expenditure. Tradbirth = traditional birth attendant. Continuous variables have been converted to discrete variables using terciles. Reference categories include: Age 40–49, Doctor:q3 (third tercile), Nurse:q3, Midwife:q3, Tradbirth:q3, HHsize:q3 and PCE:q3. q1 = largest quantity. Confidence intervals are obtained from the regression standard errors
Turning to facility delivery, the results for 2009 highlight that living in an urban area (but not Java) and village chief concern about “low health awareness” are strong positive predictors of increased treatment effects. In 2013, the only negative predictor identified is the lack of a household latrine, while lack of electricity remains a positive predictor. These findings suggest that the programme may have encouraged mothers who lacked appropriate provisions for home birth to seek facility-based deliveries. The temporal differences in facility delivery effects between 2009 and 2013 may reflect programme maturation effects and changing healthcare infrastructure over time.
For pre-natal visits, the 2009 results show that not having a septic tank and living in villages with the highest supply of nurses per capita are both associated with lower treatment effects. This may reflect that mothers already living in areas with better health worker supply did not change their health care demand in response to the programme. In 2013, the absence of household electricity is a positive predictor, while village chief concern about “lack of healthcare facilities” and enrollment in non-subsidised health insurance are negatively associated with treatment effects. These results point to the importance of travel time and facility access as barriers to programme effectiveness.
Finally, for post-natal visits, the analysis finds that being in the second highest tercile of midwives is a negative predictor in 2009. In 2013, being in the second tercile of nurses is associated with lower treatment effects, whereas households lacking clean water and heads of households working in agriculture are associated with higher treatment effects. This suggests that PKH was more effective for households with fewer amenities and less access to health personnel.
Notably, household per capita expenditure terciles show consistently small and mostly non-significant coefficients across outcomes and years, suggesting that within this targeted poor population, baseline wealth does not systematically predict treatment responsiveness. This indicates that expanding program eligibility to less poor households would be unlikely to yield systematically different returns. The patterns for the continuous outcomes are broadly similar, with lack of amenities and supply-side variables showing significant associations with treatment effect heterogeneity. Overall, these results reinforce the importance of both household-level deprivation and local health system readiness in shaping the impacts of the PKH programme.
Classification analysis
Figure 4 presents the results of the classification analysis (CLAN), which examines the univariate relationship between group membership and treatment effect for each effect modifier. In this analysis, a positive coefficient indicates that individuals in the group (for example, urban residents) experience higher treatment effects compared to those not in the group (such as rural residents), while a negative coefficient indicates lower treatment effects for the group relative to others.Fig. 4. Mean differences (and 95% CIs) from the classification analysis (CLAN) of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} .Note: Effect modifiers are regressed on indicators of being in the high or low treatment effect groups. HoH = head-of-household. HH = household. PCE = per capita expenditure. Tradbirth = traditional birth attendant. Continuous variables have been converted to discrete variables using terciles. Note this is a univariate analysis, and thus there are no reference categories. Confidence intervals are obtained from the regression standard errors
For the assisted delivery outcome, the results show that urban residents had a higher treatment effect than rural residents in 2009, but this difference was not observed in 2013. Those without a septic tank experience higher levels of the treatment effect in 2009. In both years, the CLAN suggests that if the household has health insurance, but not public (Askeskin) insurance, there may be a lower treatment effect. For all of the supply-side variables (doctors, nurses, and midwives per capita), we see that those living with the lowest tercile had the highest treatment effects in 2013. For the facility delivery outcome, the results show that households without electricity had a higher treatment effect in 2009, but not 2013. The supply-side variables are less significant for this outcome. The households residing in Java were associated with higher levels of the treatment effect in 2013, and low village medical awareness was associated with lowest levels of the treatment effect for facility delivery in 2013.Table 4. Top 10 Variable Importance for 2009 and 2013Variable Ranking 2009Variable Ranking 2013Good assisted DeliveryPer capita expenditure: q20.05Midwives: q30.0348Age:30–390.0441Lack of healthcare facilities0.0336No latrine0.0414No latrine0.0332Age:16–290.0399Age:16–290.0331Midwives: q10.0355Traditional birth attendants: q10.0324Midwives: q30.0351Nurses: q10.0323Number of members (ln): q30.0347Doctors: q10.0311Doctors: q10.0343Per capita expenditure: q10.0302Nurses: q30.0342Number of members (ln): q30.03HH head educated(elementary)0.0338No septic tank0.0299Facility DeliveryHH head educated(elementary)0.0319No latrine0.0468Has Askesin insurance0.0315Employed in agriculture sector0.0446Doctors: q10.031Number of members (ln): q20.0439No latrine0.0302Age:16–290.0414Employed0.03Nurses: q20.041Per capita expenditure: q30.0295Nurses: q10.0402Employed in agriculture sector0.0294Midwives: q10.0396Number of members (ln): q30.0293No septic tank0.0392Midwives: q20.0292Age:30–390.0392Age:30–390.029Number of members (ln): q30.0388Pre-natal visitsNo electricity0.0883Employed in agriculture sector0.0469Nurses: q10.0852No latrine0.0447Lack of healthcare facilities0.0429Number of members (ln): q20.0446Per capita expenditure: q30.0396Nurses: q20.0405Nurses: q30.039Age:16–290.0402Employed0.0373Number of members (ln): q10.0398Nurses: q20.0351Age:30–390.0393Midwives: q30.0306Midwives: q10.0388Mother educated (elementary)0.0298Nurses: q10.0386Post-natal visitsLives in Java0.0295Number of members (ln): q30.0365No septic tank0.0511Nurses: q30.0468No latrine0.05Employed in agriculture sector0.0413Per capita expenditure: q20.0486No clean water0.0403Age:16–290.0407Doctors: q30.0374Nurses: q10.0402Doctors: q20.0347Number of members (ln): q10.0401Nurses: q20.0335Midwives: q10.0397Number of members (ln): q10.0328Age:30–390.0383Number of members (ln): q30.0327Employed in agriculture sector0.0378Midwives: q20.0325Doctors: q10.0377Age:30–390.0316Numeric values obtained from the GRF variable_importance function for causal forests
For the pre-natal visits outcome, we see major differences in the CLAN results between survey waves. In 2013, for example, many of the supply variables (highest terciles of nurses and midwives, lack of facilities and medical equipment) are associated with low levels of the treatment effect, in contrast to what we observe for 2009. We also observe that living in Java is associated with higher levels of heterogeneous treatment effects for pre-natal visits in 2013, suggesting that long-term incentives for these visits may be more effective in regions with adequate transportation infrastructure. For the post-natal visits outcome, we see that households without a septic tank have lower levels of the treatment effect, as do those employed in the agriculture sector in 2009. However, those in the agriculture sector experience significantly higher levels of the treatment effect in 2013, as do those living in the lowest terciles of nurses and doctors (but highest tercile of midwives). These results could be explained by differences in the type of personnel most likely to be present for a pre- or post-natal visit (doctor or nurse vs midwive) according to geographic characteristics. Across both analysis methods, we observe no clear socioeconomic gradient in treatment effects. Variables reflecting household wealth and socioeconomic status show mixed and generally non-significant patterns, contrasting with the consistent importance of healthcare supply variables.
Variable importance
Next, we assess the variable importance results presented in Table 4. These results are largely consistent with the findings of the BLP and CLAN analyses, though we interpret them with caution. Note that the actual numeric value obtained from the variable importance metric is meaningless, and only the relative ranking of this metric is used in qualitative analysis. For the assisted delivery outcome in both years, the supply-side variables are four of the top 10 importance measures. In 2013, for example, the lowest tercile of midwives was ranked as the top most important variable. In both years, lack of a household latrine is the third most important variable. For facility delivery, the lack of a latrine remains important, and is the most important variable in 2013, followed by employment in the agriculture sector. For the pre-natal visits outcome, lack of household electricity is the most important variable in 2009, and employment in the agriculture is the most important in 2013. These variables are associated with living in urban areas and lack of healthcare facilities, which were identified as important in the previous BLP and CLAN results. The highest number of nurses per-capita is ranked as the second most important variable for the pre-natal visits outcome, similar to the BLP results. For post-natal visits, again, living in Java and lack of septic tank or latrine were important in 2009, whilst having a lower number of family members, lowest tercile of nurses and being employed in the agriculture sector were the most important variables in 2013 (See Fig. 5).Fig. 5. Depth-two policy trees learned from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _i$$\end{document} . Note: The top row presents trees for 2009, and the bottom row for 2013. Each column depicts one of the four outcomes. w_educ:“Mother educated”,w_work:“Mother employed”, hh_head_agr:“HoH agriculture sector”,hh_head_serv:“HoH service sector”, d_java:“Lives in Java”,d_urban:“Urban location”, no_clean_water:“No clean water”, no_latrine:“No latrine”, no_septic_tank:“No septic tank”, no_electricity:“No electricity”, alctobacco:“Above avg alcohol/tobacco exp”, pce_1:“HH PCE q1”,pce_2:‘HH PCE q2”, pce_3:“HH PCE q3”, Supply: doctor_1:“doctor q1”, doctor_2:“doctor q2”, doctor_3:“doctor q3”, nurse_1:“nurse q1”, nurse_2:“nurse q2”, nurse_3:“nurse q3”, midwife_1:“midwife q1”, midwife_2:“midwife q2”, midwife_3:“midwife q3”, tradbirth_1:“tradbirth q1”, tradbirth_2:“tradbirth q2”, tradbirth_3:“tradbirth q3”, tradbirth_4:“tradbirth q4”, low_hlth_aware_top3_bl:“Low health awareness”
Overall, household characteristics associated with poverty (no electricity, latrine, or septic tank), as well as the supply-side readiness variables, are often listed as highly important across all years and outcomes. We note that household education, per-capita expenditure, insurance status, number of household members, and maternal age are also often included in the top ten important variable ranking lists, though they did not appear in the earlier analyses. These are likely variables that are highly correlated with the other household aspects, and although they are used in the learning and splitting of the causal forest trees, and therefore listed as important here, they are not picked up in the BLP or CLAN results, which rely on information from the individual treatment effect estimates themselves.
Policy trees
We first examine the depth-2 decision trees learned from the estimated double robust scores \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\Gamma }_i$$\end{document} . The trees are depicted in Fig. 9, with each row representing a survey year and each column representing one of the four outcomes under analysis. Each node of the decision trees presents a condition upon which the next choice of node relies. For example, in the 2009 assisted delivery tree, the first node condition is that the supply of doctors is not in the second tercile ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\le 0$$\end{document} ). If this condition is true, then the following node condition is that the household is not in the second consumption expenditure tercile. If that condition is true, then the algorithm assigns treatment, and if not, treatment is not assigned. In all eight decision trees, there is an importance of healthcare worker supply in terms of decision criteria. For 2013, the top decision node in three of the four outcomes is related to health worker supply. For the good assisted delivery outcome, for example, the first decision criteria is whether there are a high number of traditional birth attendants, and the right bottom node depends on whether there is a high supply of doctors.
There are major differences in learned trees between the two survey waves. In 2009, the most important decision criteria for all four outcomes included mother’s education and health insurance status, the household per-capita expenditure, size, urban location, and lack of septic tank, as well as supply of all types of healthcare personnel. In 2013, there is slightly less influence of the mother and household characteristics on the decision rules, with the nodes being for maternal employment, lack of latrine, electricity or clean water, and household head education and alcohol/tobacco expenditure. All four health worker supply variables were important splitting criteria for the optimal treatment allocation.
Finally, as a lower-depth decision tree will pick up more granular treatment assignment rules, we also examine depth-3 policy trees. Here we focus on the depth-3 tree for the post-natal visit outcome in 2013, which exhibits a very wide distribution of estimated CATEs and also a large portion of that distribution with negatively estimated (harmful) treatment effects.16 For this depth-3 tree (Fig. 6), the first node relies on the supply of nurses, and both the second depth nodes rely on the supply of traditional birth attendants. The third layer of the tree sees some influence of household characteristics (alcohol and tobacco expenditure and whether the head of household works in agriculture), with the remaining nodes relying on healthcare supply. Overall, this deeper tree may better reflect the influence of the supply of nurses, birth attendants and midwives on whether the programme effectively incentivised post-natal visits. Depth 3 trees for all outcomes and years are reported in the Appendix.Fig. 6. Depth-3 policy tree, Post-Natal Visits, 2013
Taken together and examined qualitatively, the policy tree results indicate a strong influence of the supply-side readiness of each village in terms of maximising desired maternal healthcare demand. The results also suggest that effects are important for those poorer households in urban locations and with fewer household amenities, indicating that the PKH programme was successful in incentivising those poorest participants. For example, in 2013 the bottom left decision node for the post-natal visits outcome assigns treatment to those who do not have clean water (no clean water \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\le 0$$\end{document} is false \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rightarrow $$\end{document} treat). The inclusion of household consumption-expenditure in nodes of the deeper trees for 2009 aligns with the variable importance findings.
Discussion
In this paper, we used data on new mothers from a randomised experiment to evaluate the local average and heterogeneous effects of the PKH programme on various maternal health care utilisation outcomes in 2009 and 2013. We used a causal machine learning method, instrumental forests, to estimate heterogenous treatment effects (CLATEs), and aggregated these estimates over the entire sample population to produce a doubly robust approximation to the LATE.17 We also performed three types of complementary analysis of the drivers of heterogenous treatment effects: explored the best linear predictors of treatment effects, conducted a classification analysis, and built interpretable policy trees.
Our results largely support those from early evaluations on the overall average impacts of PKH on (compliant) new mothers, with increases in the probabilities of having a good assisted delivery in 2009 and 2013, attending at least four pre-natal check-ups in 2009, and attending at least two post-natal check-ups in 2009. However, the sizes of the effects tend to vary across studies, which can be explained by variations in study designs resulting in differences in covariate selection and identification of causal effects. Beyond average effects, the distribution of CLATEs provides evidence of heterogeneity in treatment effects such that although most mothers are expected to increase health care demand in response to the cash transfer based on their observed characteristics, others are less affected.
We acknowledge limitations of our analysis. Although a covariate balance check did not point to systematic differences between the treatment and control group, any imbalance on unobserved covariates would lead to possible attrition bias. In this work, we define treatment as a binary indicator for enrolment into PKH, and estimate conditional local average treatment effects accordingly. While the size of the cash transfer varies across households based on composition and eligibility, we do not model the treatment as continuous. Instead, we include covariates that influence transfer size as part of the forest estimation, allowing us to indirectly account for variation in programme generosity. A more direct approach would involve treating the transfer amount as a continuous treatment and estimating dose-response functions. This would require a different identification strategy, and is therefore beyond the scope of this paper. We leave this as a promising avenue for future research.
Previous evidence shows the supply of health care is the main source of regional variations and has a positive association with health care utilization (Finkelstein et al. 2016; Godøy and Huitfeldt 2020; Skinner 2011). To increase healthcare access by removing the demand-side barrier, using policies like the CCT programme, supply-side improvements are important to consider (Ensor and Cooper 2004; Gertler 2004; Gruber et al. 2014; Triyana 2016). Our analysis of drivers of treatment effect heterogeneity suggests that location and supply-side factors are important determinants of varying treatment effects for several outcomes. Urban-based households, where health care supply is more readily available, due to better proximity of medical facilities and a greater supply of practicing health care workers, are less likely to change their demand for maternal health care in response to the cash transfer. Other related variables, such as whether the household is located in Java and the nature of employment of the head-of-household, which is inherently linked to geographical factors, are also identified to be important predictors. We find that for the most part, the estimated regression coefficients from the BLP and the CLAN analysis are significant for one time period only, either 2009 or 2013, with only a few maintaining their significance throughout both periods, indicating a changing role of characteristics in programme effectiveness over time.
Our study may provide some insights into the factors affecting the duration and distribution of policy effects. The finding that PKH is unable to consistently maintain effectiveness beyond the short-term, if at all, could be explained by some reported issues in programme design and implementation (Kusuma et al. 2016). Administrative problems resulting in payment delays and missed payments altogether could partly explain the limited impact, combined with the fact that cash payments (as a proportion of household consumption) were essentially halved between 2007 and 2013, thus significantly reducing the incentive-based component of the policy. Our results also suggest that geographical factors that are inherently linked to health care supply are the primary drivers of treatment effect heterogeneity, rather than socioeconomic differences within the target population. The absence of a clear wealth gradient in treatment effects supports the program’s targeting strategy of focusing on extremely poor households in supply-ready areas, as expanding to less poor populations would unlikely yield systematically greater benefits. Instead, ensuring adequate healthcare infrastructure in program areas appears crucial for maximising effectiveness among intended beneficiaries. Although PKH aims to target poor households in supply-ready areas, residual differences in health care accessibility and availability seem to contribute to varying policy impacts. It has been argued that, in addition to supply-side readiness, other contextual differences, including cultural factors and supply-side barriers, can impact programme effectiveness (Glassman et al. 2013). For example, poor quality of care, transportation costs and a lack of health knowledge or programme awareness may restrict health care use irrespective of the value of the cash payment or the availability of health facilities in the local area (Gaarder et al. 2010). These findings suggest the need to better align demand-side policies with supply-side initiatives to support policy effectiveness.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bénard C, Josse J (2023) Variable importance for causal forests: breaking down the heterogeneity of treatment effects. ar Xiv preprint ar Xiv:2308.03369
- 2Cussens J, Hatamyar J, Shah V, Kreif N (2025) Fast learning of optimal policy trees. ar Xiv preprint ar Xiv:2506.15435
- 3Hines O, Diaz-Ordaz K, Vansteelandt S (2022) Variable importance measures for heterogeneous causal effects. ar Xiv preprint ar Xiv:2204.06030
- 4Kennedy EH (2020) Optimal doubly robust estimation of heterogeneous causal effects. https://arxiv.org/abs/2004.14497. Accessed: 2023-2-5
- 5Lechner M, Mareckova J (2022) Modified causal forest. ar Xiv preprint ar Xiv:2209.03744
- 6Lechner M, Mareckova J (2024) Comprehensive causal machine learning. ar Xiv preprint ar Xiv:2405.10198
- 7O’Neill, Weeks M (2018) Causal tree estimation of heterogeneous household response to time-of-use electricity pricing schemes. https://arxiv.org/abs/1810.09179. Accessed 5 Feb 2023
