Propensity score analysis with latent covariates: Measurement error bias   correction using the covariate's posterior mean, aka the inclusive factor   score

Trang Quynh Nguyen; Elizabeth A. Stuart

arXiv:1907.12709·stat.ME·February 13, 2020

Propensity score analysis with latent covariates: Measurement error bias correction using the covariate's posterior mean, aka the inclusive factor score

Trang Quynh Nguyen, Elizabeth A. Stuart

PDF

Open Access

TL;DR

This paper proposes a new method using the inclusive factor score to correct measurement error bias in propensity score analysis involving latent covariates, improving causal effect estimation accuracy.

Contribution

It introduces the inclusive factor score as an improved proxy for latent covariates in propensity score analysis, with theoretical support and empirical validation.

Findings

01

The inclusive factor score reduces bias in causal effect estimates.

02

Balancing the inclusive factor score improves the moments of the latent covariate.

03

Simulation and real data demonstrate the method's effectiveness.

Abstract

We address measurement error bias in propensity score (PS) analysis due to covariates that are latent variables. In the setting where latent covariate $X$ is measured via multiple error-prone items $W$ , PS analysis using several proxies for $X$ -- the $W$ items themselves, a summary score (mean/sum of the items), or the conventional factor score (cFS , i.e., predicted value of $X$ based on the measurement model) -- often results in biased estimation of the causal effect, because balancing the proxy (between exposure conditions) does not balance $X$ . We propose an improved proxy: the conditional mean of $X$ given the combination of $W$ , the observed covariates $Z$ , and exposure $A$ , denoted $X_{W Z A}$ . The theoretical support, which applies whether $X$ is latent or not (but is unobserved), is that balancing $X_{W Z A}$ (e.g., via weighting or matching) implies…

Tables2

Table 1. Table 1: Balance of baseline covariates between exposed (n=140) and unexposed (n=277) groups (1) before PS weighting, and after PS weighting based on the latent covariates’ (2) mean scores and (3) iFSs. SMD = standardized mean difference.

	Exposed	Unexposed group		Unexposed group after PS weighting
	group	before PS weighting		based on mean scores		and based on iFSs
	mean (%)	mean (%)	SMD	mean (%)	SMD	mean (%)	SMD
Observed covariates
Age	15.9	16.1	-0.16	15.9	0.02	15.8	0.08
Race (%)
White	62.9	64.6	-0.04	64.3	-0.03	66.7	-0.08
Black/African-American	33.6	27.8	0.13	32.6	0.02	30.2	0.07
Native American	7.9	4.7	0.14	8.7	-0.03	8.4	-0.02
Asian	2.1	3.6	-0.08	1.4	0.05	1.1	0.08
Hispanic ethnicity (%)	8.6	10.8	-0.08	12.0	-0.11	11.5	-0.10
Parent education (%)
Less than high school	18.6	17.0	0.04	21.2	-0.06	22.8	-0.10
High school	38.6	26.4	0.27	35.8	0.06	35.9	0.05
Business/vocational training	15.0	11.9	0.09	13.4	0.05	13.4	0.05
Some college (not graduated)	12.1	25.6	-0.33	11.4	0.02	10.4	0.06
College graduate or higher	15.7	19.1	-0.09	18.2	-0.06	17.5	-0.05
Parent marital status (%)
Married	59.3	66.1	-0.14	56.7	0.05	56.8	0.05
Single	15.0	4.3	0.40	16.5	-0.04	15.7	-0.02
Widowed	3.6	4.3	-0.04	3.2	0.02	2.8	0.05
Divorced	15.0	20.6	-0.14	16.0	-0.03	16.9	-0.05
Separated	7.1	4.7	0.11	7.6	-0.02	7.9	-0.03
Proxies of latent covariates
Violence
Mean score (range 0-3)	0.65	0.43	0.39	0.66	-0.02	0.70	-0.09
Inclusive factor score	-0.86	-1.32	0.55	-0.94	0.08	-0.84	-0.03
Academic achievement
Mean score (range 1-4)	1.18	1.52	-0.50	1.17	0.01	1.02	0.23
Inclusive factor score	-0.21	0.40	-0.75	0.01	-0.28	-0.20	-0.01

Table 2. Table 2: Changes in causal estimates as a result of measurement error correction (using the iFS method) for latent covariates. Proportions and effects are on percent scale.

		WEIGHTING-ONLY ESTIMATOR			WEIGHTING-PLUS ESTIMATOR
	outcome	weighted			mean predicted
	proportion	outcome			potential outcome
	in the	proportion	ACEE	95%	probability under	ACEE	95%
	exposed	in the	point	confidence	non-exposure	point	confidence
	group	unexposed	estimate	interval	for the exposed	estimate	interval
neither corrected	70.7	59.7	11.0	(3.1, 18.3)	59.1	11.6	(3.7, 18.5)
violence corrected	70.7	61.1	9.7		60.5	10.3
acad. achiev. corrected	70.7	62.5	8.2		61.4	9.3
both corrected	70.7	63.2	7.5	(-0.8, 15.9)	62.2	8.6	(1.7, 18.2)

Equations82

E [G (X - X_{W Z A}) ∣ A = 1] = E [G (X - X_{W Z A}) ∣ A = 0] = 0.

E [G (X - X_{W Z A}) ∣ A = 1] = E [G (X - X_{W Z A}) ∣ A = 0] = 0.

E [X - X_{W Z A} ∣ K, A = 1] = E [X - X_{W Z A} ∣ K, A = 0] = 0.

E [X - X_{W Z A} ∣ K, A = 1] = E [X - X_{W Z A} ∣ K, A = 0] = 0.

E [A QX] = E [(1 - A) QX] = E [X] .

E [A QX] = E [(1 - A) QX] = E [X] .

E [X ∣ Z, X_{W Z A}, A = 1] = E [X ∣ Z, X_{W Z A}, A = 0], and

E [X ∣ Z, X_{W Z A}, A = 1] = E [X ∣ Z, X_{W Z A}, A = 0], and

E [X ∣ e (Z, X_{W Z A}), A = 1] = E [X ∣ e (Z, X_{W Z A}), A = 0] .

E [X ∣ e (Z, X_{W Z A}), A = 1] = E [X ∣ e (Z, X_{W Z A}), A = 0] .

E [Y^{(a)} ∣ Z, X] = β_{0} + β_{a} a + β_{z} (Z) + β_{z a} (Z) a + β_{x} X + β_{x a} X a \leavevmode \leavevmode for a = 0, 1.

E [Y^{(a)} ∣ Z, X] = β_{0} + β_{a} a + β_{z} (Z) + β_{z a} (Z) a + β_{x} X + β_{x a} X a \leavevmode \leavevmode for a = 0, 1.

θ = λ_{x} X + λ_{z} Z + Λ_{s} S, \leavevmode \leavevmode S \sim N (0, I), \leavevmode \leavevmode S ⊥ ⊥ X, Z, A,

θ = λ_{x} X + λ_{z} Z + Λ_{s} S, \leavevmode \leavevmode S \sim N (0, I), \leavevmode \leavevmode S ⊥ ⊥ X, Z, A,

\frac{\sum A Q ^ Y}{\sum A Q ^} - \frac{\sum ( 1 - A ) Q ^ Y}{\sum ( 1 - A ) Q ^} .

\frac{\sum A Q ^ Y}{\sum A Q ^} - \frac{\sum ( 1 - A ) Q ^ Y}{\sum ( 1 - A ) Q ^} .

Q_{0} = A [e (Z, X)]^{- 1} + (1 - A) [1 - e (Z, X)]^{- 1} .

Q_{0} = A [e (Z, X)]^{- 1} + (1 - A) [1 - e (Z, X)]^{- 1} .

Q_{0} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X + β_{z} Z)] .

Q_{0} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X + β_{z} Z)] .

Q_{1} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X_{MLE} + β_{z} Z)] \cdot exp [- β_{x}^{2} \cdot var (X - X_{MLE} ∣ X, Z, A) /2],

Q_{1} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X_{MLE} + β_{z} Z)] \cdot exp [- β_{x}^{2} \cdot var (X - X_{MLE} ∣ X, Z, A) /2],

Q_{1} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X^{*} + β_{z} Z)],

Q_{1} = 1 + exp [(1 - 2 A) (β_{0} + β_{x} X^{*} + β_{z} Z)],

ACEE_{weighting-only} = \frac{\sum _{i = 1}^{n} A _{i} Y _{i}}{\sum _{i = 1}^{n} A _{i}} - \frac{\sum _{i = 1}^{n} ( 1 - A _{i} ) Q ^ _{i} Y _{i}}{\sum _{i = 1}^{n} ( 1 - A _{i} ) Q ^ _{i}} .

ACEE_{weighting-only} = \frac{\sum _{i = 1}^{n} A _{i} Y _{i}}{\sum _{i = 1}^{n} A _{i}} - \frac{\sum _{i = 1}^{n} ( 1 - A _{i} ) Q ^ _{i} Y _{i}}{\sum _{i = 1}^{n} ( 1 - A _{i} ) Q ^ _{i}} .

ACEE_{weighting-plus} = \frac{\sum _{i = 1}^{n} A _{i} Y _{i}}{\sum _{i = 1}^{n} A _{i}} - \frac{\sum _{i = 1}^{n} A _{i} Y ^ _{i}^{(0)}}{\sum _{i = 1}^{n} A _{i}} .

ACEE_{weighting-plus} = \frac{\sum _{i = 1}^{n} A _{i} Y _{i}}{\sum _{i = 1}^{n} A _{i}} - \frac{\sum _{i = 1}^{n} A _{i} Y ^ _{i}^{(0)}}{\sum _{i = 1}^{n} A _{i}} .

E [X - X_{W Z A} ∣ Z, X_{W Z A}, A]

E [X - X_{W Z A} ∣ Z, X_{W Z A}, A]

= E {0 E [X - X_{W Z A} ∣ W, Z, A] ∣ Z, X_{W Z A}, A} = 0.

E [X - X_{W Z A} ∣ A] = E {= 0 \leavevmode \nobreak by (1) E [X - X_{W Z A} ∣ Z, X_{W Z A}, A] ∣ A} = 0.

E [X - X_{W Z A} ∣ A] = E {= 0 \leavevmode \nobreak by (1) E [X - X_{W Z A} ∣ Z, X_{W Z A}, A] ∣ A} = 0.

E [X - X_{W Z A}] = E {= 0 by above line E [X - X_{W Z A} ∣ A]} = 0.

E [g (Z, X_{W Z A}, A) (X - X_{W Z A}) ∣ A]

E [g (Z, X_{W Z A}, A) (X - X_{W Z A}) ∣ A]

= E {g (Z, X_{W Z A}, A) = 0 \leavevmode \nobreak by (1) E [X - X_{W Z A} ∣ Z, X_{W Z A}, A] ∣ A}

= 0 \leavevmode \leavevmode for bounded g (Z, X_{W Z A}, A) .

E [X - X_{W Z A} ∣ k

E [X - X_{W Z A} ∣ k

= E {E [X - X_{W Z A} ∣ Z, X_{W Z A}, k (Z, X_{W Z A}), A] ∣ k (Z, X_{W Z A}), A}

= E {= 0 \leavevmode \nobreak by (1) E [X - X_{W Z A} ∣ Z, X_{W Z A}, A] ∣ k (Z, X_{W Z A}), A} = 0.

E [A QX]

E [A QX]

= E [\frac{A}{P ( A = 1∣ X _{W Z A} )} X_{W Z A}] + = 0 by part (3) of Theorem E [Q (X - X_{W Z A}) ∣ A = 1] P (A = 1)

= E {E [\frac{A}{P ( A = 1∣ X _{W Z A} )} X_{W Z A} ∣ X_{W Z A}]}

= E [\frac{E [ A ∣ X _{W Z A} ]}{P ( A = 1∣ X _{W Z A} )} X_{W Z A}]

= E [X_{W Z A}]

= E [X] - = 0 by part (2) of Theorem E [X - X_{W Z A}] = E [X] .

E [X ∣ Z, X_{W Z A}, A]

E [X ∣ Z, X_{W Z A}, A]

E [X ∣ e (Z, X_{W Z A}), A]

= E [X_{W Z A} ∣ e (Z, X_{W Z A})],

E [ϵ^{(a)} ∣ Z, X, A] = E [ϵ^{(a)} ∣ Z, X] = 0.

E [ϵ^{(a)} ∣ Z, X, A] = E [ϵ^{(a)} ∣ Z, X] = 0.

Q ⊥ ⊥ ϵ^{(a)} ∣ Z, X, A .

Q ⊥ ⊥ ϵ^{(a)} ∣ Z, X, A .

E [Q Y ∣ A = a] = E [Q Y^{(a)} ∣ A = a]

E [Q Y ∣ A = a] = E [Q Y^{(a)} ∣ A = a]

= E {Q [β_{0} + β_{a} a + β_{z} (Z) + β_{z a} (Z) a + β_{x} X + β_{x z} X a + ϵ^{(a)}] ∣ A = a}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Causal Inference Techniques · Psychometric Methodologies and Testing · Reliability and Agreement in Measurement

Full text

\epstopdfDeclareGraphicsRule

.tifpng.pngconvert #1 \OutputFile \AppendGraphicsExtensions.tif

\authornoteWe are indebted to the three anonymous Reviewers and the Editor for pointing us to important related work, and for their insightful comments and questions, which helped us evolve in our understanding of the problem and the proposed method, leading to substantial improvements in the content and clarity of the paper. TQN thank Daniel Scharfstein, Ilya Shpitser and Betsy Ogburn for their reactions to this problem, which are illuminating and helpful; and thank Abhirup Datta for indulging her struggle with probability and the $Q$ function. This work is supported by grant R01MH099010 (PI Stuart) from the National Institute of Mental Health. Contact: [email protected].

Propensity score analysis with latent covariates: Measurement error bias correction using the covariate’s posterior mean, aka the inclusive factor score

Trang Quynh Nguyen

Elizabeth A. Stuart

Johns Hopkins Bloomberg School of Public Health

accepted by Journal of Educational and Behavioral Statistics

(with supplementary material)

Abstract

We address measurement error bias in propensity score (PS) analysis due to covariates that are latent variables. In the setting where latent covariate $X$ is measured via multiple error-prone items $\bm{W}$ , PS analysis using several proxies for $X$ – the $\bm{W}$ items themselves, a summary score (mean/sum of the items), or the conventional factor score (cFS , i.e., predicted value of $X$ based on the measurement model) – often results in biased estimation of the causal effect, because balancing the proxy (between exposure conditions) does not balance $X$ . We propose an improved proxy: the conditional mean of $X$ given the combination of $\bm{W}$ , the observed covariates $Z$ , and exposure $A$ , denoted $X_{WZA}$ . The theoretical support is that balancing $X_{WZA}$ (e.g., via weighting or matching) implies balancing the mean of $X$ . For a latent $X$ , we estimate $X_{WZA}$ by the inclusive factor score (iFS) – predicted value of $X$ from a structural equation model that captures the joint distribution of $(X,\bm{W},A)$ given $Z$ . Simulation shows that PS analysis using the iFS substantially improves balance on the first five moments of $X$ and reduces bias in the estimated causal effect. Hence, within the proxy variables approach, we recommend this proxy over existing ones. We connect this proxy method to known results about weighting/matching functions [McCaffrey2013, Lockwood2016]. We illustrate the method in handling latent covariates when estimating the effect of out-of-school suspension on risk of later police arrests using Add Health data.

keywords:

measurement error, covariate measurement error, latent variable, propensity score, factor score, inclusive factor score, bias correction, weighting function, matching function

1 Introduction

Propensity score (PS) methods are useful for estimating the effect of an exposure (or treatment) on an outcome using observational data, assuming that covariates that confound their relationship are observed. The PS – the probability of exposure given covariates – is a balancing score, meaning conditional on it (e.g., via matching or weighting), there is balance between exposure conditions on the covariates, thus removing confounding by them [Rosenbaum1983a]. Implicit in this no unobserved confounding assumption is the assumption that covariates are measured without error; measurement error often leads to bias in the estimated causal effect [Jakubowski2015, Pearl2010, Raykov2012c, Steiner2011, Yi2012]. In PS analysis, covariate measurement error biases the estimation of the PS, resulting in residual imbalance of the true covariate (after matching or weighting), which may lead to bias in the estimated causal effect.

We address measurement error bias when a confounder is a latent continuous variable with multiple error-prone measurements.111This is not the situation where the confounder is the measurement of a latent variable (e.g., intervention decisions in education are based on the teacher’s rating of the child’s academic aptitude), in which case there is no measurement error bias (the rating is perfectly captured) [Pohl2016]. (Here “latent variable” is a construct considered real but is not directly observable, e.g., readiness to learn.) This measurement error type is ubiquitous in health and education. For example, in PS analysis evaluating the effect of retention in first grade on future academic achievement, [Wu2008] incorporated covariates such as academic aptitude, personality traits, social adjustment, peer relations, and family adversity – measured via multi-item scales. Studying the effect of adolescent cannabis use on adulthood depression, [Harder2008a] PS-matched users and non-users on measures of concentration difficulty, behavior problems, shyness, depression, anxiety, and parental supervision and monitoring, among others.

1.1 The proxy variable approach: summary scores and factor scores

In PS analysis practice, perhaps the most common (and simplest) way to deal with a latent covariate with multiple measurement items is to use a summary score (usually the sum or mean of the items, also called scale score if the items are from an established scale) to represent the latent variable, and treat it as an observed covariate. Since the correlation between the summary score and the true latent variable is less than 1, there is measurement error. If the latent variable is an important confounder, such measurement error may result in appreciable bias in the estimated causal effect [Steiner2011].

For regression analysis where the interest is in estimating regression coefficients, a popular solution to this measurement error bias problem is latent variable modeling [Bollen1989]: establishing a measurement model for the latent variable and then fitting a structural equation model (SEM) that combines the measurement component and the regression component with the latent and observed predictors. This does not help PS analysis, however, because although the exposure assignment model may be fit as a SEM, PS computation for each individual requires input values for all predictors, including the latent one. If we wish to use the standard procedure – estimating the exposure assignment model (the PS model), computing the PS, then matching or weighting based on that PS – we need a proxy for the latent variable, and desirably a better one than the summary score.

[Raykov2012c] suggested using the factor score (FS) from the measurement model as a proxy for the latent variable in estimating the PS. This FS is the model predicted value of the latent variable given the measurement items. We will call it the conventional FS (abbreviated cFS), as the term factor scores originally referred to values generated from measurement models (called factor models). [Raykov2012c] argued, intuitively, that since the cFS better represents the latent covariate, adjusting for it and the PS based on it should produce less bias than adjusting for the measurement items and the PS based on them. Later [Jakubowski2015] used simulation to compare the use of the cFS (in PS matching) to using the measurement items directly. While the author’s conclusions seem to favor the cFS, our reading of the simulation results is that these two methods have similar bias.

With the benefit of hindsight, this similar bias result is not surprising, as the two methods essentially capture the same information about the latent variable. Related to this point, [Bollen1989] commented that the estimated FS is a weighted combination of the measurement items, and as such does not remove measurement error. Also, from a missing data perspective, the cFS can be seen as an imputation for the latent covariate, whose imputation model relies only on the measurement items. This is a mismatch with the PS model, which relies on the exposure-confounders joint distribution; and it is well known that incompatibility of imputation and analysis models results in bias [Meng1994].

We investigate another FS that improves upon the cFS as proxy for the latent variable: the FS generated from a SEM that combines the measurement component and the exposure assignment model, thus is informed by the exposure-confounders joint distribution. We call it the inclusive FS (abbreviated iFS), borrowing the descriptor inclusive from [Collins2001b], who refer to missing data methods that make use of auxiliary variables as inclusive.222Strictly speaking, from an imputation perspective, the exposure variable and the other confounders are part of the analysis model and thus are not auxiliary. But from a conventional measurement perspective, with FSs having originated from factor analysis, these variables are auxiliary to the measurement model. The iFS estimates the conditional expectation of the latent variable given the measurements, the other confounders, and the assigned exposure. We show that the iFS outperforms the cFS and the summary score in helping balance covariates in PS analysis, reducing bias in the estimated causal effect.

As the purpose of a better proxy is to improve covariate balance, its estimation belongs in the design of the observational study, which covers “all contemplating, collecting, organizing, and analysing of data that takes place prior to seeing any outcome data” [Rubin2007]. Hence we do not use the outcome to inform the proxy variable. This puts our method in a different class from methods that rely solely on modeling the outcome and methods that solve the joint distribution or covariance matrix of all the observed variables for a causal effect formula [Pearl2010, Kuroki2014, Cai2008].

1.2 The weighting/matching function approach

The proxy variable approach has an intuitive and simple appeal – it makes sense to seek a proxy variable (a function of observed data) to stand in for just the one variable we do not observe, and once the proxy is obtained, we proceed with analysis as usual. A different approach, the weighting/matching function approach [McCaffrey2013, Lockwood2016], does not seek proxies for the unobserved variable. Instead the focus is on functions of observed (and possibly simulated) data that when used for weighting or matching result in unbiased estimation of the causal effect. While the proxy variable approach seeks a proxy for the latent variable that results in a better weighting/matching function, the weighting/matching function approach directly seeks a good weighting/matching function. Conceptually, the latter considers a larger space for weighting/matching functions, while the former places restrictions on this space, admitting only weighting/matching functions that follow from proxy variables for the latent variable. The weighting/matching function approach in principle may result in solutions that are more correct, but it is also generally more complicated. The current paper takes the proxy variable approach, aiming to offer a method that is easy to implement for researchers who are not statisticians. We will compare the iFS proxy method to the weighting/matching function approach, focusing on special cases where closed form solutions exist for the latter.

1.3 Data example

Method illustration will be based on an analysis using data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) [Harris2013] to evaluate whether out-of-school suspension in adolescence increases the risk of subsequent problems with the law, specifically being arrested by law officers. Since the exposure was not randomized, we use PS weighting to balance the exposed and unexposed groups on a set of baseline covariates considered potential confounders of the exposure-outcome association. These include the latent constructs academic achievement (measured by grades for several subjects) and violence (measured by items about physical fights and weapon use).

Next in this paper, Section 2 introduces the setting and assumptions. Section 3 provides the theoretical rationale for the proposed proxy variable. Section 4 discusses its identification and estimation. Section 5 presents simulations comparing the iFS proxy to non-inclusive proxies, with correctly specified models. Section 6 relates this method to weighting and matching functions. Section 7 evalutes the iFS proxy with misspecified models. Section 8 presents real data illustration. Section 9 closes with a discussion.

2 Setting, Notation and Key Assumptions

In an observational study, $A$ is a binary exposure, $Y^{(1)}$ and $Y^{(0)}$ the potential outcomes [Rubin1974] under exposure and non-exposure, and $Y$ the observed outcome; $Y=AY^{(1)}+(1\!-\!A)Y^{(0)}$ . The causal effect of the exposure for individual $i$ is $Y_{i}^{(1)}\!-Y_{i}^{(0)}$ . Often an average effect is of interest, e.g., the average causal effect (ACE), $\text{E}[Y^{(1)}\!-Y^{(0)}]$ , or the average causal effect on the exposed (ACEE), $\text{E}[Y^{(1)}\!-Y^{(0)}\mid A=1]$ . For simplicity, we take the estimand to be the ACE, but the proposed method applies to either estimand.

We make the usual causal inference assumptions: SUTVA (i.e., no interference and treatment variation irrelevance) [Rubin1980]; unconfoundedness [Imbens2008] given the combination of observed ( $Z$ ) and latent ( $X$ ) covariates (i.e., $A\perp\!\!\!\perp Y^{(a)}\mid Z,X$ for $a=0,1$ ); and positivity (i.e., $0<\text{P}(A\!=\!1|Z,X)<1$ ). $Z$ and $X$ are often multivariate (we use univariate notation for simplicity) and may be associated due to common causes.

Had $X$ been observed, we would have been able to work directly with $Z,X$ to balance these variables between the two exposure conditions. This could be done by estimating the PS for each unit based on these variables, $e(Z,X)=\text{P}(A=1|Z,X)$ , and reweighting the units using inverse probability weights $A[e(Z,X)]^{-1}+(1\!-\!A)[1\!-\!e(Z,X)]^{-1}$ – so that both the exposed and unexposed groups mimic the covariate distribution of the full sample.333If the estimand is the ACEE, odds weighting is used instead: exposed units are unweighted, and unexposed units are weighted by the odds of exposure assignment given their covariate values. Covariate balance may also be obtained through matching on $(Z,X)$ or on $e(Z,X)$ . Once balance is obtained, the difference in mean outcome between the two exposure conditions is a valid estimate of the causal effect. Alternatively, covariate balancing may be combined with regression adjustment. Our current focus is covariate balancing, but since combination with regression adjustment is common in practice, we will comment on this where relevant.

However, $X$ is a latent variable. Instead of $X$ , we observe K measurement items, $\bm{W}=(W_{1},\dots,W_{\text{K}})$ , which are functions of $X$ and noise. As we are considering FSs, which are estimates of conditional expectations of $X$ (clarified shortly), it is convenient to abbreviate conditional expectations. Let $X_{W}=\text{E}[X|\bm{W}]$ , $X_{WZA}=\text{E}[X|\bm{W},Z,A]$ , etc. There is one PS version for each proxy of $X$ . Using the usual PS notation $e(\cdot)=\text{P}(A=1|\cdot)$ , the PSs estimated based on the cFS, the iFS, the summary score, and all measurement items, are estimates of $e(Z,X_{W})$ , $e(Z,X_{WZA})$ , $e(Z,\overline{W})$ and $e(Z,\bm{W})$ , respectively.

Some restrictions are needed on $\bm{W}$ to avoid contradicting the unconfoundedness assumption. Following [Lockwood2016], we assume strong surrogacy, i.e., conditional on the covariates that satisfy unconfoundedness, $\bm{W}$ is independent of exposure assignment and potential outcomes, $\bm{W}\perp\!\!\!\perp A,Y^{(a)}\mid Z,X$ . Fig. 1 represents this setting. (In the special case where $\bm{W}$ are sums of $X$ and error terms, this means the error terms may depend on covariates $X,Z$ , but conditional on these variables do not carry any information about exposure assignment and potential outcomes. This includes the even more special case of classical measurement error, where error terms are independent of $Z,X,A,Y^{(1)},Y^{(0)}$ .) While $\bm{W}$ is seen primarily as measurement of $X$ , we do not rule out possible association between $\bm{W}$ and the other covariates $Z$ conditional on $X$ .

We keep with strong surrogacy in this paper for simplicity of presentation, but note that the proposed method also applies to the case where some $\bm{W}$ items have a direct effect on exposure assignment, where only weak surrogacy is satisfied, i.e., $\bm{W}\perp\!\!\!\perp Y^{(a)}\mid Z,X,A$ . Application of the iFS method to this case only involves modifying the model used to generate the iFS to reflect that direct effect and requires that there is sufficient conditional independence for the model to be identified (explained later). Our approach is also relevant if $A$ influences some $\bm{W}$ items, which is another case of weak surrogacy. However, this case has complicated temporal order (post-exposure measurement of pre-exposure covariate), which deserves separate consideration, so we exclude it from this paper.

Note that both strong and weak surrogacy restricts $\bm{W}$ to be independent of $Y^{(a)}$ given $Z,X$ or $Z,X,A$ , even though this is not required to maintain unconfoundedness. This restriction is important, however, as it protects us from inducing confounding via $\bm{W}$ due to collider bias when matching or weighting on functions of $(\bm{W},Z,A)$ .

While additional assumptions will be needed for identification and estimation of the proxy variable of interest, we put off discussing them until later in the paper. Our first step is to consider the theoretical support for targeting $X_{WZA}$ as proxy for $X$ .

3 Theoretical Support for $X_{WZA}$ as Proxy for $X$ for Covariate Balancing

To judge whether $X_{WZA}$ is a worthwhile target, imagine that we do observe $X_{WZA}$ and see what follows. If $X_{WZA}$ is observed, we can use PS weighting or matching to balance $(Z,X_{WZA})$ . But then what happens to the unobserved residual $(X\!-\!X_{WZA})$ ? As shown in the theorem below (see proof in the Appendix), this residual has some nice properties that protects its mean-balance, i.e., the equality of its means between exposure conditions.

Theorem.

Let $X,Z,\bm{W}$ be random variables with finite variances. Let $A$ be a binary (0/1) random variable. Denote $\textup{E}[X|\bm{W},Z,A]$ by $X_{WZA}$ . Then

$\textup{E}[X\!-\!X_{WZA}\mid Z,X_{WZA},A]=0$ . 2. 2.

$\textup{E}[X\!-\!X_{WZA}\mid A=1]=\textup{E}[X\!-\!X_{WZA}\mid A=0]=\textup{E}[X\!-\!X_{WZA}]=0$ . 3. 3.

For any positive bounded scalar function $G=g(Z,X_{WZA},A)$ ,

[TABLE] 4. 4.

For any function $K=k(Z,X_{WZA})$ (possibly vector-valued),

[TABLE]

The gist of this result is that $(X\!-\!X_{WZA})$ has mean zero conditional on any set of values for $(Z,X_{WZA},A)$ . It thus has mean-balance in expectation, i.e., its expectation is zero in both the exposed and unexposed group, and this mean-balance remains after weighting by bounded functions of $(Z,X_{WZA},A)$ or after matching based on functions of $(Z,X_{WZA})$ , because all the relevant conditional means of $(X\!-\!X_{WZA})$ are zero. The important implication of this result is stated in the Corollary below (see proof in the Appendix).

Corollary 1.

In the setting of Theorem, denote $\textup{P}(A=1|Z,X_{WZA})$ by $e(Z,X_{WZA})$ .

Assume $0\!<\!e(Z,X_{WZA})\!<\!1$ . Let $Q=A/e(Z,X_{WZA})+(1\!-\!A)/[1\!-\!e(Z,X_{WZA})]$ . Then

[TABLE] 2. 2.

For $Z,X_{WZA}$ such that $0<e(Z,X_{WZA})<1$ ,

[TABLE]

Essentially, Corollary 1 says that weighting444The specific positivity assumption with respect to $(Z,X_{WZA})$ is required because positivity with respect to $(Z,X)$ does not necessarily translate to positivity for $(Z,X_{WZA})$ . based on $e(Z,X_{WZA})$ , and matching on $(Z,X_{WZA})$ or on $e(Z,X_{WZA})$ – or on any one-to-one function of $e(Z,X_{WZA})$ – help obtain balance on the mean of $X$ in expectation, in addition to balance on the distribution of the observed covariates $Z$ . This is because such weighting/matching obtains balance on the distribution of $X_{WZA}$ while the mean-balance of $(X\!-\!X_{WZA})$ is preserved.

That balancing $(Z,X_{WZA})$ does not worsen mean-balance on $(X\!-\!X_{WZA})$ is similar to a known result: matching on the linear predictor $\beta^{\prime}V$ (based on covariates $V$ ) of the PS does not create bias on linear combinations of $V$ that are uncorrelated with $\beta^{\prime}V$ if the distributions of $V$ given $A=1$ and $A=0$ are normal [Rubin1992b], elliptical [Rubin1992], or certain mixtures of elliptical distributions [Rubin2006]. Our finding for $(X\!-\!X_{WZA})$ , however, does not require distributional assumptions, and only relies on the fact that $(X\!-\!X_{WZA})$ has mean zero given $(Z,X_{WZA},A)$ .

Note that this result is specific to $X_{WZA}$ . If we replace $e(Z,X_{WZA})$ with $e(Z,X_{W})$ , or $e(Z,\overline{W})$ , or $e(Z,\bm{W})$ , or $e(Z,X_{WZ})$ , we no longer have this nice result for mean-balance on $X$ . This shows that $X_{WZA}$ is superior as proxy for $X$ for the purpose of balancing $X$ .

This result entails unbiased causal effect estimation in a special case:

Corollary 2.

In the setting of Theorem and Corollary 1, let the variables assume the causal structure with the assumptions from the Setting, Notation and Key Assumptions section. Also, assume that the outcome model is of the form

[TABLE]

Then balancing the distribution of $Z$ and the mean of $X$ – via weighting by function $Q$ or matching on $(Z,X_{WZA})$ or on $e(Z,X_{WZA})$ – leads to unbiased estimation of the ACE.

This corollary (see proof in the Appendix) says that balancing the mean of $X$ (and the distribution of $Z$ ) results in unbiased ACE estimation if the outcome model is linear in $X$ in each exposure condition, and $X$ and $Z$ do not interact on the outcome. This is analogous to the regression calibration result that the conditional mean of an error-prone predictor given the other predictors helps recover a linear model’s coefficients [Carroll2006]. (This means, if one wishes to combine covariate balancing with regression adjustment, $X_{WZA}$ may be a reasonable choice for that purpose.)

Real world data almost surely do not belong in this special case, so balancing the mean of $X$ does not imply unbiased ACE estimation (which generally requires balance of covariate distributions, not just means). However, improved balance (relative to when using inferior proxy variables) is likely to reduce bias. We will show this in simulation studies.

4 Identification and Estimation of $X_{WZA}$

The challenge of working with latent variables is that the distribution of a latent variable is not nonparametrically identified. There are infinitely many candidates for variable $X$ that relate to the observed variables as depicted in Fig. 1. To make progress, some assumptions are required about unobserved components of the joint distribution of $X,\bm{W},Z,A$ . To be clear, these assumptions essentially define the variable $X$ being considered in analysis [Borsboom2003]. For example, an $X$ candidate that is assumed to influence $\bm{W}$ in a linear manner is a different variable from one assumed to influence $\bm{W}$ in a nonlinear manner. We draw selectively from assumptions common in latent variable modeling, but acknowledge this indeterminacy – a point we revisit later.

A minor technical point before diving into the assumptions: As a latent variable does not have an inherent scale and location, in latent variable modeling scale and location are provided [Kline2016] usually by (1) fixing the latent variable’s mean (or intercept given predictors), and either (2a) fixing its variance (or conditional variance given predictors) or (2b) fixing its slope coefficient in the model of one measurement item (aka a factor loading). The two options provide different scales for $X$ , but both work, because what scale a covariate is on does not matter when the purpose is controlling for it. We use (1) and (2a), with the conventional values 0 for mean/intercept and 1 for variance.

As the purpose is to estimate $X_{WZA}$ , we need to fit a model that contains enough information about the distribution of $X$ given $(\bm{W},Z,A)$ that allows extracting $X_{WZA}$ . The assumptions we adopt include functional form and distributional assumptions for variables in the model, and conditional independence assumptions to limit the number of parameters.

4.1 Sufficient conditional independence assumption

The model for $(\bm{W},Z,X,A)$ implied by Fig. 1 is unidentified because the $Z\leftrightarrow X\to\bm{W}$ and $Z\leftrightarrow\bm{W}$ paths compete to explain the association of $Z$ and $\bm{W}$ , i.e., there are too many parameters. The simplest, and strictest, assumption to deal with this is full conditional independence: conditional on $X$ , $\bm{W}$ items are independent of one another and independent of $Z$ . Under this assumption, the model is identified with a minimum of two $\bm{W}$ items.555This is related to the three-indicator rule of factor model identification [Bollen1989], as $(Z,A)$ , by their association with $X$ , acts as the third indicator. While including multiple variables, $(Z,A)$ are equivalent to one indicator only, because the $(Z,X,A)$ part of the model is saturated, with no conditional independence.

With more $\bm{W}$ items, this assumption can be relaxed, allowing for a limited number of conditional dependence relationships among $\bm{W}$ items, or between $\bm{W}$ items and $Z$ . These relations, often known as residual covariances and direct effects in factor analysis lingo, are sometimes found through careful fitting of the measurement model [Byrne2013structural] and need to be dealt with in analyses using the latent variable. Therefore, instead of full conditional independence, we only require sufficient conditional independence, that is, there are enough conditional independence relations among $\bm{W}$ items, and between $\bm{W}$ items and $Z$ , for the model to be identified. Essentially, if there are some conditional dependence relations, more $\bm{W}$ items are required, and conversely, the fewer $\bm{W}$ items there are, the fewer conditional dependence relations are allowed – see [Bollen1989] and [Kline2016] on identifiability of models with latent variables. Fig. 2 differentiates full and sufficient conditional independence, where the dotted arrows represent possible conditional dependence.

We caution that while conditional dependence should be accommodated when it is believed to exist, it should be treated as an exception, not as the rule. While a model may be identified with just enough conditional independence and with plenty of conditional dependence, it may imply that $\bm{W}$ carries more information about $Z$ , or about other latent variables (that account for their residual dependence) than about $X$ . That would contradict the idea of $\bm{W}$ being measurements of $X$ .

4.2 Functional form and distributional assumptions

We discuss these assumptions with respect to three components of the $(Z,X,A,\bm{W})$ joint distribution: the covariates model, the measurement model, and the exposure assignment model. Throughout, all parameters are unknown and need to be estimated from data. Mplus [Muthen2017] code is provided (in the Supplementary Material) for implementing this joint model; we use Mplus as it computes FSs based on SEMs.

The covariates model. The assumed model so far (Fig. 2b) has a curved arrow between $Z$ and $X$ . To avoid having to model $Z$ , which in most applications is a mix of different types of variables, we take the distribution of $Z$ as it is in the observed data, and model $X$ on $Z$ . This effectively means that the SEM that is fit replaces the curved arrow with a directional arrow from $Z$ to $X$ (see Fig. 3). We are not assuming a causal effect of $Z$ on $X$ , however. This is just a convenient modeling choice.

In latent variable modeling practice, a routinely made assumption is that a latent continuous variable is normally distributed, marginally or conditional on predictors. The choice to model $X$ on $Z$ means the model assumes $X$ is normal given $Z$ , which allows $X$ to inherit non-normality from $Z$ . This is preferable because there is usually no substantive reason to believe that $X$ is marginally normal. Also, it seems appropriate because any non-normality in $Z$ may reflect non-normality in the common causes of $Z$ and $X$ which may also influence the distribution of $X$ . The covariates model is $X|Z\sim\text{N}(\alpha_{z}Z,\leavevmode\nobreak\ 1)$ . If there are more than one latent covariate (as in our application), they are treated as multivariate normal given $Z$ . The SEM in Fig. 3 would include, say, $X_{1}$ and $X_{2}$ , each with its measurement model, and each relating to $Z$ and $A$ , with a curved arrow representing their covariance given $Z$ . For simplicity, we mostly refer to $X$ in singular terms.

While conditional normality is more flexible than marginal normality, it is still an arbitrary assumption, and may be undesirable in certain cases, for example, if continuous measurements demonstrate a high degree of skewness. Our simulations therefore explore cases where this assumption (along with others) is correct, and cases where it is violated.

In the Discussion section we will comment on potentials (pending investigation and development) of models that are more flexible about the distribution of $X$ . For the current paper, we adopt the conditional normality assumption, because methods based on this assumption are well developed, making it easy to include conditional dependence for some measurement items if needed (see the measurement model below), and to handle multiple latent variables. It is immportant to note, though, that because $X$ is a latent variable, some assumption is needed about its distribution.666This setting is different from the usual setting in the measurement error literature, where $X$ is unobserved but is not a latent variable, so it may be observed elsewhere. In the latter case, if a validation dataset is available where $X$ is observed, the relationship between $X$ and a subset of the other variables (e.g., the measurement model) can be estimated in the validation sample, which may help the analysis of interest, without requiring an assumption on the distribution of $X$ . For example, the conditional score method [Carroll2006] allows fitting a generalized linear model with cannonical link when a covariate is measured with error, assuming a normal error model that is known (i.e., estimated from validation data); this method can be applied to estimate a logit exposure assignment model without assuming a distribution for $X$ [McCaffrey2013]. In the current setting, on the other hand, $X$ is latent, and the measurement model and the exposure assignment model are estimated on the same data. Without imposing some structure on the distribution of $X$ , this joint model is unidentified, and the measurement model alone is unidentified. A distributional assumption is the price we pay to make progress in this setting.

The exposure assignment model. Our implementation of the method assumes a logit or probit exposure assignment model, i.e., $\text{P}(A=1\mid Z,X)=\text{expit}(\beta_{0}+\beta_{x}X+\beta_{z}Z)$ , or $\Phi(\beta_{0}+\beta_{x}X+\beta_{z}Z)$ . This model assumes no interaction between the latent variable $X$ and the observed variables $Z$ on exposure assignment. If in a specific application it is believed that such an interaction plays an important role, then the current method would work less well, as it does not target covariate balance for the interaction terms.

The measurement model. In the social and behavioral sciences, the measurements of a latent continuous variable may take a range of forms – continuous, ordinal, count, etc. We assume that the models for $\bm{W}$ items given $X,Z$ are normal-linear (for continuous items) or generalized linear. In the simplest model with full conditional independence, the model for a continuous item is $W_{k}\mid X,Z\sim\textup{N}(\lambda_{k0}+\lambda_{kx}X,\sigma_{k}^{2})$ ; that for an ordinal item with R categories is $\text{P}(W_{k}>r)=\pi(-\tau_{kr}+\lambda_{kx}X)$ where $\tau_{kr}$ is the $r$ th (of $\textup{R}\!-\!1$ ) thresholds, and $\pi(\cdot)$ is the standard normal CDF (or the expit function) if the model uses the probit (or logit) link; and the model for a count item is $\text{E}[W_{k}\mid X,Z]=\exp(\lambda_{k0}+\lambda_{kx}X)$ , etc. (The coefficient $\lambda_{kx}$ in these models is also referred to as a factor loading, the loading of $W_{k}$ on $X$ .) Regarding the model for continuous measurements, as previously mentioned, the normal error assumption is commonly made in latent variable modeling; it is also commonly made in measurement error methods [Stefanski1985, Stefanski1987, e.g.,].

To accommodate conditional dependence of an item $W_{k}$ on $Z$ , the model includes $Z$ as a predictor, so the linear predictor part of the model minus the intercept, here denoted $\theta_{k}$ , is $\lambda_{kx}X+\lambda_{kz}Z$ instead of simply $\lambda_{kx}X$ . In Fig. 3, possible conditional dependence between $\bm{W}$ items and $Z$ is represented by the dashed arrow from $Z$ to $\bm{W}$ . (Again, this is a convenient modeling choice and does not represent an actual causal assumption.)

To accommodate conditional dependence between a pair or among several $\bm{W}$ items, we use a parameterization that attributes the source of this dependence to an unobserved common cause, represented by a nuisance latent variable independent of all other variables (see Fig. 3a). We denote this variable by $S$ (for shared variance), and assume it is standard normal. More than one such variable may be required, for example $S_{1}$ to account for the conditional dependence of $W_{1}$ and $W_{3}$ , and $S_{2}$ for the conditional dependence of $W_{4},W_{6}$ and $W_{7}$ . In this case, $S$ terms are added to the $\theta_{k}$ parts of the models for these items: $\lambda_{1s_{1}}S_{1}$ and $\lambda_{3s_{1}}S_{1}$ are respectively added to $\theta_{1}$ and $\theta_{3}$ , and $\lambda_{4s_{2}}S_{2}$ , $\lambda_{6s_{2}}S_{2}$ and $\lambda_{7s_{2}}S_{2}$ added to $\theta_{4},\theta_{6}$ and $\theta_{7}$ . (A technical detail: when only two items load on an $S$ variable, the two factor loadings are constrained to be equal, e.g., $\lambda_{1s_{1}}=\lambda_{3s_{1}}$ , to pare them down to one parameter, which is appropriate as the pair represents only one dependence.) This part of the model in its most generality can be written concisely in vector/matrix form as

[TABLE]

where $\bm{\theta}$ is a vector of dimension K with each element corresponding to the model for one $\bm{W}$ item; $\bm{\lambda}_{x}$ and $\bm{\lambda}_{z}$ also have dimension K; $\bm{S}$ is a vector of L nuisance factors, and $\bm{\Lambda}_{s}$ is a matrix of dimension K $\times$ L containing the loadings of the K measurement items on these factors. In most applications, L is small if not zero, and most elements of $\bm{\Lambda}_{s}$ are zero. Likewise, most if not all elements of $\bm{\lambda}_{z}$ are zero. This parameterization of conditional dependence among measurements is equivalent to the error covariance parameterization (see Fig. 3b) usually used for continuous measurements – the product of $\lambda_{1s_{1}}$ and $\lambda_{3s_{1}}$ above is the covariance of $W_{1}$ and $W_{3}$ given $X$ . The nuisance $S$ parameterization, however, applies more generally to different types, and mixed types, of measurement items.

In summary, the SEM used to estimate $X_{WZA}$ (via the iFS) includes three components: (1) a linear model for $X$ given $Z$ with normal error; (2) a probit/logit model for $A$ given $Z,X$ ; and (3) a linear normal model, or a generalized linear model, for $\bm{W}$ conditional on $X,Z$ , where the $\bm{W}$ items are for the most part independent of one another given $X,Z$ , and for the most part independent of $Z$ given $X$ . In contrast, the cFS is based on a measurement model with $\bm{W}$ only that assumes $X$ is marginally normally distributed.

4.3 Model fitting and FS computation

We fit models in Mplus using maximum likelihood (ML) estimation. The iFS is then computed using the posterior mean (EAP) method [Bock1981], which is the estimated $\text{E}[X|\bm{W},Z,A]$ based on the model. Since the model may use a logit or probit link for $A$ , there are two such iFS versions, which we refer to as the logit and probit iFS.

We also consider two approximate iFS versions. Mplus can also fit the probit model via weighted least squares, but then the iFS is computed using the posterior mode (MAP) method [Muthen2004]. We refer to this iFS version as probit-WLS, and note that it is only an approximate estimate of $\text{E}[X|\bm{W},Z,A]$ . Our motivations for considering weighted least squares are practical: (1) it is computationally light, which is helpful when dealing with multiple latent variables; (2) it is Mplus’s default for categorical response variables if neither estimator nor link function is specified, which may be picked up in practice as a result of users’ habits. The probit-WLS iFS performs almost identically to the probit iFS in all the simulations, so we will not discuss it for the rest of the paper.

The second approximate iFS version does not require Mplus. It can be implemented with the bare minimum: software that fits linear factor models with some correlated errors and computes factor scores. This iFS is generated from a linear factor model (not a SEM) that treats $\bm{W},Z,A$ all as indicators of the latent variable $X$ (ignoring the causal structure and ignoring the fact that not all these variables are continuous); replaces the effect of $Z$ on $A$ and any $Z$ - $\bm{W}$ direct effects with error covariances; and adds error covariances between $Z$ variables. This factor model is distributionally equivalent to the SEM in Fig. 3b if that SEM uses linear models for all variables. We label this the linear iFS.

5 Simulation Results on the Performance of the iFS Compared to Existing Proxies for $X$ When Models Are Correctly Specified

This section reports on simulations that confirm the theoretical result that $X_{WZA}$ is a better proxy for $X$ than the non-inclusive proxies (all $\bm{W}$ items, summary score, and $X_{W}$ ), and explore this proxy’s relative performance in terms of balance on several moments of $X$ . To zoom in on the comparison with the other proxies, we match our data generating model and estimation model so that the iFS estimates $X_{WZA}$ well and the cFS estimates $X_{W}$ well. A later section considers situations where the estimation model is misspecified.

We examine the proxies’ performance in PS analysis in terms of balance obtained on $Z$ and $X$ , and in terms of bias, variance and mean square error (MSE) in ACE estimation. In this simulation study, we use PS weighting with inverse probability weights. That is, if $X_{\text{proxy}}$ is the proxy of $X$ , then the estimated PS is $\hat{e}(Z,X_{\text{proxy}})=\hat{\text{P}}(A=1\mid Z,X_{\text{proxy}})$ , the corresponding weight is $\hat{Q}=A/\hat{e}(Z,X_{\text{proxy}})+(1-A)/[1-\hat{e}(Z,X_{\text{proxy}})]$ , and the ACE is estimated by the difference between weighted mean outcomes,

[TABLE]

5.1 Covariate balance

So that the cFS model can be correctly specified, in the data generating model for this simulation, $Z$ and $X$ are multivariate normal, and $\bm{W}$ are independent of $Z$ given $X$ . We generate (i) multivariate normal $Z,X$ with mean 0, variance 1, covariance 0, 0.4 or -0.4; (ii) two to ten continuous $\bm{W}$ items that are linear combinations of $X$ and noise such that their correlations with $X$ are 0.4, 0.6 or 0.8; and (iii) binary $A$ from the logit $\text{P}(A\!=\!1|Z,X)=\text{expit}(b_{0}+b_{1}Z+b_{2}X)$ or probit $\text{P}(A\!=\!1|Z,X)=\Phi(c_{0}+c_{1}Z+c_{2}X)$ model, where $b_{1},b_{2}$ are 0.5 or 1, $c_{1},c_{2}$ are 0.294 or 0.588, and $b_{0},c_{0}$ are calibrated so that exposure prevalence ranges from 0.5 to 0.2. For each scenario, we simulate 5000 datasets of size 1000.

With each simulated dataset, we compute the summary score as $\overline{W}$ (the mean of $\bm{W}$ items) and the iFS based on the correctly specified model (logit or probit depending on the true exposure assignment mechanism). With three or more $\bm{W}$ items, we compute the cFS. With each of these proxies, we estimate PSs via a model with the correct link function (logit or probit based on the true model),777Strictly speaking, with the logit exposure assignment case, the iFS model is correctly specified, but the model form of the PS model conditional on a proxy for $X$ is only approximately correct, because logit models are not collapsible. This approximation should be close, however, as the departure of the proxy $X_{WZA}$ from $X$ is uncorrelated with both the dependent variable and the predictors in the PS model. and compute inverse probability weights. Other than FS estimation in Mplus, all computing is done in R [R2018]. We use the R package MplusAutomation [MplusAutomation2018] to bridge between R and Mplus.

Results regarding covariate balance are consistent across scenarios. We show one set of scenarios in Fig. 4 ( $Z$ - $X$ correlation 0.4, continuous $\bm{W}$ , $\bm{W}$ - $X$ correlations alternating between 0.4 and 0.6, exposure prevalence 0.3, $Z$ and $X$ having equal influence on exposure assignment with $b_{1}=b_{2}=0.5$ , $c_{1}=c_{2}=0.294$ ).

Overall, all methods do well in achieving balance on $Z$ . With respect to balance on $X$ , the non-inclusive proxies do not perform well, and the iFS outperforms all of them. Essentially, the iFS obtains balance on not just the mean of $X$ but also the next four moments. This is a better finding than was anticipated based on the theoretical result.

The imbalance in the even moments of $X$ when using the non-inclusive proxies is much less noticeable than the imbalance in the odd moments. This is due to the symmetry of the distributions of $X$ and $Z$ . In fact, in scenarios where exposure prevalance is .5 or .4 (i.e., the exposure assignment model is symmetric or close to symmetric), we cannot visually detect imbalance in the even moments for any of the methods from the plot. In scenarios where exposure prevalence is smaller (e.g., .2), the imbalance in the even moments when using the non-inclusive proxies is more pronounced.

Among the non-inclusive proxies, $\overline{W}$ performs worse than all $\bm{W}$ items and worse than the cFS because here the $\bm{W}$ - $X$ correlations are non-uniform; when $\bm{W}$ - $X$ correlations are uniform, the non-inclusive proxy curves sit on top of one another.

5.2 Bias reduction

For each of the scenarios above, we simulate three outcomes. The first two are continuous, based on the linear model $Y_{1}\mid Z,X,A\sim\text{N}(Z\!+\!X,\leavevmode\nobreak\ 4)$ and the nonlinear model $Y_{2}\mid Z,X,A\sim\text{N}(Z\!+\!X+\!.5X^{2}\!-\!.1X^{3},\leavevmode\nobreak\ 4)$ , where the ACE is zero. The third is binary, based on the logit model $\text{P}(Y_{3}=1\mid Z,X,A)=\text{expit}(A+Z+X)$ , where the ACE is $\text{E}[\text{expit}(A+Z+X)]-\text{E}[\text{expit}(Z+X)]$ , a risk difference.

Fig. 5 presents bias in the estimated ACE. For all three outcomes bias is reduced when using the iFS as proxy for $X$ . And bias seems to be reduced to zero regardless of whether the outcome is linear in $X$ . This is consistent with the finding above that the iFS seems to obtain balance on several moments of $X$ and not just on the mean.

In summary, in these cases where the iFS model and the PS model are correctly specified, the iFS seems to obtain more than simply mean-balance on $X$ and bias removal in estimated ACE on outcomes linear in $X$ . Simulation results suggest that the iFS achieves what looks more like distribution balance on $X$ and also bias removal in estimated ACE on outcomes nonlinear in $X$ . This was not anticipated based on the theoretical result in the previous section. It is unclear whether this is only a feature of the special data generating mechanism, or whether more can be said about $X_{WZA}$ generally. We do not see a clear way to investigate this further through simulation using our current methods, because such investigation would require estimating $X_{WZA}$ well under a different data generating mechanism, which is a challenge given estimation options.

5.3 Mean square error reduction

Fig. 6 shows root MSE (solid curves) and standard deviation (dashed curves) of the estimator when using the different proxies. For the iFS-based estimator, MSE reflects variance, as the red dashed curves sit on top of the red solid curves. For estimators based on non-inclusive proxies, MSE is a combination of variance and non-zero bias. While the variance of the iFS-based estimator is larger, its MSE is smaller as a result of bias removal.

6 Connection to Known Results about Weighting and Matching Functions

The better than expected performance of the iFS begs the question whether, given these data generating models, $X_{WZA}$ as proxy for $X$ results in a weighting/matching function that leads to unbiased ACE estimation. This section relates this proxy to results about weighting and matching functions [McCaffrey2013, Lockwood2016].

Consider the use of weighting to adjust for $Z,X$ in estimating the ACE. The correct weighting function for this purpose is the inverse probability weight888The weighting function for the ACEE is $Q_{0}=A+(1\!-\!A)e(Z,X)/[1\!-\!e(Z,X)]$ . based on $(Z,X)$ ,

[TABLE]

With $X$ unobserved, $Q_{0}$ is unavailable. We consider weighting functions that are functions of observed variables $(\bm{W},Z,A)$ . [McCaffrey2013] show that a weighting function results in unbiased ACE estimation if and only if it is unbiased for the correct weighting function. Assume such a function, denoted $Q_{1}$ , exists. This means $\text{E}[Q_{1}\mid Z,X,A]=Q_{0}$ . Now consider matching instead of weighting. Correct matching functions include $H_{0}=(Z,X)$ , or $e(Z,X)$ , or any one-to-one function of $(Z,X)$ or $e(Z,X)$ , none of which are available, thus we consider matching functions that are functions of observed variables. [Lockwood2016] show that a matching function results in unbiased ACE estimation if and only if the weighting function based on it is unbiased for the correct weighting function. Assume that such a matching function exists. Denote it by $H_{1}$ , and the weighting function based on it ( $A[e(H_{1})]^{-1}+(1\!-\!A)[1\!-\!e(H_{1})]^{-1}$ ) by $Q_{H_{1}}$ . This means $\text{E}[Q_{H_{1}}\mid Z,X,A]=Q_{0}$ . Weighting with a $Q_{1}$ , or matching on a $H_{1}$ (if these exist and can be estimated), balances the distribution of $(Z,X)$ , and obtains unbiased ACE estimation.

Our proxy variable method implies treating $H=(Z,X_{WZA})$ , or $H=e(Z,X_{WZA})$ , as a matching function, and treating $Q=A[e(Z,X_{WZA})]^{-1}+(1\!-\!A)[1\!-\!e(Z,X_{WZA})]^{-1}$ as a weighting function. Our theoretical result indicates that weighting with $Q$ and matching on $H$ obtains balance in the first moment of $X$ . In the scenarios considered in the simulations above, weighting with $Q$ also seems to balance higher moments of $X$ . Using simulation, we now examine more closely how $Q$ relates to $Q_{0}$ in these scenarios, and also how $Q$ and $H$ compare to $Q_{1}$ and $H_{1}$ in the scenarios with logit exposure assignment – a special case where there are closed forms for $Q_{1}$ and $H_{1}$ . We will also relate $H$ to $H_{1}$ in another case.

6.1 Logit exposure assignment and normal measurement error

We start with this special case. In this case, [McCaffrey2013] give a closed form for $Q_{1}$ for the setting with a single measurement $W$ . We derive (see the Supplementary Material) an extenssion for the setting with multiple $\bm{W}$ items. With logit exposure assignment $\text{P}(A=1\mid Z,X)=\text{expit}(\beta_{0}+\beta_{z}Z+\beta_{x}X)$ , the correct weighting function is

[TABLE]

Given continuous measurements with normal errors $\bm{W}|X,Z\sim\text{N}(\bm{\lambda}_{0}+\bm{\lambda}_{x}X+\bm{\lambda}_{z}Z,\leavevmode\nobreak\ \bm{\Sigma})$ , the unbiased weighting function is

[TABLE]

where $X_{\text{MLE}}=(\bm{\lambda}_{x}^{\prime}\bm{\Sigma}^{-1}\!\bm{\lambda}_{x})^{-1}\bm{\lambda}_{x}^{\prime}\bm{\Sigma}^{-1}(\bm{W}-\bm{\lambda}_{0}-\bm{\lambda}_{z}Z)$ is the MLE of $X$ based on the measurement model (treating each individual’s $X$ value as an unknown parameter and treating model parameters as known) and $\text{var}(X\!-\!X_{\text{MLE}}|X,Z,A)=(\bm{\lambda}_{x}^{\prime}\bm{\Sigma}^{-1}\bm{\lambda}_{x})^{-1}$ . This function is obtained by replacing $X$ in $Q_{0}$ with the $X_{\text{MLE}}$ , but rescaling the exponential by an appropriate factor so that $\text{E}[Q_{1}\mid Z,X,A]=Q_{0}$ . This can be abbreviated to

[TABLE]

where $X^{*}=X_{\text{MLE}}+(2A-1)\beta_{x}(\bm{\lambda}_{x}^{\prime}\bm{\Sigma}^{-1}\bm{\lambda}_{x})^{-1}/2$ , the MLE of $X$ shifted up or down a distance depending on exposure status.

We use simulation to compare $Q$ to $Q_{1}$ in scenarios above that involve logit exposure assignment. With each scenario, we take one dataset of $Z,X,A$ of size 1000, and estimate for each individual $i$ their correct weight $Q_{0,i}$ via logistic regression. Conditional on the individuals’ $X$ values, we generate 10,000 datasets of $\bm{W}$ . Denote the $\bm{W}$ datasets by $j$ , $j=1,\dots,10000$ . With each dataset $j$ , for each individual $i$ , we estimate three weights: (i) the weight $Q_{ij}$ based on the iFS-based weighting function; (ii) a naive weight $Q_{W,ij}$ using $\overline{W}_{ij}$ as proxy for $X$ ; and (iii) the weight $Q_{1,ij}$ based on the unbiased weighting function formula (using parameter estimates from the SEM that combines the measurement and exposure assignment models). Combining the 10,000 $\bm{W}$ datasets, we compute, for each individual $i$ , the bias (i.e., average departure from $Q_{0,i}$ ) and variance of each weight type.

The top half of Fig. 7 shows bias of the weighting functions, using one scenario999This scenario, which belongs in the set of scenarios represented in the top panel of the last three figures, involves three measurement items whose correlations with $X$ are 0.4, 0.6 and 0.4. as example; the pattern is similar across scenarios. The naive weighting function $Q_{W}$ (shown in gray) is biased for the correct weights. $Q_{1}$ (black) is unbiased as indicated by the theoretical result. Our $Q$ (red) appears unbiased for the vast majority of the units; there are only a few noticeable deviations of bias from zero for units with large correct weights.

Notably, $Q$ mimics $Q_{1}$ extremely well. In the top panel, the black curve (plotted last) almost completely covers the red curve; only a tiny bit of the end of the red curve shows.101010This might be partly a precision issue, because the iFS come from Mplus with three decimal places precision, whereas the estimated model parameters in the $Q_{1}$ formula have six decimal places precision. Also, for each unit, the variance of the $Q$ weight and that of the $Q_{1}$ weight are almost identical. More interestingly, looking in specific simulated datasets, the $Q_{1}$ and $Q$ values are almost the same for most units, with visible differences only for units with the largest correct weights. Yet $Q$ and $Q_{1}$ are mathematically different functions. In $Q_{1}$ , $\beta_{0},\beta_{x},\beta_{z}$ are coefficients of the exposure assignment model, and $X^{*}$ is a linear combination of $\bm{W},Z,A$ . In $Q=1+\exp[(1-2A)(\delta_{0}+\delta_{1}X_{WZA}+\delta_{2}Z)]$ , $\delta_{0},\delta_{1},\delta_{2}$ are from the logit regression model regressing $A$ on $X_{WZA}$ and $Z$ , and $X_{WZA}$ is a nonlinear function of $\bm{W},Z,A$ .

Our conclusion for this special case is that $Q$ is numerically very close to $Q_{1}$ . Based on a result in [Carroll2006], $Q_{1}$ is an inverse probability weighting function based on $e(Z,X^{*})=\text{P}(A=1\mid Z,X^{*})$ , therefore $(Z,X^{*})$ and $e(Z,X^{*})$ are $H_{1}$ functions. The numerical closeness between $Q$ and $Q_{1}$ implies that in this special case our $H=e(Z,X_{WZA})$ is numerically very close to $H_{1}=e(Z,X^{*})=\textup{expit}(\beta_{0}+\beta_{x}X^{*}+\beta_{z}Z)$ .

6.2 Probit exposure assignment scenarios

With probit exposure assignment, there is no closed form for $Q_{1}$ , but we can relate $Q$ to $Q_{0}$ . We conduct the same simulation of $\bm{W}$ datasets as above (minus computation of $Q_{1}$ weights) for scenarios from the previous section that involve probit exposure assignment. Results are in the bottom half of Fig. 7. As with the logit case, here the naive $Q_{W}$ is biased for $Q_{0}$ , whereas $Q$ is unbiased for $Q_{0}$ for the vast majority of the units, with only noticeable non-zero bias for units whose correct weights $Q_{0}$ are large.

6.3 Relating $H=(Z,X_{WZA})$ to one very special case of $H_{1}$

There is another case with a closed form $H_{1}$ [Lockwood2016]. This case assumes that given $(X,Z)$ , $\bm{W}$ is multivariate normal with constant covariance matrix, and that $X$ is normal given $(Z,A)$ with conditional variance not depending on $A$ . Due to properties of the multivariate normal distribution, in this case $H_{1}$ exists and turns out to be the same as $H=(Z,X_{WZA})$ , and the corresponding $Q_{1}$ is $Q$ . While interesting, this case is unrealistic because of the second assumption – that $X$ is normal given $(Z,A)$ . With $X$ (and $Z$ ) causing $A$ , it must be an extremely special setting that obtains $X$ following the normal (or any other specific) distribution given $A$ (and $Z$ ). We do not make this assumption. (But note a related comment about the linear iFS in the next section.)

6.4 Several observations (and speculations) based on these connections

The weighting and matching functions implied by the proposed proxy method ( $Q$ and $H$ ) are generally not the exact weighting and matching functions for unbiased ACE estimation ( $Q_{1}$ and $H_{1}$ ). Let us refer to $Q$ and $H$ as approximately unbiased weighting and matching functions. (Here “unbiased” references unbiased estimation of the causal effect, so an unbiased weighting function is unbiased for the correct weighting function $Q_{0}$ , but this relationship does not hold for matching functions.111111[McCaffrey2013] and [Lockwood2016] label $Q_{1}$ and $H_{1}$ valid weighting and matching functions. We opt for “unbiased” instead of “valid,” because it seems less cognitively taxing to consider something both approximately unbiased and biased, than both approximately valid and invalid.) $Q$ and $H$ are approximate in the sense that they target balance in the first moment of $X$ ,121212While simulation results show nice balance on the first five moments, based on the theoretical results, we only claim that $Q$ and $H$ target the first moment. while $Q_{1}$ and $H_{1}$ are exact as they target full distributional balance (and as a result achieve unbiased effect estimation). We could think of $Q$ and $H$ as belonging in a class of approximately unbiased weighting and matching functions, in which there might be functions that are less approximate because they target balance in, say, the first two moments of $X$ .

Approximately unbiased weighting and matching functions, where available, are nice substitution for the ideal $Q_{1}$ and $H_{1}$ if these do not exist or if it is unknown whether they exist. Such approximate functions may also be relevant if $H_{1}$ and $Q_{1}$ exist but the methods for estimating them are approximate. In applications, all estimated weighting/matching functions are at best approximately unbiased in some sense; this is interesting to consider.

Whether there is pairing (or not) of a weighting function and a matching function is also interesting. Such pairing is nice to have if there is interest in estimators that, say, in the spirit of double robustness, adjust for covariates via both weighting (which requires a weighting function) and conditioning (which calls for a matching function). If a $H_{1}$ exists then there is a corresponding $Q_{1}$ [Lockwood2016, the PS weight based on $H_{1}$ –], but it is unclear whether the reverse is generally true. For example, it is not obvious what may be an $H_{1}$ that pairs with the $Q_{1}$ estimated by the procedure in section 3.1 of [McCaffrey2013]. This suggests that even if a $Q_{1}$ is available, a search for a matching function may still be necessary. In that case, an approximately unbiased matching function, if available, is relevant as a candidate matching function. Or the form of $Q_{1}$ (or the procedure for estimating $Q_{1}$ ) might provide clues for potential (exact or approximate) matching functions. For example, it is the form of $Q_{1}$ in the logit exposure assignment case above that reveals that $(Z,X^{*})$ and $e(Z,X^{*})$ are unbiased matching functions.

Our last observation is that in both the special cases with closed-form $H_{1}$ solutions above, the vector form $H_{1}$ includes $Z$ as one of the two elements, so the other element can be considered a proxy for $X$ . That means the solution in these two cases belongs in the intersection of the proxy variable approach and the matching function approach. [Lockwood2016] point out that a strategy for finding an unbiased weighting function $H_{1}$ is to find one that satisfies $A\perp\!\!\!\perp(Z,X)\mid H_{1}$ (although this is not a necessary condition). Both $H_{1}=(Z,X^{*})$ in the first special case and $H_{1}=(Z,X_{WZA})$ in the second special case satisfy this condition. That leads us to speculate that maybe in a substantial set of cases, if $H_{1}$ functions exist, there is a vector-valued form that contains $Z$ as an element. This points to a strategy for searching for $H_{1}$ : searching for $X_{\textup{proxy}}$ such that $A\perp\!\!\!\perp X\mid(Z,X_{\textup{proxy}})$ . If such a proxy exists, then $(Z,X_{\textup{proxy}})$ and $e(Z,X_{\textup{proxy}})$ are unbiased matching functions.

Before closing this section, let us tie a loose end. Our proposal of $X_{WZA}$ as proxy for $X$ is agnostic of which estimand (e.g., ACE or ACEE) is of interest. The implied matching function $H=(Z,X_{WZA})$ does not differentiate estimands, as estimation is simply based on balancing $H$ , or conditioning on and marginalizing over $H$ . But so far we have discussed PS weighting only for ACE estimation. The question is, if an ACE weighting function is (approximately) unbiased for the correct ACE weighting function, whether the related ACEE weighting function is (approximately) unbiased for the correct ACEE weighting function. The answer is yes, due to a simple connection between probability and odds. For each unit $i$ , the correct ACE weight, $\frac{1}{\text{P}(A=A_{i}\mid Z_{i},X_{i})}$ , is equal to 1 plus the odds of being in the other exposure condition given the unit’s covariate values, $\frac{\text{P}(A=1-A_{i}\mid Z_{i},X_{i})}{\text{P}(A=A_{i}\mid Z_{i},X_{i})}$ . That is, in the inverse probability weighted sample, each unit self-represents and in addition represents those in the other exposure condition that are similar to them. The self-representing component of the weight is always 1, and only the other-representing component varies. For an unexposed unit, the other-representing component, $\frac{\text{P}(A=1\mid Z_{i},X_{i})}{\text{P}(A=0\mid Z_{i},X_{i})}$ , is exactly the odds weight for ACEE estimation. Therefore (approximate) unbiasedness of an ACE weighting function implies (approximate) unbiasedness of the related ACEE weighting function.

7 Performance of the iFS Proxy When It Does Not Estimate $X_{WZA}$ Well

In previous sections, the iFS estimates $X_{WZA}$ well. We now examine the performance of the iFS when it does not estimate $X_{WZA}$ well. Simulation results for several such cases are collected in Fig. 8. In all these plots, the balance (represented by average differences in weighted sample moments) and bias corresponding to each proxy are recentered by substracting values obtained when replacing the proxy with the true $X$ in the PS model.

The top panel of Fig. 8 is the case where the iFS model is actually correctly specified, but since $\bm{W}$ items are ordinal with only four levels, they contain much less information about $X$ than if they were continuous. Here the iFS proxy results in residual imbalance on $X$ and bias in the ACE, but these are much reduced compared to when using non-inclusive proxies. A similar result (see the Supplemental Material) is observed in the presence of measurement errors’ dependence that is not incorporated in the iFS model: this results in bias, but the bias is noticeably less than the bias when using non-inclusive proxies.

The second panel shows that the linear iFS, although misspecified, performs well as a proxy for $X$ in PS analysis. In this simulation, for all representations of $X$ , the PS model uses the correct link (logit or probit). Note that this linear iFS model implies a multivariate normal structure that resembles the assumption that $X$ is normal given $(Z,A)$ in the second special case where $H_{1}=(Z,X_{WZA})$ in the previous section. This is interesting, but not surprising. Essentially using as approximation a suboptimal link function is equivalent to making a likely incorrect assumption. The nice result is that this approximation does not seem to matter much in these simulations.

The third panel shows the case where a wrong link function is assumed (and is used in both the FS model and the PS model), i.e., the logit link is used if the true exposure assignment mechanism is probit, and vice versa. In this case, even using the true covariate $X$ leads to some bias due to model misspecification. The plots show balance and bias for different proxies that are centered at what is achieved using the true $X$ with the wrong link function. The iFS performs very well relative to this appropriate benchmark.

The bottom panel represents the case where the true model of $X$ given $Z$ is skewed and the true model of $\bm{W}$ given $X$ is also skewed. The FS model thus misspecifies both of these elements. Again, the iFS outperforms the non-inclusive proxies in this setting.

8 The Real Data Example

We now illustrate bias correction using the iFS method with real data. The study question is whether out-of-school suspension (hereafter, suspension) in adolescence increases the risk of subsequent problems with the law. The Add Health sample is a representative sample of adolescents in the United States recruited during the 1994-95 school year (wave 1) when in grades 7-12 and followed into adulthood [Harris2013]. We use data from male participants in the public access dataset for whom data are available on analysis variables. The analysis is for illustrative purposes only; results should not be taken as substantive findings.

The exposure is suspension during the approximately one year between waves 1 and 2. The outcome is police arrest after wave 2, or more precisely, between wave 2 and the last wave with data (wave 4 in 2008). Analysis is restricted to male participants who at wave 1 had prior suspension history but no prior arrests. This restriction makes exposed and unexposed individuals more similar in their chance of receiving the exposure, thus more reasonable to compare. It excludes individuals who had little to no chance of being suspended. The goal is to estimate the effect of exposure on the exposed (ACEE).

In this sample (n = 417), 140 (33.6%) reported having the exposure. This is a small group (relative to the full original sample of several thousand), and we are not sure whether the subsetting of the data that led to this sample retains the representativeness for the corresponding subset of the national population. We thus choose to estimate the effect of the exposure on this specific group of exposed individuals, and ignore their survey weights from Add Health. Our analysis incorporates clustering (within schools) information to accommodate within-cluster correlation, using the R package survey [Lumley2004surveypackage, Lumley2019surveypackage].

Targeting the ACEE, we want to use PS weighting to weight the unexposed group to mimic the exposed group with respect to baseline covariates (measured at wave 1). These include observed covariates age, race, ethnicity, parent education, parent marital status; and latent covariates academic achievement (measured by four grades for math, English, social and natural sciences) and violence tendency (measured by four items reporting past 12-month physical fights including weapon use). The latent variables’ measurement items are ordinal: academic achievement items are coded 1=grade D/F, 2=C, 3=B and 4=A; violence items are on a 0=never to 3=five-or-more-times response scale [Harris2009].

Before further analysis, one task is required to bridge from the methodological investigation thus far to analysis in practice. As the focus of the current paper is covariate balancing through PS weighting using the iFS, we have taken for granted that the causal effect is estimated simply by taking the difference in PS weighted mean outcome between exposure conditions. All our simulation to this point uses this simple method, which we refer to as the weighting-only estimator. While this is fine for simulation studies – as we only look at method performance over many simulated datasets but not at any single dataset specifically – it might not work well for actual data analysis where we have only one sample and covariate balance obtained on the sample is often not exact, in which case we would worry about bias due to residual imbalance. To address this issue, we also use a second estimator, labeled weighting-plus, which combines PS weighting with regression adjustment. The latter is done by fitting, to the weighted sample, a working outcome model which we do not assume to be correct (here a logistic model regressing outcome on exposure and covariates), computing model-predicted potential outcome probabilities, and averaging them over the inference population. In this case of estimating the ACEE for a binary outcome, the weighting-only estimator is the difference between the outcome proportion in the exposed and the odds-weighted outcome proportion in the unexposed,

[TABLE]

Here $\hat{Q}_{i}$ , for an unexposed individual, is the odds of exposure assignment given the individual’s covariates, estimated based on the PS model. The weighting-plus estimator is equivalent to replacing the second term with the mean, taken over the exposed individuals, of their predicted potential outcome probabilities under non-exposure (denoted $\hat{Y}_{i}^{(0)}$ ) based on the working outcome model fitted to the odds weighted sample.

[TABLE]

The weighting-plus estimator is related to the standardized estimator, which has been shown to be consistent in the randomized trial setting (without measurement error), not assuming the working outcome model is correct [Rosenblum2010, Steingrimsson2017]. It is relevant to our current setting because the purpose of PS weighting is to mimic a randomized trial, and the residual imbalance we have in a PS analysis is analogous to the chance imbalance in a randomized trial. An additional simulation study for the current setting mimicking the sample data shows that the weighting-plus estimator performs well even when the working outcome model is misspecified (see the Supplemental Material). We now proceed with the data example.

Prior to PS analysis, we conduct factor analysis of the measurement items of the latent variables. Factor analysis supports unidimensionality for each set of items. The violence set has good internal consistency, ordinal alpha [Zumbo2007] = 0.81; the academic achievement set is less internally consistent, ordinal alpha = 0.67. We also conduct multi-group factor analysis to check for measurement non-invariance of the two latent variables between the exposed and unexposed groups, and conclude that measurement invariance is supported. Note that with temporal ordering (baseline covariates measured prior to the exposure), exposure status does not affect measurement. Measurement invariance could have been caused, however, by factors that precede measurement that influence both measurement and exposure assignment, and would have complicated the analysis. Fortunately measurement invariance is supported by the data.

The first three numeric columns in Table 1 summarize the baseline covariates in the exposed and unexposed groups. The standardized mean differences (SMD) for most covariates have absolute values above 0.1 indicating the group means differ by more than 0.1 standard deviation. Exposed participants were more likely to be African-American; their parents on average had lower education and were less likely to be married and more likely to be single parents. Also, the mean scores (i.e., the averages of measurement items) of the latent variables are distributed differently between the two groups: those in the exposed group on average had higher violence mean scores and lower academic achievement mean scores. The same pattern is observed with the iFSs for these two latent variables.

Just for illustration, we first conduct PS weighting using the observed covariates and the mean scores of the latent covariates, as if we did not know the iFS method. This results in improved covariate balance shown in the next two columns of Table 1: all but one of the variables put in the PS model have weighted SMD with absolute value below 0.1, indicating excellent balance [Stuart2010]. A side effect is improved balance on the two iFSs, which were not included in the PS model. However, the iFS for academic achievement still retains a large SMD, which we would not know without computing the iFSs.

With the weights based on the mean scores (ignoring measurement error), we would report either of the effect estimates in the “neither corrected” row in Table 2, and conclude that for male participants with prior history of suspension, additional suspension (between waves 1 and 2) increased the risk of subsequent arrests by police, by 11.0 percentage points (95% CI = (3.1, 18.3)) if using the weighting-only method, or by 11.6 percentage points (95% CI = (3.7, 18.5)) if using the weighting-plus method. The confidence intervals are equal-tail intervals obtained via the nonparametric bootstrap.

We now use the iFS method to correct bias due to measurement error. PS weighting based on the observed covariates and the iFSs of the latent covariates results in balance shown in the last two columns of the Table 1. Of the covariates entered in the PS model, only one has a weighted SMD with absolute value above 0.1, and only slightly so.

Note that balance in the mean score does not generally imply balance in the iFS, or vice versa. This is due to the fact that individuals with the same mean score value but different exposure status tend to have different true value on the latent covariate, and the iFS reflects this difference because it incorporates information about exposure status. This means when the iFS is balanced – which we aim to achieve – the mean score generally is not; the difference in mean score here depends on measurement reliability and on the strength of the latent variable’s association with exposure assignment.

Using the iFSs as proxies for the latent covariates, we arrive at the result in the “both corrected” row in Table 2. The average causal effect of additional suspension on the exposed is estimated to be an increase of the risk of subsequent police arrest of 7.5 percentage points (95% CI = (-0.8, 15.9)) if using the weighting-only estimator, or 8.6 percentage points (95% CI = (1.7, 18.2)) if using the weighting-plus estimator. These final effect estimates are smaller than those that do not benefit from the iFS method.

To fully illustrate how measurement error bias correction changes the estimated causal effect, two additional rows in Table 2 show results from analyses that uses the iFS method to correct measurement error for only one of the two latent variables (using the iFS for one but the mean score for the other one). Compared to no correction, measurement error correction results in reduction of the estimated causal effect. Correction for academic achievement only (the latent covariate with less reliable measurement) results in greater reduction in the estimated causal effect than correction for violence. Correction for both latent variables combined results in the largest reduction in the estimated causal effect.

9 Discussion

Our goal was to find a proxy for a latent covariate $X$ that would help reduce measurement error bias in PS analysis. The proxy we propose is the posterior mean $X_{WZA}$ of $X$ (given measurements, observed covariates and assigned exposure), estimated by the iFS via SEM. This proxy targets balance on the first moment of $X$ , an improvement over non-inclusive proxies that are informed only by the measurements. In simulation, this proxy substantially improves covariate balance and reduces bias, both when model assumptions are correct and when they are violated. This is an important result given that latent variables are commonly encountered. In addition, the theoretical results for the proxy $X_{WZA}$ apply in the general measurement error setting, and are not specific to latent variables.

An interesting connection exists between our proxy variable method and the weighting and matching functions approach. [McCaffrey2013] and [Lockwood2016] define and search for valid weighting and matching functions, which we refer to here as unbiased weighting and matching functions; these target distributional balance of the latent covariate. The weighting and matching functions implied by our proxy variable, which targets balance on the first moment of the latent covariate, belongs in a class of approximately unbiased weighting and matching functions. It would be interesting to explore, both generally and in special cases, the possibilities of other functions on a spectrum between these approximately unbiased and the exactly unbiased functions.

Turning our gaze back to the $X_{WZA}$ proxy strategy, this study is a first step; there are potential extensions as well as gaps. One clear direction is extending the range of models that can be used that allow computing the iFS. A potential avenue is to seek ways to incorporate measurement models that are more flexible about the distribution of the latent variable, e.g., using a normal mixture or a skew- $t$ distribution [Asparouhov2015, Wall2012, Lin2015, McLachlan2007, which are now available for continuous measurement items – see]. Another possibility is to take the approach in [Rabe-Hesketh2003] of approximating the unknown distribution of the latent variable with a discrete distribution whose number of levels, values, and probabilities are estimated to maximize the likelihood; the same approach was used by [McCaffreyPoster2015]. With either approach, much needs to be done to develop tools for estimating the latent variable’s posterior mean, and to allow flexibility in handling multiple latent variables and measurement items of diverse types.

Another area for future work is variance estimation, which is complicated due to the combination of $X_{WZA}$ estimation, PS estimation, and also the variance of $X\!-\!X_{WZA}$ . Even the bootstrap, which we use in the data example, should be systematically investigated for the current setting. Thinking outside the box, a possibility to be investigated that may help correct for bias while capturing full variability is Bayesian analysis where the latent variable is considered a fully missing variable and samples from its posterior are used in PS analysis; this approach does not require FS computation and thus may be more flexible.

In conclusion, obtaining valid causal effect estimates in the presence of latent confounders requires the use of methods that account for measurement error in those variables. As shown here, an easily implementable solution, the iFS method, can help reduce bias and lead to improved causal inferences. We comment on the practical application of this method and provide detailed implementation instructions and code in the Supplementary Material.

Appendix A

Proof of Theorem.

(1)

Note that $\text{E}[X\!-\!X_{WZA}\mid\bm{W},Z,A]=0$ and $X_{WZA}$ is a function of $(\bm{W},Z,A)$ .

[TABLE] 2. (2)

[TABLE] 3. (3)

[TABLE] 4. (4)

[TABLE]

∎

Proof of Corollary 1.

(1)

We show that $\text{E}[AQX]=\text{E}[X]$ .

[TABLE] 2. (2)

[TABLE]

where the last equality is the PS’s balancing property [Rosenbaum1983a].

∎

Proof of Corollary 2.

With this specific outcome model, the ACE is $\beta_{a}+\text{E}[\beta_{za}(Z)]+\beta_{xa}\text{E}[X]$ . We need to show that the weighting and matching estimators in this Corollary are unbiased for this ACE. Denote $Y^{(a)}\!-\!\text{E}[Y^{(a)}|Z,X]$ by $\epsilon^{(a)}$ . The outcome model assumption implies $\text{E}[\epsilon^{(a)}|Z,X]=0.$ Because $Y^{(a)}$ is a function of $(Z,X,\epsilon^{(a)})$ , it follows from the unconfoundedness assumption ( $A\perp\!\!\!\perp Y^{(a)}\mid Z,X$ ) that $A\perp\!\!\!\perp\epsilon^{(a)}\mid Z,X$ , which implies

[TABLE]

We assume $0<e(Z,X_{WZA})<1$ .

(1)

Let’s first prove the result for weighting using the weight function $Q$ .

Since $Q$ is a function of $(\bm{W},Z,A)$ , it followes from the weak surrogacy assumption ( $\bm{W}\perp\!\!\!\perp Y^{(a)}\mid Z,X,A$ ) that $Q\perp\!\!\!\perp Y^{(a)}\mid Z,X,A$ . Given the outcome model assumption, $Y^{(a)}$ is a function of $(Z,X,\epsilon^{(a)})$ , it follows that

[TABLE]

Now consider the weighted mean outcome in each exposure arm:

[TABLE]

Hence weighting by $Q$ results in unbiased ACE estimation in this case,

[TABLE] 2. (2)

Now we prove the result for matching on $e(Z,X_{WZA})$ . Consider one exposure arm:

[TABLE]

Hence the matching estimator is unbiased for the ACE in this case,

[TABLE]

∎

See PSLatentXSupplement_R3

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Propensity score analysis with latent covariates: Measurement error bias correction using the covariate’s posterior mean, aka the inclusive factor score

Abstract

keywords:

1 Introduction

1.1 The proxy variable approach: summary scores and factor scores

1.2 The weighting/matching function approach

1.3 Data example

2 Setting, Notation and Key Assumptions

3 Theoretical Support for XWZAX_{WZA}XWZA​ as Proxy for XXX for Covariate Balancing

Theorem**.**

Corollary 1**.**

Corollary 2**.**

4 Identification and Estimation of XWZAX_{WZA}XWZA​

4.1 Sufficient conditional independence assumption

4.2 Functional form and distributional assumptions

4.3 Model fitting and FS computation

5 Simulation Results on the Performance of the iFS Compared to Existing Proxies for XXX When Models Are Correctly Specified

5.1 Covariate balance

5.2 Bias reduction

5.3 Mean square error reduction

6 Connection to Known Results about Weighting and Matching Functions

6.1 Logit exposure assignment and normal measurement error

6.2 Probit exposure assignment scenarios

6.3 Relating H=(Z,XWZA)H=(Z,X_{WZA})H=(Z,XWZA​) to one very special case of H1H_{1}H1​

6.4 Several observations (and speculations) based on these connections

7 Performance of the iFS Proxy When It Does Not Estimate XWZAX_{WZA}XWZA​ Well

8 The Real Data Example

9 Discussion

Appendix A

Proof of Theorem.

Proof of Corollary 1.

Proof of Corollary 2.

3 Theoretical Support for $X_{WZA}$ as Proxy for $X$ for Covariate Balancing

Theorem.

Corollary 1.

Corollary 2.

4 Identification and Estimation of $X_{WZA}$

5 Simulation Results on the Performance of the iFS Compared to Existing Proxies for $X$ When Models Are Correctly Specified

6.3 Relating $H=(Z,X_{WZA})$ to one very special case of $H_{1}$

7 Performance of the iFS Proxy When It Does Not Estimate $X_{WZA}$ Well