Measurement error and precision medicine: error-prone tailoring covariates in dynamic treatment regimes

Dylan Spicker; Michael Wallace

arXiv:1907.11659 · stat.ME · August 5, 2020


TL;DR

This paper examines how measurement error in covariates affects the development of dynamic treatment regimes in precision medicine, demonstrating that correction techniques improve treatment decision accuracy.

Contribution

It introduces measurement error correction methods specifically for dynamic treatment regimes, addressing a gap in precision medicine research.

Findings

1. Measurement error correction improves treatment regime accuracy
2. Simulation and theoretical results support correction methods
3. Application to the STAR*D study illustrates practical benefits

Tables (7)

Table 1: Results for the coverage probabilities derived from bootstrap procedures across three scenarios. Intervals were formed using an n-out-of-n (nn) bootstrap or an m-out-of-n bootstrap based on the adaptive procedure with $\zeta$ ($\text{mn}_{\zeta}$). Each set of intervals was constructed using 2000 bootstrap replicates, and the experiment was repeated 500 times. Bolded values indicate those which deviate significantly from the nominal coverage of 0.95. Intervals are shown for treatment at both stages ($A_{j}$), as well as treatment interactions with the error-prone covariates ($X_{j}$) and the error-free covariate $Z_{2}$. Only Scenario Three used the error-free covariate.

	Scenario One			Scenario Two			Scenario Three
	nn	${mn}_{.05}$	${mn}_{.10}$	nn	${mn}_{.05}$	${mn}_{.10}$	nn	${mn}_{.075}$	${mn}_{.05}$	${mn}_{.10}$
$A_{1}$	0.937	0.958	0.970	0.950	0.962	0.972	0.97	0.98	0.98	0.98
$A_{1} X_{1}$	0.964	0.970	0.979	0.950	0.958	0.962	0.98	0.99	0.99	0.99
$A_{2}$	0.961	0.970	0.979	0.950	0.964	0.980	0.97	0.99	0.99	1.00
$A_{2} X_{2}$	0.940	0.961	0.973	0.952	0.956	0.960	0.96	0.99	0.98	0.99
$A_{2} Z_{2}$	–	–	–	–	–	–	0.97	0.98	0.98	0.98
$A_{2} X_{2} Z_{2}$	–	–	–	–	–	–	0.97	0.99	0.98	0.99

Table 2: Results for the two-stage blip coefficient estimates, comparing an analysis employing the regression calibration correction to naive analyses using only the clinician-rated or self-reported data. Confidence intervals are computed from 2000 m-out-of-n bootstrap replicates, where $m$ was chosen using the adaptive procedure described in the text. Bolded values indicate treatment effects which are significant at the 95% level. $A_{j}$ refers to the treatment indicator (1 for those with an SSRI, 0 otherwise), $P_{j}$ refers to patient preference to switch (1 with a preference to switch, 0 with a preference to augment), $Q_{j}$ refers to the starting QIDS score at stage $j$, and $S_{j}$ to the slope of the QIDS score over the $j$-th phase.

	Error Corrected		Clinician Score		Self-Reported
Parameter	Estimate	95% CI	Estimate	95% CI	Estimate	95% CI
Stage One
$A_{1}$	-0.75	(-10.035, 7.934)	-0.48	(-6.279, 5.486)	1.35	(-3.784, 6.052)
$A_{1} P_{1}$	2.72	(-0.186, 5.819)	2.99	(0.898, 5.426)	2.76	(-0.199, 5.826)
$A_{1} Q_{1}$	0.06	(-0.569, 0.701)	0.07	(-0.346, 0.456)	-0.09	(-0.409, 0.253)
$A_{1} S_{1}$	-1.54	(-6.895, 2.246)	-1.04	(-3.768, 1.075)	-0.55	(-2.312, 0.887)
Stage Two
$A_{2}$	-0.31	(-7.045, 6.954)	1.19	(-2.88, 5.521)	-0.04	(-4.262, 4.078)
$A_{2} Q_{2}$	0.09	(-0.479, 0.632)	-0.02	(-0.371, 0.3)	0.08	(-0.219, 0.383)
$A_{2} S_{2}$	1.82	(-2.789, 4.833)	0.94	(-0.82, 2.696)	2.74	(0.297, 5.137)

Table 3: Median parameter estimates investigating the impact of treatment probabilities in a multistage DTR, varying $(\alpha_{10}, \alpha_{20})$ as indicated. Blip parameter estimates from the corrected method are compared with those from a naive analysis, with $n = 10000$ individuals. The top set of rows uses the first error-prone proxy at both stages, the second set uses the mean of the proxies at both stages, the third set uses the mean at the first stage and the first error-prone proxy at the second, and the final set uses the first error-prone proxy at the first stage and the mean at the second. Bold values indicate parameters for which the 95% percentile-based interval across the 1000 simulation replicates did not cover the true parameter value.

	Regression Calibration				Naive
$(α_{10}, α_{20})$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$
First error-prone proxy at both stages
(-2, -2)	1.0101	0.9974	1.0035	0.9963	1.0098	0.8874	1.0163	0.902
(-1, -1)	1.0103	0.9983	1.0038	0.9961	1.0102	0.8879	1.0304	0.9064
(0, 0)	1.0106	0.9961	1.0003	0.9995	1.0108	0.8859	1.045	0.9101
(1, 1)	1.0086	0.996	0.998	0.9994	1.0086	0.8857	1.0622	0.909
(2, 2)	1.0082	0.9983	0.9975	1.0027	1.0083	0.8881	1.0789	0.9077
(-2, 0)	1.01	0.9973	1.0009	0.9987	1.01	0.8872	1.0157	0.8993
(-1, 1)	1.0109	0.9977	0.999	0.9986	1.0109	0.8868	1.0267	0.9044
(0, 2)	1.011	0.9963	0.9963	0.9987	1.0105	0.8859	1.0413	0.9079
(1, -2)	1.0087	0.9971	1.0028	0.9993	1.0084	0.886	1.0704	0.904
(2, -1)	1.0084	0.9997	1.0036	0.9975	1.0079	0.8886	1.0891	0.8982
Mean of proxies at both stages
(-2, -2)	1.008	0.9992	1.0034	0.9965	1.0087	0.8887	1.0165	0.902
(-1, -1)	1.0112	0.9988	1.0048	0.998	1.0107	0.8877	1.031	0.9068
(0, 0)	1.0111	0.997	1.0013	0.9967	1.0111	0.8872	1.0454	0.9074
(1, 1)	1.0093	0.9963	0.9975	0.9985	1.009	0.8865	1.0621	0.9083
(2, 2)	1.0101	0.9993	0.9977	1.0007	1.0104	0.8884	1.0795	0.9054
(-2, 0)	1.0108	0.9994	1.0011	0.9985	1.0108	0.8884	1.0159	0.8991
(-1, 1)	1.011	0.9982	0.9979	0.9988	1.0113	0.8875	1.0257	0.9039
(0, 2)	1.0119	0.9972	0.9956	0.9972	1.0118	0.8865	1.041	0.9073
(1, -2)	1.0091	0.9973	1.0043	0.9976	1.009	0.8866	1.072	0.903
(2, -1)	1.01	1.0002	1.0014	0.998	1.0091	0.8894	1.0871	0.8974
Mean at stage one, first error-prone proxy at stage two
(-2, -2)	1.0083	1.0001	1.0038	0.9975	1.0082	0.8895	1.0163	0.9018
(-1, -1)	1.0107	0.999	1.0039	0.9971	1.0106	0.8882	1.0305	0.9064
(0, 0)	1.0116	0.9968	1.0005	0.9982	1.0118	0.8866	1.0453	0.9097
(1, 1)	1.0096	0.9955	0.9981	0.9995	1.0095	0.8862	1.0619	0.909
(2, 2)	1.0101	0.9994	0.9968	1.0011	1.0098	0.8889	1.0802	0.9066
(-2, 0)	1.0107	0.9989	1.0008	0.9991	1.0113	0.8886	1.0156	0.8998
(-1, 1)	1.0113	0.9983	0.9989	0.9995	1.011	0.8874	1.0262	0.9055
(0, 2)	1.0116	0.9969	0.9967	0.9976	1.0115	0.8869	1.042	0.9081
(1, -2)	1.0093	0.9975	1.0043	0.999	1.0093	0.8869	1.0721	0.9041
(2, -1)	1.0098	0.9996	1.0024	0.9975	1.0097	0.8891	1.0887	0.8974
First error-prone proxy at stage one, mean at stage two
(-2, -2)	1.0096	0.9977	1.0034	0.9979	1.0095	0.8875	1.0162	0.9027
(-1, -1)	1.0094	0.9978	1.0038	0.9976	1.0097	0.8872	1.0304	0.9067
(0, 0)	1.0106	0.9963	1.0013	0.9974	1.0104	0.8859	1.0459	0.9085
(1, 1)	1.0085	0.9965	0.9982	1.0001	1.0084	0.8857	1.0621	0.9093
(2, 2)	1.0079	0.9983	0.9961	0.9983	1.0078	0.8879	1.0786	0.9043
(-2, 0)	1.0099	0.9983	1.0007	0.9977	1.01	0.8872	1.0155	0.8988
(-1, 1)	1.0112	0.9976	0.9983	0.9981	1.011	0.8866	1.0262	0.9043
(0, 2)	1.0108	0.9962	0.9959	0.9957	1.0108	0.8863	1.0411	0.906
(1, -2)	1.0078	0.9973	1.0034	0.998	1.0074	0.8864	1.071	0.9031
(2, -1)	1.0082	0.9996	1.0036	0.9969	1.0079	0.8887	1.0881	0.8972

Table 4: Median parameter estimates investigating the impact of treatment thresholds in a multistage DTR, varying $(\psi_{11}, \psi_{21})$ as indicated. Blip parameter estimates from the corrected method are compared with those from a naive analysis, with $n = 10000$ individuals. The top set of rows uses the first error-prone proxy at both stages, the second set uses the mean of the proxies at both stages, the third set uses the mean at the first stage and the first error-prone proxy at the second, and the final set uses the first error-prone proxy at the first stage and the mean at the second. Bold values indicate parameters for which the 95% percentile-based interval across the 1000 simulation replicates did not cover the true parameter value.

	Regression Calibration				Naive
$(ψ_{10}, ψ_{11}, ψ_{20}, ψ_{21})$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$
First error-prone proxy at both stages
(1, -1, 1, -1)	0.9902	-1.001	1.0011	-1.002	0.9907	-0.8904	0.956	-0.9114
(1, -0.1, 1, -0.1)	0.9999	-0.1017	1.0017	-0.1006	0.9998	-0.0904	0.9971	-0.0914
(1, 0, 1, 0)	1	-0.0019	1.0014	-0.0008	1	-0.0017	1.0016	-0.0009
(1, 0.1, 1, 0.1)	0.9998	0.0979	1.0016	0.099	0.9998	0.0872	1.006	0.0904
(1, 1, 1, 1)	1.0106	0.9961	1.0003	0.9995	1.0108	0.8859	1.045	0.9101
(1, -1, 1, 0)	1.0006	-1.0014	1.0017	-0.0001	1.0005	-0.89	1.0017	0.0002
(1, -0.1, 1, 0.1)	0.9998	-0.1018	1.0017	0.0994	0.9998	-0.0905	1.0059	0.0906
(1, 0, 1, 1)	1.0106	-0.0029	1.0017	0.9986	1.0105	-0.0026	1.0461	0.9094
(1, 0.1, 1, -1)	0.9918	0.0988	1.0019	-1.0028	0.9918	0.0877	0.9563	-0.9121
(1, 1, 1, -0.1)	0.9998	0.9967	1.0002	-0.1017	1.0004	0.8867	0.9958	-0.0922
Mean of proxies at both stages
(1, -1, 1, -1)	0.993	-1.0006	0.9997	-1.0023	0.9934	-0.8895	0.9548	-0.911
(1, -0.1, 1, -0.1)	1.0001	-0.1017	0.9993	-0.1018	1.0001	-0.0903	0.9951	-0.0924
(1, 0, 1, 0)	1.0006	-0.0017	0.9995	-0.0017	1.0006	-0.0015	0.9996	-0.0018
(1, 0.1, 1, 0.1)	1.0007	0.0985	0.9999	0.0976	1.0007	0.0875	1.0046	0.0889
(1, 1, 1, 1)	1.0111	0.997	1.0013	0.9967	1.0111	0.8872	1.0454	0.9074
(1, -1, 1, 0)	1.0025	-1.0015	0.9996	-0.0015	1.0025	-0.8903	0.9994	-0.0012
(1, -0.1, 1, 0.1)	1.0006	-0.1016	0.9993	0.0981	1.0006	-0.0903	1.004	0.0895
(1, 0, 1, 1)	1.0117	-0.0015	1.0016	0.9968	1.0117	-0.0013	1.0459	0.9076
(1, 0.1, 1, -1)	0.9924	0.0992	1.0005	-1.0034	0.9925	0.0881	0.9549	-0.9123
(1, 1, 1, -0.1)	0.9997	0.9969	1.0006	-0.1023	1.0001	0.8871	0.9957	-0.0933
Mean at stage one, first error-prone proxy at stage two
(1, -1, 1, -1)	0.9926	-1.0004	1.0011	-1.0041	0.9927	-0.8893	0.9556	-0.913
(1, -0.1, 1, -0.1)	1.0015	-0.1013	1	-0.1008	1.0015	-0.09	0.9959	-0.092
(1, 0, 1, 0)	1.0015	-0.0017	1.0001	-0.0007	1.0015	-0.0015	1	-0.0008
(1, 0.1, 1, 0.1)	1.0018	0.0984	1.0002	0.0991	1.0017	0.0875	1.0046	0.0902
(1, 1, 1, 1)	1.0116	0.9968	1.0005	0.9982	1.0118	0.8866	1.0453	0.9097
(1, -1, 1, 0)	1.0023	-1.0016	1.0008	-0.0011	1.0022	-0.8901	1.0008	-0.0015
(1, -0.1, 1, 0.1)	1.0016	-0.1016	1.0006	0.0994	1.0016	-0.0903	1.0048	0.09
(1, 0, 1, 1)	1.0121	-0.0014	1.0004	0.9982	1.0121	-0.0012	1.045	0.9097
(1, 0.1, 1, -1)	0.9926	0.0996	0.9999	-1.0028	0.9927	0.0885	0.9549	-0.9121
(1, 1, 1, -0.1)	1.001	0.9971	0.9998	-0.1007	1.0009	0.8872	0.9952	-0.0919
First error-prone proxy at stage one, mean at stage two
(1, -1, 1, -1)	0.9893	-1.0008	1.0004	-1.0018	0.9893	-0.8897	0.9554	-0.9112
(1, -0.1, 1, -0.1)	0.9992	-0.1017	1.0013	-0.1014	0.9992	-0.0906	0.9964	-0.0928
(1, 0, 1, 0)	0.9994	-0.0019	1.0013	-0.0016	0.9993	-0.0017	1.001	-0.0018
(1, 0.1, 1, 0.1)	0.9997	0.0979	1.0012	0.0982	0.9996	0.0873	1.0058	0.0893
(1, 1, 1, 1)	1.0106	0.9963	1.0013	0.9974	1.0104	0.8859	1.0459	0.9085
(1, -1, 1, 0)	1.0008	-1.0013	0.9997	-0.0006	1.0009	-0.8901	0.9996	-0.0008
(1, -0.1, 1, 0.1)	0.9997	-0.1014	1.001	0.0987	0.9996	-0.0903	1.0056	0.0894
(1, 0, 1, 1)	1.0102	-0.0023	1.0011	0.9986	1.0102	-0.002	1.0456	0.9091
(1, 0.1, 1, -1)	0.9914	0.0986	1.0013	-1.0021	0.9914	0.0876	0.9563	-0.9119
(1, 1, 1, -0.1)	0.9994	0.9966	1.0005	-0.1016	0.9995	0.8867	0.9958	-0.0925

Table 5: Median parameter estimates investigating the impact of the true treatment-free model in a multistage DTR, varied as indicated. Linear treatment-free models are fitted in all settings. Blip parameter estimates from the corrected method are compared with those from a naive analysis, with $n = 10000$ individuals. The top set of rows uses the first error-prone proxy at both stages, the second set uses the mean of the proxies at both stages, the third set uses the mean at the first stage and the first error-prone proxy at the second, and the final set uses the first error-prone proxy at the first stage and the mean at the second. Bold values indicate parameters for which the 95% percentile-based interval across the 1000 simulation replicates did not cover the true parameter value.

	Regression Calibration				Naive
Treatment-Free Model	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$
First error-prone proxy at both stages
Linear	1.0106	0.9961	1.0003	0.9995	1.0108	0.8859	1.045	0.9101
Quadratic	1.0105	1.0043	1.0012	0.9982	1.0104	0.893	1.0457	0.9098
Cubic	1.0019	1.0108	1.0041	0.9993	1.0021	0.8986	1.0484	0.9099
Exponential	1.0042	1.0105	1.0035	0.9986	1.0041	0.8976	1.0476	0.9104
Complex	1.0125	1.0038	0.9994	1.0007	1.0127	0.8921	1.044	0.9116
Mean of proxies at both stages
Linear	1.0111	0.997	1.0013	0.9967	1.0111	0.8872	1.0454	0.9074
Quadratic	1.0106	1.0017	1.0033	0.9959	1.0107	0.8901	1.0483	0.9078
Cubic	1.0069	1.0043	1.0072	1.0012	1.0068	0.8938	1.0525	0.9129
Exponential	1.0089	1.0066	1.0054	1.0015	1.0091	0.8939	1.0503	0.9122
Complex	1.0113	1.0014	1.0035	0.9969	1.0116	0.8903	1.0467	0.9082
Mean at stage one, first error-prone proxy at stage two
Linear	1.0116	0.9968	1.0005	0.9982	1.0118	0.8866	1.0453	0.9097
Quadratic	1.0116	1.001	1.0005	0.9985	1.0115	0.8897	1.044	0.9093
Cubic	1.0075	1.0034	1.0038	0.9967	1.0075	0.8937	1.0484	0.9095
Exponential	1.0091	1.0069	1.0033	0.9996	1.0087	0.8949	1.0477	0.9107
Complex	1.0119	1.0026	1	1.0012	1.0116	0.8904	1.0445	0.9108
First error-prone proxy at stage one, mean at stage two
Linear	1.0106	0.9963	1.0013	0.9974	1.0104	0.8859	1.0459	0.9085
Quadratic	1.01	1.004	1.003	0.997	1.0104	0.8927	1.0472	0.9078
Cubic	1.0017	1.0101	1.0044	0.9992	1.0021	0.8974	1.0504	0.9104
Exponential	1.0046	1.0097	1.0061	0.9996	1.0045	0.8975	1.0503	0.9104
Complex	1.0115	1.0039	1.0015	0.9985	1.0115	0.8926	1.0457	0.9091

Table 6: Median parameter estimates investigating the impact of the true treatment models in a multistage DTR, varied as indicated. Linear treatment models are fitted in all settings. Blip parameter estimates from the corrected method are compared with those from a naive analysis, with $n = 10000$ individuals. The top set of rows uses the first error-prone proxy at both stages, the second set uses the mean of the proxies at both stages, the third set uses the mean at the first stage and the first error-prone proxy at the second, and the final set uses the first error-prone proxy at the first stage and the mean at the second. Bold values indicate parameters for which the 95% percentile-based interval across the 1000 simulation replicates did not cover the true parameter value.

	Regression Calibration				Naive
Treatment Models	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$
First error-prone proxy at both stages
Linear/Linear	1.0106	0.9961	1.0003	0.9995	1.0108	0.8859	1.045	0.9101
Linear/Quadratic	0.9992	0.9992	0.8902	1.1138	0.9991	0.8881	0.9407	1.014
Linear/Mixed	0.9925	1	0.9022	1.1187	0.9923	0.8893	0.9529	1.0174
Linear/Exponential	1.0126	0.9989	0.9836	1.016	1.0124	0.8876	1.0291	0.9257
Quadratic/Quadratic	0.9047	1.1117	0.8929	1.1153	0.9044	0.9885	0.9611	1.0159
Quadratic/Mixed	0.8969	1.1119	0.9011	1.1188	0.897	0.9886	0.9707	1.0199
Quadratic/Exponential	0.918	1.1125	0.9846	1.0189	0.918	0.9899	1.0478	0.9289
Mixed/Mixed	0.9033	1.1052	0.9078	1.1216	0.9033	0.9824	0.9946	1.0192
Mixed/Exponential	0.9246	1.1096	0.9853	1.0153	0.9244	0.9872	1.0652	0.9223
Exponential/Exponential	0.9979	1.013	0.9853	1.0131	0.9983	0.9007	1.0479	0.9244
Mean of proxies at both stages
Linear/Linear	1.0111	0.997	1.0013	0.9967	1.0111	0.8872	1.0454	0.9074
Linear/Quadratic	1.0013	0.9987	0.8847	1.1219	1.0012	0.8876	0.9353	1.0215
Linear/Mixed	0.9915	0.9979	0.8915	1.1179	0.9916	0.8868	0.9413	1.0179
Linear/Exponential	1.0128	0.999	0.9818	1.0194	1.013	0.8882	1.0271	0.9275
Quadratic/Quadratic	0.898	1.1245	0.8863	1.1239	0.8982	0.9994	0.9547	1.0234
Quadratic/Mixed	0.8889	1.1226	0.8943	1.1241	0.8887	0.9981	0.9629	1.0257
Quadratic/Exponential	0.9089	1.121	0.9809	1.021	0.9087	0.9965	1.0429	0.932
Mixed/Mixed	0.8941	1.1119	0.9053	1.1241	0.8938	0.9878	0.9923	1.0228
Mixed/Exponential	0.9142	1.1125	0.9834	1.0242	0.9141	0.9884	1.0622	0.9313
Exponential/Exponential	0.9945	1.0162	0.9831	1.0186	0.9948	0.9021	1.0474	0.9294
Mean at stage one, first error-prone proxy at stage two
Linear/Linear	1.0116	0.9968	1.0005	0.9982	1.0118	0.8866	1.0453	0.9097
Linear/Quadratic	1.0012	0.9979	0.8904	1.1137	1.0008	0.8876	0.9412	1.0139
Linear/Mixed	0.9922	0.9985	0.8999	1.1146	0.9923	0.8881	0.9495	1.0143
Linear/Exponential	1.011	0.9982	0.9823	1.0161	1.0112	0.8869	1.0281	0.9263
Quadratic/Quadratic	0.8978	1.1239	0.8921	1.1151	0.8982	0.9981	0.9598	1.0149
Quadratic/Mixed	0.8904	1.1216	0.9021	1.1175	0.8903	0.9979	0.9703	1.0182
Quadratic/Exponential	0.9086	1.1215	0.9848	1.0112	0.9083	0.9972	1.0464	0.923
Mixed/Mixed	0.8932	1.1145	0.9077	1.1196	0.8937	0.9898	0.9943	1.0184
Mixed/Exponential	0.9133	1.1123	0.9854	1.0169	0.9126	0.9888	1.0637	0.9247
Exponential/Exponential	0.996	1.0178	0.9844	1.0135	0.9955	0.905	1.0486	0.9253
First error-prone proxy at stage one, mean at stage two
Linear/Linear	1.0106	0.9963	1.0013	0.9974	1.0104	0.8859	1.0459	0.9085
Linear/Quadratic	1.0003	0.9973	0.884	1.1229	0.9997	0.8868	0.9348	1.0231
Linear/Mixed	0.9908	0.998	0.8919	1.1214	0.9904	0.8869	0.9419	1.0213
Linear/Exponential	1.0111	0.9987	0.9803	1.0186	1.0113	0.888	1.0264	0.9292
Quadratic/Quadratic	0.9049	1.1118	0.8872	1.1245	0.9041	0.9884	0.9563	1.0241
Quadratic/Mixed	0.8951	1.1141	0.8978	1.1272	0.8956	0.9901	0.9677	1.0279
Quadratic/Exponential	0.9178	1.1119	0.9811	1.0225	0.9179	0.9884	1.0437	0.9333
Mixed/Mixed	0.9008	1.1039	0.9066	1.1307	0.901	0.9811	0.9946	1.0289
Mixed/Exponential	0.9248	1.1063	0.9843	1.023	0.9249	0.9835	1.0644	0.9311
Exponential/Exponential	0.9971	1.0136	0.984	1.0232	0.997	0.9008	1.0482	0.9345

Table 7: Median parameter estimates investigating the impact of the error models in a multistage DTR, varied as indicated. Blip parameter estimates from the corrected method are compared with those from a naive analysis, with $n = 10000$ individuals. The top set of rows uses the first error-prone proxy at both stages, the second set uses the mean of the proxies at both stages, the third set uses the mean at the first stage and the first error-prone proxy at the second, and the final set uses the first error-prone proxy at the first stage and the mean at the second. Bold values indicate parameters for which the 95% percentile-based interval across the 1000 simulation replicates did not cover the true parameter value.

	Regression Calibration				Naive
Error Models	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$	$A_{1}$	$A_{1} X_{1}$	$A_{2}$	$A_{2} X_{2}$
First error-prone proxy at both stages
Normal/Normal	1.0106	0.9961	1.0003	0.9995	1.0108	0.8859	1.045	0.9101
Normal/Approx. Normal	1.0175	0.9954	1.0007	0.9975	1.0174	0.8248	1.0705	0.8586
Normal/Gamma	1.0109	1.0292	1.0036	1.0078	1.0103	0.8577	1.0749	0.8647
Normal/Uniform	1.0062	1.034	1.0028	1.0226	1.0062	0.9729	1.0339	0.9616
Approx. Normal/Approx. Normal	1.0428	0.996	0.9989	0.9916	1.0414	0.6132	1.1612	0.6666
Approx. Normal/Gamma	1.0187	1.1453	1.0069	1.0592	1.019	0.7388	1.1893	0.7095
Approx. Normal/Uniform	1.0109	1.0405	1.0028	1.0318	1.0107	0.9659	1.0443	0.9495
Gamma/Gamma	1.0866	1.2838	0.919	1.202	1.086	0.857	1.1308	0.8097
Gamma/Uniform	1.0157	1.0608	0.9766	1.0494	1.0156	0.9842	1.0198	0.9638
Uniform/Uniform	1.0075	1.0367	0.991	1.043	1.0076	0.9957	1.0152	0.9957
Mean of proxies at both stages
Normal/Normal	1.0111	0.997	1.0013	0.9967	1.0111	0.8872	1.0454	0.9074
Normal/Approx. Normal	1.0173	0.9968	1.0002	0.9994	1.0165	0.8249	1.0706	0.8586
Normal/Gamma	1.0247	1.0323	0.9832	1.0201	1.0248	0.8602	1.0552	0.877
Normal/Uniform	1.0103	1.0358	0.995	1.0339	1.0098	0.9747	1.0269	0.9718
Approx. Normal/Approx. Normal	1.0393	1.008	0.9994	0.9984	1.0384	0.6209	1.163	0.6743
Approx. Normal/Gamma	1.0537	1.1959	0.9615	1.1174	1.0534	0.7688	1.1499	0.7544
Approx. Normal/Uniform	1.0104	1.0498	0.9957	1.0521	1.0101	0.9743	1.0383	0.9692
Gamma/Gamma	1.0951	1.4009	0.9042	1.3366	1.0947	0.9351	1.1268	0.952
Gamma/Uniform	1.0161	1.0605	0.9816	1.0662	1.0159	0.9859	1.025	0.9803
Uniform/Uniform	1.0079	1.033	0.9914	1.0418	1.0077	0.9923	1.0154	0.9942
Mean at stage one, first error-prone proxy at stage two
Normal/Normal	1.0116	0.9968	1.0005	0.9982	1.0118	0.8866	1.0453	0.9097
Normal/Approx. Normal	1.0177	0.9966	1.0008	0.9977	1.0171	0.8257	1.0708	0.8578
Normal/Gamma	1.0263	1.0326	1.0028	1.0074	1.0262	0.86	1.0744	0.8649
Normal/Uniform	1.0112	1.036	1.0029	1.0235	1.0112	0.9746	1.0344	0.9619
Approx. Normal/Approx. Normal	1.0372	1.0098	0.9988	0.9909	1.037	0.6207	1.162	0.6673
Approx. Normal/Gamma	1.0495	1.1943	1.007	1.058	1.0493	0.7689	1.1893	0.7076
Approx. Normal/Uniform	1.0116	1.0498	1.0027	1.0313	1.0117	0.9742	1.0453	0.9487
Gamma/Gamma	1.072	1.4024	0.9193	1.2012	1.0719	0.9346	1.1294	0.809
Gamma/Uniform	1.0161	1.0601	0.9755	1.0494	1.016	0.9857	1.0197	0.9645
Uniform/Uniform	1.0078	1.0329	0.9916	1.0433	1.0078	0.9924	1.0156	0.9957
First error-prone proxy at stage one, mean at stage two
Normal/Normal	1.0106	0.9963	1.0013	0.9974	1.0104	0.8859	1.0459	0.9085
Normal/Approx. Normal	1.017	0.9955	1.0009	0.9985	1.0166	0.8249	1.0704	0.8581
Normal/Gamma	1.0095	1.0288	0.9834	1.0218	1.0089	0.8568	1.0569	0.8763
Normal/Uniform	1.0047	1.034	0.9952	1.0332	1.0048	0.9736	1.0266	0.9711
Approx. Normal/Approx. Normal	1.0441	0.997	0.9987	0.9996	1.0439	0.6129	1.1602	0.6746
Approx. Normal/Gamma	1.0232	1.1462	0.9612	1.1165	1.023	0.7378	1.1498	0.7538
Approx. Normal/Uniform	1.0101	1.0408	0.9958	1.0529	1.0102	0.9655	1.0377	0.969
Gamma/Gamma	1.1095	1.2825	0.904	1.337	1.1087	0.8556	1.1267	0.9509
Gamma/Uniform	1.016	1.0603	0.9816	1.0659	1.0159	0.9845	1.0256	0.9818
Uniform/Uniform	1.0074	1.0369	0.9916	1.0424	1.0074	0.9957	1.0157	0.9946

Equations (42)

$$E[Y \mid A = a, \mathbf{X} = \mathbf{x}; \beta, \psi] = f(\mathbf{x}^{\beta}; \beta) + \gamma(\mathbf{x}^{\psi}, a; \psi)$$

$$\pi(\mathbf{x})\, v(1, \mathbf{x}) = \{1 - \pi(\mathbf{x})\}\, v(0, \mathbf{x}),$$
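As a quick worked check (an illustration, not part of the original text), assuming $\pi(\mathbf{x}) = P(A = 1 \mid \mathbf{X} = \mathbf{x})$, two familiar weight choices satisfy this balancing condition: the absolute-value weight $v(a, \mathbf{x}) = |a - \pi(\mathbf{x})|$ commonly used with dWOLS, and the inverse-probability weight $v(a, \mathbf{x}) = 1/P(A = a \mid \mathbf{X} = \mathbf{x})$.

```latex
\begin{aligned}
v(a,\mathbf{x}) = |a - \pi(\mathbf{x})|:
  &\quad \pi(\mathbf{x})\,|1 - \pi(\mathbf{x})| = \pi(\mathbf{x})\{1 - \pi(\mathbf{x})\} = \{1 - \pi(\mathbf{x})\}\,|0 - \pi(\mathbf{x})|,\\
v(a,\mathbf{x}) = \frac{1}{P(A = a \mid \mathbf{X} = \mathbf{x})}:
  &\quad \pi(\mathbf{x}) \cdot \frac{1}{\pi(\mathbf{x})} = 1 = \{1 - \pi(\mathbf{x})\} \cdot \frac{1}{1 - \pi(\mathbf{x})}.
\end{aligned}
```

By contrast, unit weights $v \equiv 1$ satisfy the condition only when $\pi(\mathbf{x}) = 1/2$.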

$$E[Y \mid \mathbf{H} = \mathbf{h}; \beta, \psi] = \sum_{j=1}^{K} \left\{ f_{j}(\mathbf{h}_{j}^{\beta}; \beta_{j}) + \gamma_{j}(\mathbf{h}_{j}^{\psi}, a_{j}; \psi_{j}) \right\}$$

$$E[Y \mid \mathbf{H} = \mathbf{h}] = E[Y^{\text{opt}} \mid \mathbf{H} = \mathbf{h}] - \sum_{j=1}^{K} \mu_{j}(\mathbf{h}_{j}, a_{j}; \psi_{j}),$$

$$\hat{\mathbf{X}} = \mu_{X} + \begin{bmatrix} \Sigma_{XX^{*}} & \Sigma_{XZ} \end{bmatrix} \begin{bmatrix} \Sigma_{X^{*}X^{*}} & \Sigma_{X^{*}Z} \\ \Sigma_{ZX^{*}} & \Sigma_{ZZ} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{X}^{*} - \mu_{X} \\ \mathbf{Z} - \mu_{Z} \end{bmatrix}.$$
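A minimal Python sketch of this calibration step for a scalar $X$ and a scalar error-free $Z$, assuming the classical additive error model $X^{*} = X + U$ with a known error variance (in practice this would be estimated, for example from replicate measurements). Under that assumption $\mu_{X} = \mu_{X^{*}}$, $\Sigma_{XZ} = \Sigma_{X^{*}Z}$ and $\Sigma_{XX^{*}} = \Sigma_{X^{*}X^{*}} - \operatorname{Var}(U)$. The helpers `mean`, `cov` and `calibrate` are hypothetical names for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def calibrate(x_star, z, var_u):
    """Regression calibration for scalar X from proxy X* and error-free Z.

    Assumes classical additive error X* = X + U with known Var(U) = var_u, so that
    mu_X = mu_{X*}, Sigma_{XZ} = Sigma_{X*Z} and Sigma_{XX*} = Sigma_{X*X*} - var_u.
    """
    mu_x, mu_z = mean(x_star), mean(z)
    s_ss, s_sz, s_zz = cov(x_star, x_star), cov(x_star, z), cov(z, z)
    s_xs = s_ss - var_u                         # Sigma_{XX*}
    det = s_ss * s_zz - s_sz ** 2               # determinant of the 2x2 block matrix
    # Row vector [Sigma_{XX*}, Sigma_{XZ}] multiplied by the inverse block matrix.
    b_star = (s_xs * s_zz - s_sz * s_sz) / det
    b_z = (s_sz * s_ss - s_xs * s_sz) / det
    return [mu_x + b_star * (xs - mu_x) + b_z * (zi - mu_z) for xs, zi in zip(x_star, z)]

# Hypothetical simulation: X depends on Z, and X* adds unit-variance noise.
rng = random.Random(7)
n, var_u = 20_000, 1.0
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]              # true covariate
x_star = [xi + rng.gauss(0, var_u ** 0.5) for xi in x]    # error-prone proxy
x_hat = calibrate(x_star, z, var_u)

mse_star = mean([(a - b) ** 2 for a, b in zip(x_star, x)])
mse_hat = mean([(a - b) ** 2 for a, b in zip(x_hat, x)])
print(f"MSE of X*: {mse_star:.3f}, MSE of calibrated X: {mse_hat:.3f}")
```

In this simulated setup the exact conditional mean is $E[X \mid X^{*}, Z] = 0.5X^{*} + 0.25Z$, so the calibrated values should have a mean squared error near 0.5 against the true $X$, compared with about 1.0 for the raw proxy.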

$$P(A = 1 \mid \mathbf{Z}, \mathbf{X}^{*}) \approx H\left[\frac{\alpha_{0} + \alpha_{X}' \hat{\mathbf{X}} + \alpha_{Z}' \mathbf{Z}}{\left(1 + \alpha_{X}' \Sigma_{X \mid Z, X^{*}} \alpha_{X} / 1.7^{2}\right)^{1/2}}\right],$$
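The approximation can be checked numerically. The sketch below assumes $H$ is the logistic (expit) function and that $X$ given $(Z, X^{*})$ is normal, so the calibrated value plays the role of the conditional mean; it compares the exact probability $E[H(\eta + \alpha_{X} e)]$ with $e \sim N(0, \Sigma_{X \mid Z, X^{*}})$, computed by quadrature, against the scaled form above for scalar $X$. Here `eta` stands for the numerator $\alpha_{0} + \alpha_{X}\hat{X} + \alpha_{Z}Z$; the function names and test values are illustrative.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def exact_prob(eta, slope, cond_var, points=4001, width=8.0):
    """E[expit(eta + slope * e)] for e ~ N(0, cond_var), by the trapezoid rule on a normal grid."""
    sd = math.sqrt(cond_var)
    h = 2 * width / (points - 1)
    total = 0.0
    for k in range(points):
        u = -width + k * h                               # standard-normal ordinate
        weight = 0.5 if k in (0, points - 1) else 1.0    # trapezoid end-point weights
        density = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        total += weight * density * expit(eta + slope * sd * u)
    return total * h

def approx_prob(eta, slope, cond_var):
    """Scaled approximation using the 1.7 constant, as in the equation above."""
    return expit(eta / math.sqrt(1.0 + slope ** 2 * cond_var / 1.7 ** 2))

cases = [(0.3, 1.0, 0.5), (1.0, 1.0, 2.0), (-1.5, 2.0, 0.25)]
for eta, slope, cond_var in cases:
    exact = exact_prob(eta, slope, cond_var)
    approx = approx_prob(eta, slope, cond_var)
    print(f"eta={eta:+.1f} slope={slope} var={cond_var}: exact {exact:.4f}, approx {approx:.4f}")
```

Ignoring the residual variance entirely (plugging $\hat{X}$ into $H$ without the denominator) would push the fitted probabilities further from 0.5 than they should be; the scaling shrinks the linear predictor toward zero.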

$$\frac{\sum_{i=1}^{n} v(1, \mathbf{x}_{ri}^{*}) A_{i} \mathbf{x}_{ri}^{*}}{\sum_{i=1}^{n} v(1, \mathbf{x}_{ri}^{*}) A_{i}} = \frac{\sum_{i=1}^{n} v(0, \mathbf{x}_{ri}^{*}) (1 - A_{i}) \mathbf{x}_{ri}^{*}}{\sum_{i=1}^{n} v(0, \mathbf{x}_{ri}^{*}) (1 - A_{i})}.$$

$$\hat{\mu}_{2} - \mu_{2} = (\hat{A}_{2}^{\text{opt}} - A_{2}^{\text{opt}})(\psi_{20} + \psi_{21}' \hat{\mathbf{X}}_{2}) + (A_{2}^{\text{opt}} - A_{2})\, \psi_{21}' (\hat{\mathbf{X}}_{2} - \mathbf{X}_{2}).$$
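A numerical check of this decomposition, under assumptions spelled out here rather than taken from the text: scalar $X_{2}$, true coefficients $(\psi_{20}, \psi_{21})$, regret $\mu_{2} = (A_{2}^{\text{opt}} - A_{2})(\psi_{20} + \psi_{21} X_{2})$ with $A_{2}^{\text{opt}} = 1$ exactly when the blip is positive, and $\hat{\mu}_{2}$, $\hat{A}_{2}^{\text{opt}}$ defined the same way with an estimated covariate $\hat{X}_{2}$ in place of $X_{2}$. The coefficient values and noise levels are hypothetical.

```python
import random

def blip(x, psi20, psi21):
    """Linear stage-two blip per unit of treatment (scalar covariate assumed)."""
    return psi20 + psi21 * x

def regret(a, x, psi20, psi21):
    """Return ((a_opt - a) * blip, a_opt), where a_opt treats exactly when the blip is positive."""
    b = blip(x, psi20, psi21)
    a_opt = 1 if b > 0 else 0
    return (a_opt - a) * b, a_opt

rng = random.Random(3)
psi20, psi21 = 1.0, -0.8          # hypothetical true coefficients
max_gap = 0.0
for _ in range(1000):
    x = rng.gauss(0, 2)
    x_hat = x + rng.gauss(0, 1)   # stand-in for an estimated (e.g. calibrated) X_2
    a = rng.randint(0, 1)         # observed treatment
    mu, a_opt = regret(a, x, psi20, psi21)
    mu_hat, a_opt_hat = regret(a, x_hat, psi20, psi21)
    decomposition = (a_opt_hat - a_opt) * blip(x_hat, psi20, psi21) + (a_opt - a) * psi21 * (x_hat - x)
    max_gap = max(max_gap, abs((mu_hat - mu) - decomposition))
    assert mu >= 0                # the true regret is never negative
print(f"largest discrepancy between the two sides: {max_gap:.2e}")
```

The first term is nonzero only when the estimated covariate flips the recommended treatment; the second is present whenever the observed treatment is suboptimal and the covariate is mismeasured.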

$$m = n^{\frac{1 + \zeta(1 - p)}{1 + \zeta}},$$
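A small sketch of the resample size. It treats $p \in [0, 1]$ and $\zeta > 0$ as given inputs (how $p$ is estimated in the adaptive procedure is not reproduced here) and rounds $m$ down to an integer, which is an arbitrary choice for illustration; `resample_size` is a hypothetical name.

```python
import math

def resample_size(n, zeta, p):
    """m = n ** ((1 + zeta * (1 - p)) / (1 + zeta)), rounded down to an integer.

    p and zeta are taken as given; rounding down is an illustrative choice.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.floor(n ** ((1 + zeta * (1 - p)) / (1 + zeta)))

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, resample_size(1000, 0.05, p))
```

With $p = 0$ the exponent is 1 and $m = n$, recovering the usual n-out-of-n bootstrap; as $p$ grows, $m$ shrinks toward $n^{1/(1+\zeta)}$.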

$\overline{X_{\cdot j}^{*}}$, $\overline{X_{i}^{*(j)}}$, $\Sigma_{X_{j}^{*}}$, $M$, $\Sigma_{XX}^{(1)}$, $M_{j}$, $\mu_{X} = \mu_{X^{*}}$, $\mu_{Z}$, $\Sigma_{ZZ}$, $\Sigma_{XZ} = \Sigma_{X^{*}Z}$, $\Sigma_{X^{*}}$, $\Sigma_{XX}^{(2)}$

$$\frac{\partial \ell}{\partial \alpha_{0}} = \sum_{i=1}^{n} \frac{A_{i} - (1 - A_{i}) \exp(\alpha_{0} + \alpha_{1} t_{i})}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})}$$

$$\frac{\partial \ell}{\partial \alpha_{1}} = \sum_{i=1}^{n} t_{i} \frac{A_{i} - (1 - A_{i}) \exp(\alpha_{0} + \alpha_{1} t_{i})}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})}$$

$$\sum_{i=1}^{n} \frac{A_{i}}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})} = \sum_{i=1}^{n} \frac{(1 - A_{i}) \exp(\alpha_{0} + \alpha_{1} t_{i})}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})}$$

$$\sum_{i=1}^{n} \frac{t_{i} A_{i}}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})} = \sum_{i=1}^{n} \frac{t_{i} (1 - A_{i}) \exp(\alpha_{0} + \alpha_{1} t_{i})}{1 + \exp(\alpha_{0} + \alpha_{1} t_{i})}.$$

$$\sum_{i=1}^{n} A_{i} \{1 - P(A = 1 \mid T = t_{i})\}$$

$$\sum_{i=1}^{n} t_{i} A_{i} \{1 - P(A = 1 \mid T = t_{i})\}$$
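These score equations imply a balancing property that is easy to see numerically. The sketch below fits a one-covariate logistic treatment model by Newton-Raphson (an assumption: the text does not specify the fitting algorithm), confirms that both sides of each estimating equation agree at the fitted values, and then shows that the weights $|A_{i} - \hat{P}(A = 1 \mid T = t_{i})|$, the dWOLS-style weights assumed here, give identical weighted means of $t$ in the treated and untreated groups, which is the balancing condition stated earlier for the error-prone covariates. Names and simulation settings are hypothetical.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(t, a, iters=30):
    """Newton-Raphson maximum likelihood for P(A = 1 | T = t) = expit(alpha0 + alpha1 * t)."""
    a0 = a1 = 0.0
    for _ in range(iters):
        p = [expit(a0 + a1 * ti) for ti in t]
        g0 = sum(ai - pi for ai, pi in zip(a, p))                  # d ell / d alpha0
        g1 = sum(ti * (ai - pi) for ti, ai, pi in zip(t, a, p))    # d ell / d alpha1
        w = [pi * (1 - pi) for pi in p]                            # Fisher information weights
        h00, h01 = sum(w), sum(wi * ti for wi, ti in zip(w, t))
        h11 = sum(wi * ti * ti for wi, ti in zip(w, t))
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1

rng = random.Random(11)
t = [rng.gauss(0, 1) for _ in range(5000)]
a = [1 if rng.random() < expit(-0.3 + 0.8 * ti) else 0 for ti in t]
alpha0, alpha1 = fit_logistic(t, a)
p = [expit(alpha0 + alpha1 * ti) for ti in t]

lhs0 = sum(ai * (1 - pi) for ai, pi in zip(a, p))              # sum A_i (1 - P(A=1 | t_i))
rhs0 = sum((1 - ai) * pi for ai, pi in zip(a, p))
lhs1 = sum(ti * ai * (1 - pi) for ti, ai, pi in zip(t, a, p))
rhs1 = sum(ti * (1 - ai) * pi for ti, ai, pi in zip(t, a, p))

# Weighted means of t by arm, using weights |A - p_hat|.
w = [abs(ai - pi) for ai, pi in zip(a, p)]
mean_treated = sum(wi * ai * ti for wi, ai, ti in zip(w, a, t)) / sum(wi * ai for wi, ai in zip(w, a))
mean_control = (sum(wi * (1 - ai) * ti for wi, ai, ti in zip(w, a, t))
                / sum(wi * (1 - ai) for wi, ai in zip(w, a)))
print(lhs0 - rhs0, lhs1 - rhs1, mean_treated - mean_control)
```

For a treated individual the weight is $1 - \hat{P}$ and for an untreated one it is $\hat{P}$, so the two weighted sums in each arm are exactly the two sides of the score equations above.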

Code & Models

Repositories

DylanSpicker/measurement-error-DTRs (official)

Full text

Measurement error and precision medicine: error-prone tailoring covariates in dynamic treatment regimes

Dylan Spicker

Statistics and Actuarial Science

University of Waterloo

Waterloo, Ontario, N2L 3G1

Michael Wallace

Statistics and Actuarial Science

University of Waterloo

Waterloo, Ontario, N2L 3G1

Abstract

Precision medicine incorporates patient-level covariates to tailor treatment decisions, seeking to improve outcomes. In longitudinal studies with time-varying covariates and sequential treatment decisions, precision medicine can be formalized with dynamic treatment regimes (DTRs): sequences of covariate-dependent treatment rules. To date, the precision medicine literature has not addressed measurement error, a ubiquitous concern in health research in which observed data deviate from the truth. We discuss the consequences of ignoring measurement error in the context of DTRs, focusing on challenges unique to precision medicine. We show, through simulation and theoretical results, that relatively simple measurement error correction techniques can lead to substantial improvements over uncorrected analyses, and apply these findings to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study.

1 Introduction

Precision medicine is a framework in which medical treatment decisions are based on patient-level data. At its core, precision medicine aims to ‘treat patients, not diseases’, reflecting the principle that the best treatment decision is informed by all relevant, available data on the patient, not solely their diagnosis. This can manifest in the simple case of a single treatment decision (the one-stage setting), but can be readily generalized to longitudinal treatment regimes where all available patient-level data (both present and past) can inform treatment decisions. One way of codifying such a process in the precision medicine framework is through the use of Dynamic Treatment Regimes (DTRs): sequences of decision rules tailored to patient-level covariates. Precision medicine in general, and DTRs in particular, have received a great deal of research attention in recent years.[1, 2, 3, 4]

A key focus of the DTR framework is estimating the optimal sequence of treatment decisions, that is, the sequence that maximizes an expected outcome conditional on the patient-level data available at each decision point. This may be a simple rule based on a single covariate (such as “prescribe treatment if the patient consumes fewer than $1300$ calories a day”), or a highly complex set of treatment decisions which depend on many factors. Finding optimal treatment rules can be especially challenging in the observational data setting, wherein observed treatments may themselves be informed by patient-level information. Estimation of such rules has received considerable attention in the biostatistical literature, with the development of numerous estimation procedures.[4, 5]
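To make the link between a rule and model parameters concrete, here is a hypothetical sketch: if the benefit of treatment is modeled as linear in the covariate, $\psi_{0} + \psi_{1} x$ (the linear form is an assumption here), the optimal rule treats exactly when that benefit is positive, and with $\psi_{1} < 0$ this is a threshold rule, treat when $x < -\psi_{0}/\psi_{1}$. The coefficients below are invented so the threshold equals the 1300-calorie example.

```python
def optimal_treatment(x, psi0, psi1):
    """Recommend treatment (1) when the modeled benefit psi0 + psi1 * x is positive, else 0."""
    return 1 if psi0 + psi1 * x > 0 else 0

# Hypothetical coefficients: -PSI0 / PSI1 = 1300, so the rule reads "treat if calories < 1300".
PSI0, PSI1 = 13.0, -0.01

for calories in (1100, 1250, 1350, 2000):
    print(calories, optimal_treatment(calories, PSI0, PSI1))
```

If the covariate itself is measured with error, the same rule applied to the observed value can recommend the wrong treatment for patients near the threshold.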

Measurement error refers to any process through which observable data do not equal the true underlying values of interest.[6] Common examples include blood pressure (typically elevated in clinical settings) [7] or self-reported caloric intake.[8] While measurement error may arise through a variety of mathematical mechanisms, the underlying concern is that analyses which do not account for such error may produce unpredictable and unreliable conclusions. These so-called naive analyses, along with many relevant correction methods, have been widely studied in both linear [9] and non-linear [10] models.
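A minimal simulation of why naive analyses can mislead, assuming the simplest case: classical additive error $X^{*} = X + U$ with independent normal $U$ and a linear outcome model. Regressing $Y$ on $X^{*}$ instead of $X$ attenuates the slope toward zero by the reliability ratio $\lambda = \sigma_{X}^{2}/(\sigma_{X}^{2} + \sigma_{U}^{2})$. All names and parameter values are illustrative.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
n, beta, sd_x, sd_u = 50_000, 2.0, 1.0, 1.0
x = [rng.gauss(0.0, sd_x) for _ in range(n)]                # true covariate
x_star = [xi + rng.gauss(0.0, sd_u) for xi in x]             # error-prone measurement
y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]      # outcome depends on the true X

reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
true_slope = ols_slope(x, y)
naive_slope = ols_slope(x_star, y)
print(f"using X: {true_slope:.3f}  using X*: {naive_slope:.3f}  lambda * beta: {reliability * beta:.3f}")
```

With equal variances for $X$ and $U$, $\lambda = 0.5$, so the naive slope is roughly half the true value of 2.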

Despite the abundance of literature surrounding both DTRs and measurement error, there has yet to be a substantial attempt to assess the estimation of the former in the presence of the latter. Precision medicine encompasses not only estimation and inference (in studying treatment effects) but also prediction (in applying decision rules to future patients). While estimation and inference have received considerable attention in the measurement error literature (outside of the context of precision medicine), prediction has not. Some argue for the use of error-prone variables to predict an outcome of interest directly. This argument advocates exploiting the dependence between the error-prone variable and the outcome, ignoring the underlying causal relationship between the true variable and the outcome: if there is a sufficiently strong relationship between the outcome and the observed measure, prediction may remain valid. This argument holds for some models;[6] however, it is not always clear when it is valid. It has been shown that even in a standard regression setting, it may be necessary to correct for error to ensure valid prediction.[11]

Whether the goal is to assess the efficacy of a treatment regime or to aid in future treatment decisions, the literature in standard modeling contexts gives us reason to believe that many of the same concerns will exist in precision medicine, and in DTRs specifically. We will investigate these issues, establish when and how precision medicine analyses may be affected, and evaluate, through simulation and theoretical results, straightforward measurement error correction techniques within the DTR framework. We further apply our methods to analyze the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study.[12, 13]

2 Methodology

A wide variety of methods are available both for DTR estimation and measurement error correction. Common techniques for the former include Q-learning [14, 15] and inverse probability weighting,[16] along with the more complex and robust G-estimation [17, 18] and augmented inverse probability weighting techniques.[19] Dynamic weighted ordinary least squares (dWOLS) offers a compromise between these broad classes of approach, providing robustness to model misspecification while maintaining a straightforward implementation.[20] Classification-type approaches, such as outcome weighted learning, have also recently grown in popularity.[21, 22]

The measurement error literature boasts a similar range of options, with a familiar trade-off between simplicity of implementation and strength of theoretical guarantees. For example, regression calibration [23, 24] and simulation extrapolation (SIMEX) [25, 26] are both general methods which make few assumptions regarding the underlying data. This results in comparatively simple estimators which are consistent in linear models, but for which no general consistency guarantees can be made in non-linear models.[10] In contrast, methods such as the conditional score [27, 28] (under correct distributional assumptions) and the corrected score [29, 30] offer consistent estimators for a larger class of models, at the cost of more complex implementation.

As the first substantive work on the interface between DTRs and measurement error, we will limit our focus to methodologies that afford straightforward implementation. DTR estimation will be carried out via dWOLS,[20] whose regression-based implementation is complemented by the measurement error correction method of regression calibration.[23, 24]

To establish notation, and the modeling framework upon which our methodology relies, in this section we will introduce the specifics of a one-stage, error-free DTR, and discuss the estimation procedure using dWOLS. We will then extend to the multistage case. Finally, we will discuss regression calibration generally, clarifying the specific corrections that we will use.

2.1 DTRs and dWOLS

In a one-stage DTR we make a single binary treatment decision ( $A$ ) per patient. We take $A=a\in\{0,1\}$ to denote some binary treatment option, such as standard treatment ( $a=0$ ) compared to an intervention ( $a=1$ ). We are concerned with an outcome variable $Y$ , chosen such that larger values are preferred. Patient information available immediately prior to the decision being made is denoted $\mathbf{X}$ . The optimal DTR will then take $\mathbf{X}=\mathbf{x}$ as input for a patient and return $A=a^{\text{opt}}$ , such that $Y$ is maximized in expectation. Formally, $a^{\text{opt}}=\operatorname*{arg\,max}_{a}E[Y|A=a,\mathbf{X}=\mathbf{x};\beta,\psi]$ , where the mean is modeled as

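$$E[Y|A=a,\mathbf{X}=\mathbf{x};\beta,\psi]=f(\mathbf{x}_{\beta};\beta)+\gamma(\mathbf{x}_{\psi},a;\psi),\qquad(1)$$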

with $\mathbf{x}_{\beta}$ and $\mathbf{x}_{\psi}$ representing two (possibly identical) subsets of the covariates $\mathbf{x}$, and with $\beta$ and $\psi$ denoting model parameters. We often take $\mathbf{x}=(1,\mathbf{x})$, prepending a constant to the covariate vector, to allow a baseline effect (or intercept) to be captured in $f$.

This mean decomposition includes a component $f(\mathbf{x}_{\beta};\beta)$ which does not depend on the treatment, and a component $\gamma(\mathbf{x}_{\psi},a;\psi)$ which captures the effect of treatment. These are the so-called treatment-free and blip components, respectively. The combination of the treatment-free and blip models (as outlined in Equation (1)) is referred to as the outcome model. The treatment decision impacts the outcome only through the blip. As such, estimation of the optimal DTR is equivalent to finding the decision rule which maximizes $\gamma(\mathbf{x}_{\psi},a;\psi)$ . We therefore only need to estimate the $\psi$ terms correctly to determine the optimal DTR.

Often, we take $\gamma(\mathbf{x}_{\psi},a;\psi)=a\psi^{\prime}\mathbf{x}_{\psi}$ to be a linear function of the covariates multiplied by the treatment indicator, meaning $a^{\text{opt}}=1$ if $\psi^{\prime}\mathbf{x}_{\psi}>0$ and $a^{\text{opt}}=0$ otherwise. If we correctly specify the full outcome model then standard regression procedures may be applied. However, as our treatment decision does not depend on the treatment-free component directly, we may wish to seek methodology that does not depend on its correct specification in full. For example, even with the treatment-free model misspecified, we could nevertheless proceed with correct specification of the blip if $a$ and $\mathbf{x}$ were independent, but this is seldom a reasonable assumption in our observational setting (where treatment decisions may be made based on patient-level data).

dWOLS, along with some other methods such as the aforementioned G-estimation [18] and augmented inverse probability of treatment weighting,[19] accounts for this by requiring the specification of a treatment model, which models the probability of receiving the intervention given the individual’s covariates. In dWOLS, this allows the calculation of patient-level weights, which we denote $v(a,\mathbf{x})$ for a patient with covariates $\mathbf{x}$ receiving treatment $a$. These weights are designed to ‘balance’ the covariates between treatment groups. Any weights which satisfy

$$\pi(\mathbf{x})v(1,\mathbf{x})=(1-\pi(\mathbf{x}))v(0,\mathbf{x}),\qquad(2)$$

where $\pi(\mathbf{x})=P(A=1|\mathbf{X}=\mathbf{x})$ , will suffice for balance. The use of $v(a,\mathbf{x})=|a-\pi(\mathbf{x})|$ is recommended.[20]

In the one-stage setting, dWOLS is simply a weighted ordinary least squares regression with weights satisfying Equation (2) and an outcome model structure as indicated by Equation (1). The resulting estimators for $\psi$ are then doubly robust: as long as the blip model is correctly specified, and at least one of the treatment or treatment-free models is correctly specified, the estimators for $\psi$ will be consistent.
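The one-stage procedure can be sketched in Python with NumPy. This is a minimal illustration under assumed conditions, not the authors' implementation: the simulated data-generating values are hypothetical, the treatment model is a logistic regression in a single covariate fitted by Newton-Raphson, and the treatment-free model is deliberately misspecified (linear in $x$, while the truth includes $x^{2}$), so that consistency of $\widehat{\psi}$ rests on the correctly specified treatment model.

```python
import numpy as np


def expit(z):
    """Inverse-logit H(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, a, n_iter=25):
    """Logistic regression coefficients by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return beta


def dwols_one_stage(x, a, y):
    """One-stage dWOLS with blip a * (psi0 + psi1 * x).

    Treatment model: logistic in x. Treatment-free model: linear in x.
    Weights: v(a, x) = |a - pi_hat(x)|, which satisfy Equation (2).
    """
    X_trt = np.column_stack([np.ones_like(x), x])
    pi_hat = expit(X_trt @ fit_logistic(X_trt, a))
    w = np.abs(a - pi_hat)
    D = np.column_stack([np.ones_like(x), x, a, a * x])  # treatment-free terms, then blip terms
    Dw = D * w[:, None]
    coef = np.linalg.solve(D.T @ Dw, Dw.T @ y)  # weighted least squares
    return coef[2:]  # (psi0, psi1)


# Hypothetical simulation: confounded treatment, non-linear treatment-free effect.
rng = np.random.default_rng(1)
n = 20_000
true_psi = np.array([1.0, -1.5])
x = rng.normal(size=n)
a = rng.binomial(1, expit(0.5 + x))
y = 1 + x + x**2 + a * (true_psi[0] + true_psi[1] * x) + rng.normal(size=n)

psi_hat = dwols_one_stage(x, a, y)
a_opt = (psi_hat[0] + psi_hat[1] * x > 0).astype(int)  # estimated decision rule
```

With the treatment model correctly specified, `psi_hat` should lie close to `true_psi` despite the misspecified treatment-free model; `a_opt` then applies the estimated rule $a^{\text{opt}}=I(\widehat{\psi}^{\prime}\mathbf{x}_{\psi}>0)$.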

These methods easily extend to multistage processes. A $K$ -stage DTR will have $K$ total treatment decisions made, which we index by $j$ . We wish to estimate the optimal decision for stage $j$ , given all information available immediately prior to the decision. We now subscript the covariate vector and the treatment decision, $\mathbf{x}_{j}$ and $a_{j}$ , to denote the measurements taken and the observed decision at stage $j$ , respectively. We use over- and under-line notation to refer to the past and future respectively, so that (for example) $\overline{\mathbf{x}}_{j}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{j})$ and $\underline{\mathbf{a}}_{j+1}=(a_{j+1},\ldots,a_{K})$ . Finally, for notational convenience, we define a variable to represent the patient’s history prior to the stage $j$ treatment decision: $\mathbf{h}_{j}=(\overline{\mathbf{x}}_{j},\overline{\mathbf{a}}_{j-1})$ . We now expand Equation (1), given the above notation, to

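$$E\left[Y^{\overline{\mathbf{a}}_{j},\underline{\mathbf{a}}_{j+1}^{\text{opt}}}\,\middle|\,\mathbf{H}_{j}=\mathbf{h}_{j};\beta_{j},\psi_{j}\right]=f_{j}(\mathbf{h}_{j}^{\beta};\beta_{j})+\gamma_{j}(\mathbf{h}_{j}^{\psi},a_{j};\psi_{j}),$$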

where $\mathbf{h}_{j}^{\beta}$ and $\mathbf{h}_{j}^{\psi}$ are (possibly identical) subsets of the history vector $\mathbf{h}_{j}$. We take $f_{j}$ to be the treatment-free model for stage $j$, specifying the impact of $\mathbf{h}_{j}^{\beta}$ on the outcome. This impact does not depend on the stage $j$ treatment. Conversely, $\gamma_{j}$ is the blip model for stage $j$, which indicates the impact of $\mathbf{h}_{j}^{\psi}$ on the outcome that depends on the treatment received ($a_{j}$).

The stage $j$ blip function in this multistage setting is defined as the marginal impact of a patient receiving treatment $a_{j}$ , compared to a patient who received standard treatment at stage $j$ ( $a_{j}=0)$ , with an identical history and who goes on to receive optimal, though not necessarily identical, treatment in the future. That is, $\gamma_{j}(\mathbf{h}_{j},a_{j};\psi_{j})=E[Y^{\overline{\mathbf{a}}_{j},\underline{\mathbf{a}}_{j+1}^{\text{opt}}}-Y^{\overline{\mathbf{a}}_{j-1},0,\underline{\mathbf{a}}_{j+1}^{\text{opt}}}|\mathbf{H}_{j}=\mathbf{h}_{j}]$ , where $Y^{a^{\dagger}}$ refers to the counterfactual outcome $Y$ , which is potentially unobserved, under a treatment regime specified by $a^{\dagger}$ .

There is an alternative formulation for the outcome model, which provides a more intuitive characterization for interpreting dWOLS. The mean outcome can be defined such that

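$$E[Y|\mathbf{H}_{K}=\mathbf{h}_{K},A_{K}=a_{K}]=E[Y^{\text{opt}}|\mathbf{H}_{1}=\mathbf{h}_{1}]-\sum_{j=1}^{K}\mu_{j}(\mathbf{h}_{j},a_{j}),$$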

where $Y^{\text{opt}}$ gives the theoretically optimal outcome (under the optimal DTR) and the $\mu_{j}$ constitute penalty terms for non-optimal treatment. Here $\mu_{j}$ are referred to as regrets, and are the contrast in outcomes between a patient who receives optimal treatment at stage $j$ , and the same patient receiving treatment $a_{j}$ at stage $j$ , assuming the patient is treated optimally thereafter. Formed this way, the observed outcome is equal to the optimal outcome less all negative effects deriving from suboptimal treatment. The regrets may be expressed in terms of the blips, namely $\mu_{j}(\mathbf{h}_{j},a_{j})=\gamma_{j}(\mathbf{h}_{j},a_{j}^{\text{opt}})-\gamma_{j}(\mathbf{h}_{j},a_{j})$ .
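For a linear blip $\gamma_{j}=a_{j}\psi_{j}^{\prime}\mathbf{h}_{j}^{\psi}$, the relationship $\mu_{j}=\gamma_{j}(\mathbf{h}_{j},a_{j}^{\text{opt}})-\gamma_{j}(\mathbf{h}_{j},a_{j})$ reduces to $|\psi_{j}^{\prime}\mathbf{h}_{j}^{\psi}|$ when the received treatment is suboptimal and to $0$ otherwise. The short sketch below computes regrets this way; the parameter values and helper names are hypothetical.

```python
import numpy as np


def blip(psi, x, a):
    """Linear blip gamma(x, a; psi) = a * (psi0 + x @ psi1), with x of shape (n, p)."""
    return a * (psi[0] + x @ psi[1:])


def optimal_treatment(psi, x):
    """a_opt = 1 if psi0 + x @ psi1 > 0, else 0."""
    return (psi[0] + x @ psi[1:] > 0).astype(int)


def regret(psi, x, a):
    """mu(x, a) = gamma(x, a_opt) - gamma(x, a); zero when a is optimal."""
    return blip(psi, x, optimal_treatment(psi, x)) - blip(psi, x, a)


psi = np.array([0.5, -1.0])             # hypothetical blip parameters
x = np.array([[0.0], [1.0], [2.0]])     # three patients, one tailoring covariate
a = np.array([0, 0, 1])                 # observed treatments
print(regret(psi, x, a))                # regrets 0.5, 0.0 and 1.5
```

The second patient is already optimally untreated, so their regret is zero; the first and third are penalized by the magnitude of their treatment contrast.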

There is a recursive nature to multistage DTR analysis, as the treatment decision at stage $j$ impacts all future decisions. In dWOLS we begin by analyzing the final stage of treatment, then work backwards, at each stage generating a pseudo-outcome which removes the effects of future treatment from the outcome. Letting $\tilde{y}_{K}=Y$, we define the $j$-th pseudo-outcome as $\tilde{y}_{j}=\tilde{y}_{j+1}+\left(\gamma_{j+1}(\mathbf{h}_{j+1},a_{j+1}^{\text{opt}})-\gamma_{j+1}(\mathbf{h}_{j+1},a_{j+1})\right)=\tilde{y}_{j+1}+\mu_{j+1}(\mathbf{h}_{j+1},a_{j+1})$. That is, we ‘add back’ the stage $j+1$ regret, effectively removing it from the outcome. This allows the pseudo-outcome at stage $j$ to be interpreted as the outcome for a patient who receives their particular regime up to stage $j$, and then is optimally treated afterwards. We could instead continue to use the blip formulation of the outcome model. In this case we once again take $\tilde{y}_{K}^{\prime}=Y$, and then define $\tilde{y}_{j}^{\prime}=\tilde{y}_{j+1}^{\prime}-\gamma_{j+1}(\mathbf{h}_{j+1},a_{j+1})$. In practice, the regret setup is more commonly implemented, and we will continue to use it (unless otherwise stated).

Estimation using dWOLS in the multistage setting then follows a three-step procedure. First, define weights $v_{j}$ for each stage which satisfy Equation (2), using $\overline{\mathbf{x}}_{j}$ as the covariate. Second, starting at stage $K$ and working iteratively backwards, solve the weighted regression of $\tilde{y}_{j}$ on the patient history $\mathbf{h}_{j}$. Third, define $\tilde{y}_{j-1}$, and repeat. If the blip and at least one of the treatment or treatment-free models are correctly specified at each stage $j$, this process will lead to consistent estimators for all $\psi_{j}$.
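The three-step procedure can be sketched for $K=2$ stages. In this minimal example the simulation values and helper names are hypothetical; each stage uses a linear blip in a single tailoring covariate, and, as an assumption that happens to hold for this simulated design, the treatment model at each stage is a logistic regression in that covariate only. The stage 1 treatment-free model is linear in $x_{1}$ even though the regret pseudo-outcome depends on $x_{1}$ non-linearly.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return beta


def dwols_stage(trt_free, tailor, a, y_tilde):
    """One stage of dWOLS with blip a * (psi0 + psi1 * tailor).

    Assumption for this sketch: the treatment model is logistic in `tailor` only.
    """
    X_trt = np.column_stack([np.ones_like(tailor), tailor])
    w = np.abs(a - expit(X_trt @ fit_logistic(X_trt, a)))   # v(a, x) = |a - pi_hat|
    D = np.column_stack([trt_free, a, a * tailor])
    Dw = D * w[:, None]
    return np.linalg.solve(D.T @ Dw, Dw.T @ y_tilde)[-2:]    # (psi0, psi1)


# Hypothetical two-stage simulation.
rng = np.random.default_rng(2)
n = 20_000
psi1_true, psi2_true = np.array([0.8, -1.0]), np.array([1.0, -1.5])
x1 = rng.normal(size=n)
a1 = rng.binomial(1, expit(x1))
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.binomial(1, expit(x2))
y = (1 + x1 + x2
     + a1 * (psi1_true[0] + psi1_true[1] * x1)
     + a2 * (psi2_true[0] + psi2_true[1] * x2)
     + rng.normal(size=n))
ones = np.ones(n)

# Stage K = 2: the pseudo-outcome is the observed outcome.
psi2_hat = dwols_stage(np.column_stack([ones, x1, a1, a1 * x1, x2]), x2, a2, y)

# Regret pseudo-outcome: add back the estimated stage 2 regret.
contrast2 = psi2_hat[0] + psi2_hat[1] * x2
a2_opt = (contrast2 > 0).astype(int)
y_tilde1 = y + (a2_opt - a2) * contrast2

# Stage 1: treatment-free model linear in x1 (misspecified here).
psi1_hat = dwols_stage(np.column_stack([ones, x1]), x1, a1, y_tilde1)
```

Because $x_{2}$ does not depend on $a_{1}$ in this simulation, the true stage 1 blip is exactly $a_{1}(\psi_{10}+\psi_{11}x_{1})$, so both `psi1_hat` and `psi2_hat` should be close to their true values.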

In order to estimate a DTR we further require two assumptions on our data, whether they are randomized or observational. First, we make the stable unit treatment value assumption (SUTVA).[31] SUTVA requires that a patient’s outcome is not influenced by another patient’s treatment assignment. This is typically reasonable, though it may be violated, for instance, when the intervention is a vaccine and the effects of herd immunity influence all observed outcomes. Second, we make the no unmeasured confounders, or sequential ignorability, assumption.[32] No unmeasured confounders requires that all common causes of treatment (at each stage $j$) and of future potential covariates or outcomes be measured in the history. That is, conditional on the available history, treatment must be independent of future potential covariates and outcomes. While this assumption will typically hold in randomized studies, it is untestable in the observational framework, and so must be carefully justified based on subject-matter knowledge. We will make these assumptions for the remainder of our discussion.

2.2 Measurement Error and Regression Calibration

In order to correct for measurement error we need to make assumptions regarding the structure of the error. If we take $X$ to be the true covariate and $X^{*}$ to be an error-prone observation of $X$, then we assume some form $X^{*}=g(X,U)$, where $U$ is the random noise which induces the error. While any specific application may suggest a particular form for the error model ($g$), two commonly used models are the classical additive and the multiplicative error models. In the classical additive model we take $g(X,U)=X+U$, and assume that $E[U]=0$ and that $U$ has constant covariance, not depending on $X$, given by $\operatorname{cov}(U)=\Sigma_{U}$. Moreover, we assume that $U$ is independent of (or sometimes merely uncorrelated with) both $X$ and any other covariates $Z$ that we observe without error. In the multiplicative model we take $g(X,U)=XU$, and assume that $E[U]=1$ and, again, $\operatorname{cov}(U)=\Sigma_{U}$, with the same independence assumptions. Note that, while the notation above tends to imply that $X$ (and as such $U$) is a scalar, the same models extend to vector-valued covariates; we make no distinction, and have selected notation for simplicity of exposition. Both the additive and multiplicative models provide unbiased measurements of $X$, in the sense that $E[X^{*}|X,Z]=X$.

When we have an outcome of interest, $Y$, we also tend to classify error as either differential or non-differential. Non-differential error refers to the case where, given $\{X,Z\}$, the outcome $Y$ is conditionally independent of $X^{*}$. Errors may be differential if, for instance, measurements are taken after the outcome is observed (such as a cancer diagnosis affecting how a patient responds to questions about their historical smoking habits). This is seldom the case in our framework.
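To make the classical additive and multiplicative error models concrete, the sketch below simulates both and checks the unbiasedness property $E[X^{*}|X]=X$ by regressing $X^{*}$ on $X$. The gamma-distributed covariate and the error scales are arbitrary choices for illustration; the log-normal multiplicative error is parameterized so that $E[U]=1$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x = rng.gamma(shape=4.0, scale=0.5, size=n)   # positive true covariate
sigma = 0.3

x_star_add = x + rng.normal(0.0, 0.5, size=n)                     # additive: E[U] = 0
x_star_mult = x * rng.lognormal(-0.5 * sigma**2, sigma, size=n)   # multiplicative: E[U] = 1

# E[X* | X] = X implies an intercept near 0 and a slope near 1 when regressing X* on X.
D = np.column_stack([np.ones(n), x])
coef_add = np.linalg.lstsq(D, x_star_add, rcond=None)[0]
coef_mult = np.linalg.lstsq(D, x_star_mult, rcond=None)[0]

# The additive error X* - X has constant spread; the multiplicative error grows with X.
low, high = x <= np.median(x), x > np.median(x)
err_add, err_mult = x_star_add - x, x_star_mult - x
print(err_add[low].std(), err_add[high].std())    # roughly equal
print(err_mult[low].std(), err_mult[high].std())  # larger for high X
```

Both proxies are unbiased for $X$, but only the additive error has a spread that does not depend on $X$.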

Error correction techniques (such as our choice of regression calibration) typically require additional data, beyond what is used in standard inferential procedures, to learn about the size (and structure) of the error. These may come in the form of validation, replicate, or instrumental data. Validation data consist of a subsample of the observations where both the true and error-prone observations are available. Replicate data consist of repeated measurements of the error-prone covariates for some subset of the individuals. Instrumental data, also known as instrumental variables (IVs), refer to additional covariates that are related to the true values, but which are (typically) uncorrelated with both the error observed in the covariate and the variability in the model after accounting for the true covariates.[10] An IV $T$ is called unbiased for $X$ if $E[T|X,Z]=X$. Replicate measurements can be viewed as a specific type of unbiased instrumental variable. While validation data are typically considered ideal, they are often unavailable in practice. Instead, we focus on the use of unbiased IVs, referred to as error-prone proxies, including replicate measurements.

The premise of regression calibration is to replace the unobserved $X$ in our models with an estimated $\widehat{X}=E[X|Z,X^{*}]$ . We then proceed with standard analysis on the predicted values, adjusting the standard errors as needed. Consider a single patient, with $k$ unbiased proxies of the true covariate, denoted $X^{*}_{1},\ldots,X^{*}_{k}$ . A common procedure for determining $\widehat{X}=E[X|Z,X^{*}]$ is taking the so-called best linear unbiased prediction (BLUP) approach, which involves approximating the conditional mean as a linear equation.[23, 24] We use a plug-in estimator for the theoretical BLUP, given by

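$$\widehat{X}=\widehat{\mu}_{X}+\begin{pmatrix}\widehat{\Sigma}_{X}&\widehat{\Sigma}_{XZ}\end{pmatrix}\begin{pmatrix}\widehat{\Sigma}_{X^{*}}&\widehat{\Sigma}_{XZ}\\ \widehat{\Sigma}_{XZ}^{\prime}&\widehat{\Sigma}_{Z}\end{pmatrix}^{-1}\begin{pmatrix}X^{*}-\widehat{\mu}_{X}\\ Z-\widehat{\mu}_{Z}\end{pmatrix},$$

where $\widehat{\mu}$ and $\widehat{\Sigma}$ denote estimated means and (cross-)covariances; because the proxies are unbiased, with errors independent of $Z$, $E[X^{*}]=\mu_{X}$ and $\operatorname{cov}(X^{*},Z)=\Sigma_{XZ}$.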

Regression calibration was originally proposed for a fairly general class of additive error models, with independent and identically distributed replicate measurements.[23] When we have identically distributed replicate measures, it is sensible to take the mean of $X^{*}_{1},\ldots,X^{*}_{k}$ for use as $X^{*}$ in this equation. When the error models differ between proxies, however, a simple mean is unlikely to be the most effective way to combine them: intuitively, the observations which are less disturbed by error ought to contribute more to $X^{*}$. Our estimators generalize those typically used when replicates are available,[10] taking $X^{*}=\sum_{j=1}^{k}\delta_{j}X^{*}_{j}$ for a set of weights $\delta_{j}$, chosen either for the interpretation of the estimator or to reduce variability. The specific details of the implementation of our estimators are contained in Appendix A.
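A minimal sketch of the replicate-based BLUP for a scalar $X$ and a scalar error-free $Z$ follows. It assumes classical additive error with the same number $k$ of independent, identically distributed replicates per patient, so the equal weights $\delta_{j}=1/k$ apply; the error variance is estimated from within-patient replicate variability and moment estimates are plugged into the linear prediction formula. The function names and simulated values are hypothetical, and this is not the generalized estimator of Appendix A.

```python
import numpy as np


def regression_calibration(W, z):
    """Plug-in BLUP of X given replicates W (n, k) and an error-free covariate z (n,).

    Assumes classical additive error with k i.i.d. replicates per patient, so the
    replicate mean serves as X* (equal weights delta_j = 1/k).
    """
    k = W.shape[1]
    x_star = W.mean(axis=1)
    sigma_u2 = W.var(axis=1, ddof=1).mean()                  # pooled within-patient error variance
    S = np.cov(np.column_stack([x_star, z]), rowvar=False)   # covariance of (X*, Z)
    c = np.array([S[0, 0] - sigma_u2 / k, S[0, 1]])          # (var(X), cov(X, Z))
    dev = np.column_stack([x_star - x_star.mean(), z - z.mean()])
    return x_star.mean() + dev @ np.linalg.solve(S, c)


def ols(columns, y):
    """Ordinary least squares with an intercept."""
    D = np.column_stack([np.ones_like(y)] + columns)
    return np.linalg.lstsq(D, y, rcond=None)[0]


# Hypothetical simulation: two replicates, true slope on X equal to 2.
rng = np.random.default_rng(3)
n, k = 20_000, 2
z = rng.normal(size=n)
x = 0.5 * z + np.sqrt(0.75) * rng.normal(size=n)
W = x[:, None] + 0.8 * rng.normal(size=(n, k))
y = 1 + 2 * x + z + rng.normal(size=n)

x_hat = regression_calibration(W, z)
naive_slope = ols([W.mean(axis=1), z], y)[1]   # attenuated towards 0
rc_slope = ols([x_hat, z], y)[1]               # approximately 2
```

In this linear outcome model the naive slope is attenuated, while the slope on $\widehat{X}$ recovers the true value, in line with the consistency of regression calibration in linear models.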

Applying this correction, when errors are additive, results in consistent estimators in linear models. In non-linear models, some authors have described the estimators as “nearly consistent”.[33] By this they mean that, while we cannot guarantee consistency of all parameters in the models, we can often make claims about reduced bias or consistency of some parameters. For instance, in log-linear models, regression calibration estimators will consistently estimate the slope parameters while inconsistently estimating the intercept.[10] Of greatest concern for our work, outside of linear models, is the utility of regression calibration for logistic regression models. Here, the phrasing “nearly consistent” is taken to mean that the correction, in many situations, provides a great reduction in the bias of the parameter estimates. Further, when the main concern with the fitted logistic regression is the probability estimates, regression calibration provides a reasonable approximation to the truth, when neither the slope parameters, nor the conditional variance of $X$ given $\{X^{*},Z\}$ are too large. Specifically,

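$$P(A=1|Z,X^{*})\approx H\left(\frac{\alpha_{0}+\alpha_{X}^{\prime}\widehat{X}+\alpha_{Z}^{\prime}Z}{\sqrt{1+\alpha_{X}^{\prime}\Sigma_{X|Z,X^{*}}\alpha_{X}/1.7^{2}}}\right),$$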

where $H(x)=(1+\exp(-x))^{-1}$ is the inverse-logit (expit) function, $\alpha$ are the parameters of the logistic regression, and $\Sigma_{X|Z,X^{*}}$ gives the conditional covariance of $X$ given $\{Z,X^{*}\}$ .[10] In general, the denominator will attenuate the estimates for the model parameters. When the denominator of this approximation is close to $1$ , however, this attenuation will be small, and the estimator using regression calibration takes an approximately correct form.
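The quality of this approximation can be checked numerically for a scalar $X$. The sketch below assumes, for illustration only, that $X$ given $\{Z,X^{*}\}$ is normal, computes the exact probability by Gauss-Hermite quadrature, and compares it with the approximation and with a naive plug-in of $\widehat{X}$ that ignores the conditional variance. The parameter values are arbitrary.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def exact_prob(lin_mean, slope, cond_var, n_nodes=80):
    """E[H(alpha0 + alpha_X * X)] for X | (Z, X*) normal, by Gauss-Hermite quadrature.

    lin_mean is alpha0 + alpha_X * E[X | Z, X*] (plus any alpha_Z' Z); slope is alpha_X.
    """
    t, w = np.polynomial.hermite_e.hermegauss(n_nodes)   # nodes/weights for exp(-t^2 / 2)
    vals = expit(lin_mean + slope * np.sqrt(cond_var) * t)
    return np.sum(w * vals) / np.sqrt(2 * np.pi)


def rc_approx(lin_mean, slope, cond_var):
    """H(lin_mean / sqrt(1 + slope^2 * cond_var / 1.7^2))."""
    return expit(lin_mean / np.sqrt(1 + slope**2 * cond_var / 1.7**2))


for lin_mean, slope, cond_var in [(-0.2, 1.0, 0.5), (1.0, 2.0, 1.0)]:
    print(exact_prob(lin_mean, slope, cond_var),
          rc_approx(lin_mean, slope, cond_var),
          expit(lin_mean))   # naive plug-in ignoring the conditional variance
```

In the second case, where the slope and conditional variance are larger, the naive plug-in drifts noticeably from the exact value while the attenuated form stays close.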

While the corrections we consider are motivated by non-differential classical additive error models, a wider class of error models can be accommodated by the methods we introduce. Any non-differential error model with (1) more than one unbiased proxy available for $X$, (2) uncorrelated errors between proxies (that is, the covariance between any two unbiased measurements equals the covariance of $X$, $\Sigma_{X}$), and (3) the covariance of each of the unbiased proxies given as $\Sigma_{X}+M$ for a suitable constant $M$, can be used in place of the classical additive model. This would include the multiplicative error model introduced above. However, when errors are multiplicative, it is generally advisable to transform the measured values to a scale on which the errors are additive.[Multiplicative_Transformations] While the existing asymptotic theory, and most implementations of regression calibration, rely on the additive structure, the modified estimators we present are computable under this wider class of models. In our simulation studies (Section 4) we demonstrate that the corrections perform adequately under slight deviations from additivity. Still, if there is good reason to believe that all proxies are subject to non-additive error which is transformable to an additive model, we would advise analysts to make these transformations, as asymptotic justifications for the presented methods under those models are lacking.

3 Measurement Error in Dynamic Treatment Regimes

With little work to date concerning measurement error in DTRs and precision medicine, we shall limit attention to the case of errors in patient-level covariates, assuming treatments and outcomes are measured correctly. In this section we will illustrate that, in addition to concerns arising from measurement error that are common to traditional modeling settings, there are considerations which are unique to the structure of DTRs. We discuss these considerations alongside various theoretical observations.

First, we motivate the construction of valid estimators in the one-stage setting, providing a theoretical guarantee of sample covariate balance in the presence of measurement error. Next, we illustrate how these estimators can be extended to the multistage setting, paying specific attention to the estimation of pseudo-outcomes. We then discuss how confidence intervals may be constructed using a modified bootstrap procedure. Finally, we focus on the process of determining optimal treatments in the future, reframing the estimation as a prediction problem, and discuss the merits of error correction in this context.

3.1 Blip Parameter Estimation

To estimate a DTR with dWOLS, we must specify the outcome and treatment models. We assume that a biological process (or similar) relates the true covariate values $X$ to the outcome $Y$ , whereas treatment decisions can only be made based on the error-prone observed values $X^{*}$ . This structure is shown graphically in Figure 1. The treatment decision may be based on a single observed proxy, for instance $X^{*}_{1}$ , or on some combination of these proxies, for instance $X^{*}=\frac{1}{k}\sum_{i=1}^{k}X^{*}_{i}$ .

Recall that dWOLS produces doubly robust estimators through the principle of covariate balance. If our treatment-free model is misspecified, then correct specification of the treatment model will induce covariate balance in $X$ (that is, $E[X|A=1]=E[X|A=0]$ in the weighted data set), leading to consistent estimation of the blip parameters.[20] In the error-prone setting, if we employ regression calibration using $\widehat{X}$ in our outcome models, we wish to induce covariate balance not between $X$ and $A$, but between $\widehat{X}$ and $A$. Following the proof by Wallace and Moodie which justifies the choice of weights,[20] we might intuitively speculate that any weights which satisfy $\pi(\widehat{X})v(1,\widehat{X})=(1-\pi(\widehat{X}))v(0,\widehat{X})$ will induce covariate balance in $\widehat{X}$. In the error-free setting, weights of the form $v(a,x)=|a-E[A|X=x]|$ are recommended, suggesting that weights of the form $v(a,x^{*})=|a-E[A|\widehat{X}=x^{*}]|$ may prove suitable. We observe the following Result.

Result 1.

Let $\mathbf{X^{*}}=(X^{*}_{1},\ldots,X^{*}_{k})$ be observed as a set of unbiased proxies for $X$, and denote the regression calibration estimates of $X$, based on $\mathbf{X^{*}}$, by $\widehat{X}$. Take $P(A=1|L)=\text{H}(\alpha^{\prime}L)$, where $L$ is any tailoring covariate (vector), which may (but need not) contain $X$. Using the weights $v(A,\widehat{X})=|A-\hat{P}(A=1|\widehat{X})|$, where $\hat{P}(A=1|\widehat{X})$ is estimated through a simple logistic regression of $A$ on $\widehat{X}$, the weighted sample means of $\widehat{X}$, conditional on $A$, will be equal. That is, given the above setup,

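$$\frac{\sum_{i=1}^{n}A_{i}\,v(1,\widehat{X}_{i})\,\widehat{X}_{i}}{\sum_{i=1}^{n}A_{i}\,v(1,\widehat{X}_{i})}=\frac{\sum_{i=1}^{n}(1-A_{i})\,v(0,\widehat{X}_{i})\,\widehat{X}_{i}}{\sum_{i=1}^{n}(1-A_{i})\,v(0,\widehat{X}_{i})}.$$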

Proof of Result 1: See appendix.

Result 1 states that the estimation procedure alone ensures that, within the sample, the weighted means are equivalent between observations with $A=1$ and with $A=0$ , regardless of the true underlying treatment model. The idea of using sample balance as a small-sample proxy for true balance has been explored in a traditional balancing score setting.[34] We will show through simulations that, in many situations, this sample balance suffices to maintain the double robustness of dWOLS.
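The sample balance in Result 1 can be checked numerically. One way to see why it holds (our own derivation, which may differ from the appendix proof) is that, with $v(1,\widehat{X})=1-\hat{p}$ and $v(0,\widehat{X})=\hat{p}$, the two weighted sums are equated by the score equations $\sum_{i}(A_{i}-\hat{p}_{i})=0$ and $\sum_{i}(A_{i}-\hat{p}_{i})\widehat{X}_{i}=0$ of a logistic regression with an intercept. In the sketch below, `x_hat` is a hypothetical stand-in for regression calibration output, and the true treatment model is deliberately not linear-logistic in it.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, a, n_iter=50):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return beta


rng = np.random.default_rng(4)
n = 500
x_hat = rng.normal(size=n)                          # stand-in for regression calibration output
a = rng.binomial(1, expit(0.3 + x_hat**2 - x_hat))  # true treatment model is not linear-logistic
X = np.column_stack([np.ones(n), x_hat])
p_hat = expit(X @ fit_logistic(X, a))               # simple logistic regression of A on X_hat
v = np.abs(a - p_hat)                               # v(A, X_hat) = |A - P_hat(A = 1 | X_hat)|

mean_treated = np.sum(v * a * x_hat) / np.sum(v * a)
mean_control = np.sum(v * (1 - a) * x_hat) / np.sum(v * (1 - a))
print(mean_treated - mean_control)   # zero up to floating-point error
```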

When introducing regression calibration we discussed that, in general, the parameters are not consistently estimated in a logistic regression. This discussion is less directly relevant to our present scenario. The reason is that, under our assumed model, the “true” covariates used to inform treatment are the observed covariates, which by assumption are $X^{*}$ . While $X^{*}$ is error-prone with respect to the underlying value of interest, $X$ , and as such the outcome model, it is not error-prone with respect to the treatment model, where decisions are informed using $X^{*}$ . As a result, when applying regression calibration to the treatment model, we are using $\widehat{X}$ in place of $X^{*}$ , not in place of $X$ . This remains an approximation – one which is shown in simulations to adequately induce balance – but not the standard approximation discussed in the literature.

3.2 Multistage DTRs with Error

Having established an approach for blip parameter estimation in the single-stage problem, we now consider the multistage case, which further requires the estimation of the pseudo-outcomes, $\tilde{y}_{j}$. If using the regret formulation, then to estimate the pseudo-outcome we must estimate $\hat{a}_{j}^{\text{opt}}$ as well as the blip function itself. Consider, for notational ease, a two-stage DTR which has a linear specification $\gamma_{2}(x_{2},a_{2})=a_{2}(\psi_{20}+\psi_{21}^{\prime}x_{2})$. Then $a_{2}^{\text{opt}}=I(\psi_{20}+\psi_{21}^{\prime}x_{2}>0)$. In the error-free case, we have $\hat{a}_{2}^{\text{opt}}=I(\widehat{\psi}_{20}+\widehat{\psi}_{21}^{\prime}x_{2}>0)$, rendering $\widehat{\tilde{y}}_{1}=Y^{\text{opt}}-\mu_{1}-\mu_{2}+(\hat{a}_{2}^{\text{opt}}-a_{2})(\widehat{\psi}_{20}+\widehat{\psi}_{21}^{\prime}x_{2})$. If $\widehat{\psi}_{20}=\psi_{20}$ and $\widehat{\psi}_{21}=\psi_{21}$, then $\hat{a}_{2}^{\text{opt}}=a_{2}^{\text{opt}}$ and $\widehat{\mu}_{2}=(\hat{a}_{2}^{\text{opt}}-a_{2})(\widehat{\psi}_{20}+\widehat{\psi}_{21}^{\prime}x_{2})=\mu_{2}$, simplifying the estimated pseudo-outcome to $\widehat{\tilde{y}}_{1}=Y^{\text{opt}}-\mu_{1}$, which is the same as the theoretical quantity $\tilde{y}_{1}$.

This simplification will not necessarily occur in the error-prone case, for two reasons. First, even if the $\widehat{\psi}$ are correctly estimated, the use of $\widehat{X}_{2}$ in place of $X_{2}$ will leave a residual term between the true and estimated blip functions. Second, the estimated optimal treatment $\hat{a}_{2}^{\text{opt}}$ may differ from the true optimal treatment. Assuming that $\widehat{\psi}_{20}=\psi_{20}$ and $\widehat{\psi}_{21}=\psi_{21}$, we have

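$$\widehat{\tilde{y}}_{1}-\tilde{y}_{1}=\left(\hat{a}_{2}^{\text{opt}}-a_{2}^{\text{opt}}\right)\left(\psi_{20}+\psi_{21}^{\prime}\widehat{X}_{2}\right)+\left(a_{2}^{\text{opt}}-a_{2}\right)\psi_{21}^{\prime}\left(\widehat{X}_{2}-X_{2}\right).\qquad(6)$$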
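With correctly estimated $\psi_{2}$, the pseudo-outcome error $\widehat{\mu}_{2}-\mu_{2}$ splits into a misclassification term, $(\hat{a}_{2}^{\text{opt}}-a_{2}^{\text{opt}})(\psi_{20}+\psi_{21}^{\prime}\widehat{X}_{2})$, and a covariate-error term, $(a_{2}^{\text{opt}}-a_{2})\psi_{21}^{\prime}(\widehat{X}_{2}-X_{2})$. The sketch below verifies this identity numerically for a scalar covariate; `x2_hat` is a hypothetical stand-in for a calibrated covariate, and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
psi20, psi21 = 0.5, -1.0                      # blip parameters, assumed correctly estimated
x2 = rng.normal(size=n)                       # true stage 2 covariate
x2_hat = x2 + rng.normal(scale=0.5, size=n)   # hypothetical stand-in for the calibrated covariate
a2 = rng.binomial(1, 0.5, size=n)

gamma = psi20 + psi21 * x2          # true stage 2 treatment contrast
gamma_hat = psi20 + psi21 * x2_hat  # contrast evaluated at the calibrated covariate
a_opt = (gamma > 0).astype(int)
a_opt_hat = (gamma_hat > 0).astype(int)

# Error in the estimated pseudo-outcome: mu2_hat - mu2.
pseudo_error = (a_opt_hat - a2) * gamma_hat - (a_opt - a2) * gamma

# Decomposition: misclassification term plus covariate-error term.
term1 = (a_opt_hat - a_opt) * gamma_hat
term2 = (a_opt - a2) * psi21 * (x2_hat - x2)
```

The first term is non-zero only for patients whose estimated optimal treatment is wrong, and then its magnitude is $|\psi_{20}+\psi_{21}^{\prime}\widehat{X}_{2}|$.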

If, instead of the regret formulation, we compute the pseudo-outcome as described in the blip formulation (and as such do not need to estimate the optimal treatment), we are left with the residual term $\gamma_{2}-\widehat{\gamma}_{2}=A_{2}\psi_{21}^{\prime}\left(X_{2}-\widehat{X}_{2}\right)$. While it is not possible, without access to $X_{2}$ directly, to guarantee that these sources of error are completely eliminated, we are afforded some flexibility in how the pseudo-outcomes are computed. In particular, if we estimate the blip parameters using the regression calibration correction, and assume that they have been correctly estimated, we could separately choose a covariate, $X^{*}_{2}$, to use for estimating $\tilde{y}$.

In the blip characterization, $X^{*}_{2}$ should be chosen so as to make $X_{2}-X^{*}_{2}$ small. Noting that $\widehat{X}_{2}$ is the linear estimator of $X_{2}$ which minimizes the mean squared error (MSE), this gives reasonable justification for selecting $\widehat{X}_{2}$. It is also worth noting that if $A_{2}=0$ the blip pseudo-outcome is exactly correct. As such, practitioners applying this characterization may wish to consider a regression calibration conditional on $A_{2}=1$ (that is, estimating the BLUP only among those who received second-stage treatment), which would minimize the MSE among linear estimators for only those patients who contribute to the biased pseudo-outcomes.

The regret characterization warrants slightly more involved consideration. The second term in Equation (6) has an impact dictated by $X_{2}-X^{*}_{2}$, as in the blip formulation. The first term, however, relies on a difference of indicator functions. If $\gamma_{2}\gg0$ or $\gamma_{2}\ll0$, such that there is an unambiguous optimal treatment for the individual, then controlling $|\widehat{\gamma}_{2}-\gamma_{2}|$ leads to $\widehat{A}_{2}^{\text{opt}}=A_{2}^{\text{opt}}$. In this situation, requiring $\widehat{\gamma}_{2}$ to be near $\gamma_{2}$ reduces to requiring $X^{*}_{2}$ to be near $X_{2}$, and so we can once again rely on the fact that $\widehat{X}_{2}$ minimizes the MSE to motivate the selection of the regression calibration correction. If instead $|\gamma_{2}|\leq\epsilon$ for a sufficiently small $\epsilon$, such that the optimal treatment is ambiguous, then it no longer suffices to have $\widehat{\gamma}_{2}$ near $\gamma_{2}$ (as even small perturbations between these quantities may lead to $\widehat{A}_{2}^{\text{opt}}\neq A_{2}^{\text{opt}}$). However, if $\widehat{\gamma}_{2}$ is near $\gamma_{2}$, then $|\widehat{\gamma}_{2}|$ is also small, relatively speaking. Since the magnitude of the first term in Equation (6) is given by $|\widehat{\gamma}_{2}|$ whenever that term is non-zero, selecting an estimator near $\gamma_{2}$ will ensure either (1) that $\widehat{A}_{2}^{\text{opt}}$ is likely to be optimal when there is a large treatment effect, or (2) that the magnitude of the error produced will be small when $\widehat{A}_{2}^{\text{opt}}$ is not optimal. This provides a heuristic rationale for using the regression calibration correction to estimate the pseudo-outcomes. To improve the MSE by conditioning, as was possible in the blip characterization, we would want to limit focus to patients for whom $A_{2}^{\text{opt}}\neq A_{2}$. However, $A_{2}^{\text{opt}}$ is not observable, and so this strategy is not available.

There are obvious limitations to this justification. The first is that, in certain settings, it may be possible to derive an estimator which minimizes a loss function on the classification of optimal treatments. Further, restricting consideration to linear estimators of $X_{2}$ may be ill-advised. Finally, a metric other than MSE may be preferable to measure the distances in this setting. The first issue is a problem that is linked to optimal treatment recommendation (which we investigate briefly in Section 3.4). The second concern extends beyond the estimation of pseudo-outcomes, and in such situations where linear estimators perform poorly, alternative corrections should be considered. There are extensions to regression calibration which provide higher order corrections which may be suitable.[10] Finally, where MSE is an inappropriate metric, practitioners of the methodology may be able to solve for an estimator which optimizes the desired metric instead. MSE is a generally applicable metric, which ought to serve well in a wide variety of scenarios.

In order to perform dWOLS in an error-prone setting, we ultimately recommend computing the regression calibration estimates for all error-prone covariates, and then using these values in the treatment, treatment-free, and blip models, in addition to the estimation of the pseudo-outcomes. This procedure promises consistent parameter estimates under correct model specification, sample covariate balance using the weights, and a heuristic justification for the acceptability of the pseudo-outcomes. We caution any practitioner applying these methods to be mindful of their particular scenario, ensuring that the structures we have assumed are reasonable and that our discussion remains valid for their use case.
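A one-stage version of this recommendation can be sketched end to end. In this minimal example (hypothetical simulation values and helper names; no error-free covariate $Z$), the outcome depends on the true $X$, treatment is assigned using the observed replicate mean, and the calibrated $\widehat{X}$ is used in the treatment, treatment-free, and blip models. The naive analysis, which uses the replicate mean directly, is included for comparison.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return beta


def regression_calibration(W):
    """BLUP of X from i.i.d. replicates W (n, k) under classical additive error, no Z."""
    k = W.shape[1]
    x_star = W.mean(axis=1)
    sigma_u2 = W.var(axis=1, ddof=1).mean()      # within-patient error variance
    var_star = x_star.var(ddof=1)
    reliability = (var_star - sigma_u2 / k) / var_star
    return x_star.mean() + reliability * (x_star - x_star.mean())


def dwols(cov, a, y):
    """One-stage dWOLS using `cov` in the treatment, treatment-free and blip models."""
    X_trt = np.column_stack([np.ones_like(cov), cov])
    w = np.abs(a - expit(X_trt @ fit_logistic(X_trt, a)))
    D = np.column_stack([np.ones_like(cov), cov, a, a * cov])
    Dw = D * w[:, None]
    return np.linalg.solve(D.T @ Dw, Dw.T @ y)[2:]


# Hypothetical simulation: the outcome depends on the true X,
# while treatment is assigned using the observed replicate mean.
rng = np.random.default_rng(6)
n, k = 20_000, 2
true_psi = np.array([0.5, -1.5])
x = rng.normal(size=n)
W = x[:, None] + 0.8 * rng.normal(size=(n, k))
a = rng.binomial(1, expit(W.mean(axis=1)))
y = 1 + x + a * (true_psi[0] + true_psi[1] * x) + rng.normal(size=n)

psi_naive = dwols(W.mean(axis=1), a, y)          # ignores measurement error
psi_rc = dwols(regression_calibration(W), a, y)  # regression calibration throughout
```

Under these assumptions the naive tailoring coefficient is attenuated towards zero, while the calibrated analysis recovers $\psi_{1}$ approximately.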

3.3 Confidence Intervals and Standard Errors

Regression calibration does not, in general, lend itself to the computation of closed-form variance estimators for the parameters of interest. Derivations of asymptotic standard errors exist for generalized linear models; however, bootstrap confidence intervals tend to be the preferred solution.[10] In the case of dWOLS, there has been little theoretical development of closed-form variance estimators. They have been derived for the single-stage setting, where the authors caution that “such variance estimates require careful calculation and coding, and so will likely not be practical for the typical analyst,” and indicate that bootstrap procedures performed satisfactorily in their exploratory analyses.[20] A modified bootstrap procedure, the m-out-of-n bootstrap, was proposed for use in Q-learning to handle non-regularity concerns in the estimation of DTRs.[35] The adaptive procedure proposed for selecting $m$ in Q-learning has been applied, with some success, to dWOLS.[36] It therefore seems that, where measurement error is a concern, a bootstrap procedure is presently best suited for estimating confidence intervals for DTR parameters.

We consider the m-out-of-n procedure, with an adaptive choice of $m$, to construct our intervals. We outline the fundamentals of the algorithm here, and refer the interested reader to the existing literature for a deeper exploration.[35, 36] The method performs a standard non-parametric bootstrap in which samples of size $m<n$, in place of the more conventional $n$, are drawn with replacement. The theory dictates only that $m=o(n)$, and so in the finite sample case we require a procedure for estimating $m$ from the data. We take

$$m=n^{\frac{1+\zeta(1-p)}{1+\zeta}},$$

where $p$ and $\zeta$ are hyperparameters, both selected from the data. The parameter $p$, taking values in $[0,1]$, measures the degree of non-regularity in the model in question. Notably, when $p=0$ (so that there are no non-regularity concerns), $m=n$ and the method reduces to the standard bootstrap. For a fixed $n$, $m\in[n^{1/(1+\zeta)},n]$, so $\zeta$ can be viewed as controlling the smallest acceptable resample size.
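The adaptive resample size $m=n^{(1+\zeta(1-p))/(1+\zeta)}$ can be computed directly. The Python sketch below is illustrative only: the function name `adaptive_m` is hypothetical, and rounding $m$ up to an integer and clipping it to $[1,n]$ are our assumptions, since the text does not specify a rounding rule.

```python
import math


def adaptive_m(n: int, p: float, zeta: float) -> int:
    """Resample size for the m-out-of-n bootstrap.

    Computes n ** ((1 + zeta * (1 - p)) / (1 + zeta)), which equals n when
    p = 0 and n ** (1 / (1 + zeta)) when p = 1. Rounding up and clipping to
    [1, n] are assumptions of this sketch, not part of the method's statement.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if zeta <= 0.0:
        raise ValueError("zeta must be positive")
    exponent = (1.0 + zeta * (1.0 - p)) / (1.0 + zeta)
    return max(1, min(n, math.ceil(n ** exponent)))


if __name__ == "__main__":
    # More non-regularity (larger p) gives smaller resamples.
    for p_hat in (0.0, 0.25, 0.5, 1.0):
        print(p_hat, adaptive_m(1000, p_hat, 0.05))
```

With $n=1000$ and $\zeta=0.05$, the resample size shrinks toward $n^{1/1.05}\approx 720$ as $p$ approaches $1$.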

We use an adaptive approach that estimates both $p$ and $\zeta$ from the data. Consider, for notational simplicity, a two-stage setting. Non-regularity arises from patients for whom small perturbations in the covariates lead to different optimal treatment decisions. As such, we take $\hat{p}=\widehat{P}(\widehat{\gamma}_{2}=0)$, which we estimate by the proportion of individuals who do not admit a unique optimal treatment decision at the second stage. That is, we construct confidence sets for the second-stage blip, and count the proportion of individuals for whom this set contains $0$. To select $\hat{\zeta}$, we use a double-bootstrap procedure.
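A minimal Python sketch of the estimate $\hat{p}$ follows. It assumes a linear second-stage blip $\psi_{20}+\psi_{21}x$ (per unit of treatment), bootstrap replicates of $(\psi_{20},\psi_{21})$ that have already been computed, and simple percentile intervals for each individual's blip; the blip form, the percentile construction, and the function names are assumptions made for illustration.

```python
import math
import random
from typing import Sequence, Tuple


def percentile_interval(values: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Simple percentile interval from a sample of bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[math.floor((alpha / 2) * last)]
    hi = ordered[math.ceil((1 - alpha / 2) * last)]
    return lo, hi


def estimate_p_hat(
    boot_psi: Sequence[Tuple[float, float]],
    x2: Sequence[float],
    alpha: float = 0.05,
) -> float:
    """Proportion of individuals whose blip confidence set contains 0.

    boot_psi holds bootstrap replicates (psi20, psi21) of the stage-two blip
    parameters; x2 holds each individual's (calibrated) stage-two covariate.
    """
    n_ambiguous = 0
    for x in x2:
        blips = [b0 + b1 * x for b0, b1 in boot_psi]
        lo, hi = percentile_interval(blips, alpha)
        if lo <= 0.0 <= hi:  # no unique optimal treatment for this individual
            n_ambiguous += 1
    return n_ambiguous / len(x2)


if __name__ == "__main__":
    rng = random.Random(2020)
    boot = [(rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)) for _ in range(500)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print(estimate_p_hat(boot, x2))  # individuals near x2 = -1 are ambiguous
```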

We start by setting $\zeta$ to a small value, and then draw $B_{1}$ samples of size $n$ from the initial data. Within each of these samples, we estimate $\hat{p}^{(b_{1})}$ and the parameters of interest, $\widehat{\psi}^{(b_{1})}$. We then conduct an m-out-of-n bootstrap with $B_{2}$ iterations, using the current value of $\zeta$ and $\hat{p}^{(b_{1})}$ to compute $\widehat{m}^{(b_{1})}$, and use these $B_{2}$ resamples to form a confidence interval for the parameters of interest. This is repeated for each of the $B_{1}$ samples. We then estimate the coverage probability as the proportion of the $B_{1}$ intervals that contain the initial estimate; if this attains the nominal level, we take the present value of $\zeta$ as $\hat{\zeta}$. Otherwise, we increment $\zeta$ and repeat the procedure. The search space for $\zeta$ can be chosen as needed for the application, for instance by restricting the maximum considered value according to the smallest allowable resample size. Once $\hat{\zeta}$ and $\hat{p}$ are selected, the bootstrap is performed with the estimated $\widehat{m}$.
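The double-bootstrap search described above can be sketched in Python for a scalar parameter. This is a sketch under several assumptions: the estimator and the function returning $\hat{p}$ are supplied by the user (in a dWOLS analysis they would refit the calibrated models on each resample), each inner interval is the basic m-out-of-n interval built from quantiles of $\sqrt{m}(\hat{\psi}^{*}_{m}-\hat{\psi})$ and rescaled by $\sqrt{n}$, and $\zeta$ is searched over an increasing grid. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Estimator = Callable[[Sequence[float]], float]


def adaptive_m(n: int, p: float, zeta: float) -> int:
    """m = n ** ((1 + zeta * (1 - p)) / (1 + zeta)), rounded up and clipped to [1, n]."""
    exponent = (1.0 + zeta * (1.0 - p)) / (1.0 + zeta)
    return max(1, min(n, math.ceil(n ** exponent)))


def m_out_of_n_interval(
    data: Sequence[float], estimator: Estimator, m: int,
    n_boot: int, alpha: float, rng: random.Random,
) -> Tuple[float, float]:
    """Basic m-out-of-n interval built from quantiles of sqrt(m) * (est_m - est)."""
    n = len(data)
    est = estimator(data)
    roots: List[float] = sorted(
        math.sqrt(m) * (estimator([rng.choice(data) for _ in range(m)]) - est)
        for _ in range(n_boot)
    )
    last = n_boot - 1
    q_lo = roots[math.floor((alpha / 2) * last)]
    q_hi = roots[math.ceil((1 - alpha / 2) * last)]
    return est - q_hi / math.sqrt(n), est - q_lo / math.sqrt(n)


def select_zeta(
    data: Sequence[float], estimator: Estimator,
    p_hat_fn: Callable[[Sequence[float]], float], zeta_grid: Sequence[float],
    b1: int = 100, b2: int = 200, alpha: float = 0.05, seed: int = 0,
) -> float:
    """Double bootstrap: return the first zeta in an increasing grid with nominal coverage."""
    rng = random.Random(seed)
    n = len(data)
    initial_estimate = estimator(data)
    for zeta in zeta_grid:
        covered = 0
        for _ in range(b1):
            sample = [rng.choice(data) for _ in range(n)]  # outer resample of size n
            m = adaptive_m(n, p_hat_fn(sample), zeta)      # m from this sample's p-hat
            lo, hi = m_out_of_n_interval(sample, estimator, m, b2, alpha, rng)
            if lo <= initial_estimate <= hi:
                covered += 1
        if covered / b1 >= 1 - alpha:
            return zeta
    return zeta_grid[-1]  # no value reached nominal coverage: use the most conservative
```

Each candidate $\zeta$ costs roughly $B_{1}B_{2}$ model fits, which is why the grid is usually kept short.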

3.4 Future Treatments

While we have focused on identifying the optimal DTR, an important extension is to consider the implications of measurement error for future treatment decisions. One option is to frame future treatment decisions as a prediction (or classification) problem. In such a framing, our goal is not to correctly estimate the causal parameters, but rather to correctly classify patients into their optimal treatment categories. As previously discussed, it is sometimes argued that measurement error corrections are unnecessary in a prediction setting, though this is not universally true.[11] The complexity of DTRs suggests that it is worth considering the utility of error correction for prediction, and studying the effects of ignoring the error.

To implement error correction for the assignment of future treatments, we must consider what information is available when those decisions are made. If measurements are available for the entire population we wish to treat before any treatment decision is made, we can apply regression calibration directly and treat based on the imputed values. This is distinct from the situation where treatment decisions are made one at a time, so that patients' information cannot be pooled to apply regression calibration directly. In that setting we instead propose a pseudo-correction, in which the parameters required to calibrate the covariates are estimated at the fitting stage and carried forward for use at the prediction stage. It is also conceivable that only error-prone covariates are available during the study (for instance, due to cost constraints), while future decisions may be informed by the true covariates. Here, the prediction problem becomes one of predicting across domains.
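One way to realize the pseudo-correction is sketched below in Python, assuming a univariate covariate with classical additive errors that are independent across proxies and approximately normal. At the fitting stage one stores the estimated covariate mean and variance and the per-proxy error variances; at the prediction stage, a new patient's available proxies are combined with inverse-variance weights and shrunk toward the mean. The `CalibrationParams` container and `pseudo_correct` function are hypothetical names, and this univariate best-linear-predictor form is a simplification rather than the paper's exact estimator.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CalibrationParams:
    """Quantities estimated once, at the fitting stage."""
    mu_x: float              # estimated mean of the true covariate
    var_x: float             # estimated variance of the true covariate
    var_u: Dict[str, float]  # estimated error variance of each proxy


def pseudo_correct(proxies: Dict[str, float], params: CalibrationParams) -> float:
    """Calibrated covariate for a single new patient.

    proxies maps proxy names to observed values; any subset of the proxies
    used at the fitting stage may be supplied.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    precisions = {k: 1.0 / params.var_u[k] for k in proxies}
    total = sum(precisions.values())
    # Inverse-variance weighted average of the available proxies.
    w_bar = sum(precisions[k] * v for k, v in proxies.items()) / total
    # Under independent errors, the error variance of that average is 1 / total.
    shrink = params.var_x / (params.var_x + 1.0 / total)
    return params.mu_x + shrink * (w_bar - params.mu_x)


if __name__ == "__main__":
    # Hypothetical fitted values: two proxies with error variances 1.25 and 1.
    params = CalibrationParams(mu_x=0.0, var_x=1.0, var_u={"proxy1": 1.25, "proxy2": 1.0})
    print(pseudo_correct({"proxy1": 0.9, "proxy2": 1.1}, params))  # both proxies measured
    print(pseudo_correct({"proxy2": 1.1}, params))                 # single proxy measured
```

Measuring fewer proxies increases the error variance of the combined measurement, so the calibrated value is shrunk more strongly toward the mean.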

4 Simulation Studies

We now demonstrate, via simulation, the potential impact of measurement error in the context of DTRs. We emphasize the issues that arise when conducting a naive analysis, and show that regression calibration can largely correct for these errors, as per our preceding discussion.

4.1 Parameter Estimation

We begin by demonstrating the bias present in blip parameter estimates resulting from a naive analysis, and the robustness of our proposed estimation procedures. First, we consider a simple one-stage setup, with $X\sim N(0,1)$, and assume that we observe two proxy measurements, given by $X^{*}_{1}\sim X+N(0,0.25)$ and $X^{*}_{2}\sim X+t_{8}$. We assume that the treatment model is given by $P(A=1|X^{*}_{1}=w)=H(1-0.5w+1.5\exp(w-1))$, and the outcome model is specified as $Y=X+\exp(X)+A(1+X)+\epsilon$, where $\epsilon\sim N(0,1)$ is independent of all other variables. We are interested in estimating the blip parameters $\psi_{0}=1$ and $\psi_{1}=1$.
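The one-stage data-generating process can be sketched in Python as follows. This assumes that $H$ is the logistic (expit) function and that the second argument of $N(\cdot,\cdot)$ is a variance; neither convention is restated in this section. The function names are hypothetical, and the paper's own simulations were written in R (see the repository listed in the Data Availability Statement).

```python
import math
import random
from typing import List, Tuple


def expit(z: float) -> float:
    """Numerically stable logistic function (assumed meaning of H)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def student_t(df: int, rng: random.Random) -> float:
    """Draw from a t distribution as Z / sqrt(V / df), with V ~ chi-squared(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) is Gamma(shape df/2, scale 2)
    return z / math.sqrt(v / df)


def simulate_one_stage(n: int, seed: int = 0) -> List[Tuple[float, float, float, int, float]]:
    """Rows (X, X1*, X2*, A, Y) from the one-stage design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x1s = x + rng.gauss(0.0, 0.5)  # N(0, 0.25) read as variance 0.25, i.e. sd 0.5
        x2s = x + student_t(8, rng)
        p_treat = expit(1 - 0.5 * x1s + 1.5 * math.exp(x1s - 1))  # uses the first proxy
        a = 1 if rng.random() < p_treat else 0
        y = x + math.exp(x) + a * (1 + x) + rng.gauss(0.0, 1.0)  # blip 1 + X
        rows.append((x, x1s, x2s, a, y))
    return rows
```

Note that treatment depends on the observed proxy $X^{*}_{1}$, whereas the outcome depends on the true $X$; this mismatch is what drives the bias in a naive analysis.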

In this setting, we consider four analyses, each repeated with and without regression calibration, which differ in which components of our models are correctly specified. We fit models with (1) neither the treatment nor the treatment-free model correctly specified, (2) only the treatment model correctly specified (with the treatment-free model taken to be linear), (3) only the treatment-free model correctly specified (with the treatment model taken to be linear on the logistic scale), and (4) both correctly specified. In all analyses we simulated $10000$ datasets of size $n=1000$. The results are summarized in Figure 2.

When at least one model is correctly specified (analyses (2)-(4)), the naive estimators of $\psi_{0}$ perform well. In all four analyses, however, the naive estimates of $\psi_{1}$ are biased. Regression calibration clearly improves on the naive estimators of $\psi_{1}$ across analyses (2)-(4), largely removing the bias. A clear, though dramatically reduced, bias remains in analysis (2), where the estimates rely on correct specification of the treatment model alone.

We further extend these analyses to a variety of two-stage DTR settings, adapted from the original dWOLS paper.[20] In addition, we consider scenarios in which various combinations of error models are used. We compare the parameter estimates obtained using the proposed correction with those from a naive analysis that uses a weighted average of the available proxies. In all settings, we consider basing treatment either on the first proxy alone or on the mean of the available proxies. These results are summarized in the appendix, in Tables 3-7.

In general, whether actual treatment decisions are based on a single error-prone covariate or on the mean of multiple proxies, the correction methods remain applicable. Across the majority of scenarios, the proposed corrections greatly improve the estimates relative to the naive analysis, and yield results that appear broadly consistent. The corrections work well across a variety of error mechanisms; performance is materially affected only when a multiplicative gamma distribution is used to induce the error. These results support our earlier comments on the importance of additive error models: the methods are somewhat resilient to violations of this assumption, but analysts should be careful when there is good reason to suspect a multiplicative model. When the treatment model is badly misspecified, the quality of the correction degrades notably, which suggests that careful consideration must be given to fitting the treatment model.

4.2 Coverage Probabilities

Next, we consider three scenarios to test the applicability of the proposed bootstrap procedure. Due to the computational demands of the adaptive procedure, it is not feasible to conduct a full simulation study that adaptively selects $\zeta$ for each experiment. Instead, we perform the double-bootstrap procedure once under each scenario, and then consider the m-out-of-n bootstrap for values of $\zeta$ surrounding the selected one. In all three scenarios we take $X_{1},X_{2}\sim N(0,1)$, and observe two error-prone proxies of each. Scenario 1 takes $X^{*}_{11},X^{*}_{12}\sim X_{1}+N(0,1)$ and $X^{*}_{21},X^{*}_{22}\sim X_{2}+N(0,1)$. Scenario 2 takes $X^{*}_{11}\sim X_{1}+N(0,1)$, $X^{*}_{12}\sim X_{1}+\text{Unif}(-1,1)$, $X^{*}_{21}\sim X_{2}+N(0,1)$, and $X^{*}_{22}\sim X_{2}\cdot\text{Gamma}(1,1)$. Finally, scenario 3 takes $X^{*}_{11}\sim X_{1}+\text{Unif}(-1,1)$, $X^{*}_{12}\sim X_{1}\cdot\text{Gamma}(1,1)$, $X^{*}_{21}\sim X_{2}+N(0,0.25)$, and $X^{*}_{22}\sim X_{2}+\text{Unif}(-1,1)$. For all three scenarios, we take $P(A_{j}=1|X^{*}_{j1}=w)=H(w)$. The outcome for scenarios 1 and 2 is given by $Y=X_{1}+X_{2}+A_{1}(1+X_{1})+A_{2}(1+X_{2})+\epsilon$, where $\epsilon\sim N(0,1)$ is independent of everything else. For scenario 3, we introduce an additional binary covariate, $Z_{2}$, with $P(Z_{2}=1)=0.5$. We then take $Y=X_{1}+X_{2}+A_{1}(1+X_{1})+A_{2}(1+X_{2}-Z_{2}-Z_{2}X_{2})+\epsilon$ where, again, $\epsilon\sim N(0,1)$. Note that if $Z_{2}=1$ then $\gamma_{2}=0$, meaning that the optimal treatment is not well defined.
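The non-regularity in scenario 3 can be seen directly by factoring the stage-two blip implied by the outcome model above:

$$\gamma_{2}=A_{2}\left(1+X_{2}-Z_{2}-Z_{2}X_{2}\right)=A_{2}\left(1-Z_{2}\right)\left(1+X_{2}\right),\qquad\text{so}\qquad\gamma_{2}=0\ \text{whenever}\ Z_{2}=1.$$

Since $P(Z_{2}=1)=0.5$, about half of the population has a stage-two blip of exactly zero, which is the kind of non-regularity the adaptive choice of $m$ is intended to handle.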

In the first two scenarios we estimate $\hat{\zeta}=0.05$, while in the third scenario $\hat{\zeta}=0.075$. For all scenarios we form bootstrap confidence intervals using (1) a traditional n-out-of-n bootstrap, (2) an m-out-of-n bootstrap with $\zeta=0.05$ in the adaptive procedure, and (3) an m-out-of-n bootstrap with $\zeta=0.10$ in the adaptive procedure. For the third scenario, we also include an m-out-of-n bootstrap with $\zeta=0.075$. The coverage probabilities are reported in Table 1. The standard bootstrap procedure attained the nominal coverage in all settings. Using the selected $\widehat{\zeta}$ met the nominal coverage level in the second scenario and was slightly conservative in the first and third scenarios, while taking $\zeta=0.10$ produced mostly conservative intervals. In the third scenario, all procedures tended to produce conservative results.

4.3 Future Treatment Predictions

To investigate future treatment assignment, we consider a two-stage DTR where $X_{1}\sim N(0,1)$ and $X_{2}\sim N(A_{1},1)$ are the true covariates, with replicate observations $X^{*}_{11}\sim X_{1}+t_{10}$, $X^{*}_{12}\sim X_{1}+N(0,1)$, and $X^{*}_{21},X^{*}_{22}\sim X_{2}+N(0,0.25)$. The outcome is given by $Y=X_{1}-(A_{1}^{\text{opt}}-A_{1})(1-X_{1})-(A_{2}^{\text{opt}}-A_{2})(3-2X_{2})+\epsilon$, with $\epsilon\sim N(0,2)$ independent of all other variables. The treatment models take the form $P(A_{j}=1|X^{*}_{j1}=w)=H(1-w)$.

We partition these analyses into three settings based on the information available at the time treatment decisions (or predictions) are made: we have access to (1) error-prone measurements for only one patient at a time, (2) error-prone measurements for all patients at once, or (3) the true covariate values. In each setting we compare the performance of the naive model with that of the corrected model. In the first setting, direct regression calibration is not possible; instead, we apply the pseudo-correction described in Section 3.4 to produce a corrected estimate. We report results for this correction when both proxies are measured and when only a single proxy is measured. In the second setting, we do not fit the naive model, as it is equivalent to the naive model of setting (1). Each analysis uses $n=1000$ individuals in the fitting stage, and treatment is then assigned to $5000$ individuals. We repeat the full set of simulations $10000$ times.
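The accuracy measure used to compare these settings is the proportion of assigned treatments that agree with the true optimal treatment. A minimal Python sketch for stage one is shown below. It assumes that the estimated rule treats when $\hat{\psi}_{10}+\hat{\psi}_{11}x>0$ and, following the outcome model above, that the true optimal rule treats when $1-X_{1}>0$. The covariate values passed as `x_decision` (pseudo-corrected proxies, population-calibrated values, or true covariates) are what distinguish settings (1) to (3). The function name is hypothetical.

```python
from typing import Sequence


def proportion_optimal(
    psi_hat: Sequence[float],
    x_decision: Sequence[float],
    x_true: Sequence[float],
) -> float:
    """Share of patients whose estimated rule matches the true optimal rule.

    psi_hat = (psi0, psi1) for a linear blip psi0 + psi1 * x. The estimated
    rule is applied to x_decision, while the true stage-one optimum,
    A_opt = 1{1 - X1 > 0}, is evaluated at the true covariates x_true.
    """
    if len(x_decision) != len(x_true):
        raise ValueError("x_decision and x_true must have the same length")
    psi0, psi1 = psi_hat
    matches = 0
    for xd, xt in zip(x_decision, x_true):
        a_hat = 1 if psi0 + psi1 * xd > 0 else 0
        a_opt = 1 if 1.0 - xt > 0 else 0
        if a_hat == a_opt:
            matches += 1
    return matches / len(x_true)
```

Even with perfectly estimated parameters, applying the rule to an error-prone `x_decision` can misclassify patients whose true covariate lies near the decision boundary $X_{1}=1$.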

The complete results are provided in Figure 3; on average, the corrected methods outperform the naive methods in terms of accuracy. The results suggest that, in the worst case, when the problem is framed purely as one of prediction, the naive and corrected methods perform comparably. However, there are dramatic gains in the proportion of optimally treated individuals when additional information is available at the time future treatments are assigned. At stage one, the pseudo-correction performs better using only the second proxy than using only the first proxy, as a result of the second proxy's lower error variance ($1$, compared with $10/8=1.25$ for the $t_{10}$ error) and correspondingly higher weight when fitting the estimator. This suggests that, if the pseudo-correction is to be used and only one proxy will be available, the proxy with the lowest error variance is preferred.

5 STAR*D study

We now illustrate the proposed correction methods through application to data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. The STAR*D study was a multistage randomized controlled trial comparing different treatment regimes for patients with major depressive disorder.[12, 13] The study was split into four phases (with phase two further subdivided into two sub-phases) where, at each phase, different treatment options were available to patients based on preference and progression through the study. The severity of depression was measured through the Quick Inventory of Depressive Symptomatology (QIDS) score, assessed during each phase both by the patient (denoted QIDS-S) and by a clinician (denoted QIDS-C). At the end of each study phase, patients with a clinician-assessed QIDS score less than or equal to $5$ were considered to have entered remission, and were subsequently removed from the study. At phase 1, all patients were prescribed citalopram. At the end of phase 1, those who had not entered remission entered the second phase, where seven treatment options were available: this phase was characterized by ‘switching’ from citalopram to one of four other treatment options, or ‘augmenting’ treatment by receiving citalopram alongside one of three new treatments. Those who had still not entered remission entered a third (and possibly fourth) phase, where treatment was again switched or augmented with a variety of possible options. Full details of the study, and of the treatment options, are described elsewhere.[13]

The first and fourth phases of the trial are typically ignored in DTR analyses of these data. We focus on phases two (merging both sub-phases) and three, which we refer to as stage one and stage two, respectively. Previous analyses [37] specified QIDS-C as the outcome, and dichotomized treatments into those which contain a selective serotonin reuptake inhibitor (SSRI) and those which do not. These analyses model QIDS-C as a continuous covariate and consider three tailoring variables: QIDS-C measured at the start of each level (denoted $Q_{j}$ for stage $j$), the change in QIDS-C divided by the elapsed time over the previous level (referred to as QIDS slope, denoted $S_{j}$ for stage $j$), and patient preference (denoted $P_{j}$ for stage $j$), a binary indicator specifying whether the patient desired to switch treatment regimes ($P_{j}=1$) or augment ($P_{j}=0$). Treatment is coded as $A_{j}=1$ if the stage $j$ treatment includes an SSRI, and $A_{j}=0$ otherwise. Our analysis considers the 283 patients for whom all stage one and stage two covariates were measured. The outcome was taken to be $Y=-\frac{1}{2}\left(\text{QIDS-C}_{1}+\text{QIDS-C}_{2}\right)$, where $\text{QIDS-C}_{j}$ is the clinician-rated QIDS score at the end of stage $j$.
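The derived tailoring variables, the outcome, and the remission rule are simple functions of the recorded scores. The Python sketch below is illustrative only: the field and function names are hypothetical, the slope is assumed to be (end score minus start score) divided by the time spent in the level, and the time unit is whatever the data record. It is not drawn from the STAR*D data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRecord:
    """Clinician-rated QIDS (QIDS-C) scores for one treatment level (hypothetical fields)."""
    qids_start: float
    qids_end: float
    elapsed: float  # time spent in the level, e.g. in weeks (unit is an assumption)


def qids_slope(previous_level: LevelRecord) -> float:
    """S_j: change in QIDS-C over the previous level divided by its duration."""
    if previous_level.elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return (previous_level.qids_end - previous_level.qids_start) / previous_level.elapsed


def outcome(qids_c_end_stage1: float, qids_c_end_stage2: float) -> float:
    """Y = -(QIDS-C_1 + QIDS-C_2) / 2, so larger values mean fewer symptoms."""
    return -0.5 * (qids_c_end_stage1 + qids_c_end_stage2)


def in_remission(qids_c: float) -> bool:
    """Remission rule used in the study: clinician-rated QIDS of 5 or less."""
    return qids_c <= 5
```

Negating the average score means that maximizing $Y$ corresponds to minimizing depressive symptoms, matching the convention that DTR estimation maximizes the expected outcome.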

Existing analyses of these data make the implicit assumption that clinician scores are error-free measurements. However, the inclusion of self-assessed measures (QIDS-S) offers a feasible mechanism for exploring measurement error in this study. If we postulate that there exists a true underlying symptom score for every patient, then both the self-assessed and the clinician scores can be viewed as surrogate measures of this truth, permitting regression calibration. Our analysis continues to use QIDS-C as the outcome variable, to remain comparable with previous literature.

5.1 Model Fitting and Comparison

We fit the model using only the clinician ratings, using only the self-reports, or using the correction in which both are treated as error-prone proxies. Following previous analyses of the data, we pose a first-stage treatment model using only first-stage preference ($P_{1}$) and a second-stage treatment model using only second-stage preference ($P_{2}$). For the first stage, the treatment-free and blip models are linear in preference ($P_{1}$), slope ($S_{1}$), and initial QIDS score ($Q_{1}$). At the second stage, the treatment-free model is linear in preference ($P_{2}$), slope ($S_{2}$), and starting value ($Q_{2}$), as well as stage one treatment ($A_{1}$). The second-stage blip model uses only slope ($S_{2}$) and starting value ($Q_{2}$). For each setting we conducted an m-out-of-n bootstrap, choosing $m$ using the adaptive procedure outlined in Section 3.3. Table 2 contains the parameter estimates and $95\%$ confidence intervals.

Previous analyses have found that the only significant treatment effect was the interaction between stage one treatment and preference ($A_{1}P_{1}$),[35] a result that is replicated on our subset of the data when using only clinician scores. If we instead assume that the self-reported scores represent the true values, we find a significant treatment effect at stage two: the interaction between treatment and slope ($A_{2}S_{2}$). However, once we apply our correction, neither of these effects remains significant, and we find no evidence of any significant treatment effect. This may be due to the increased uncertainty induced by the error, but it nevertheless suggests that further investigation is required.

6 Discussion

Dynamic treatment regimes provide a powerful framework for characterizing treatment pathways. The theory surrounding optimal DTR estimation is well developed, with numerous methods available. Measurement error is a pervasive issue in many scenarios and has consequently received extensive attention. However, there has been no substantive work investigating DTRs in an error-prone setting. The problems arising from measurement error in the DTR framework are somewhat unique, as the treatment-free, treatment, and blip models are each affected by measurement error separately. The treatment model no longer depends on the true covariate values, as these are unobserved, adding extra complexity to the modeling procedure. Additionally, fitting a DTR often requires patient weights designed to balance covariates between treatment groups, and this balance may not be guaranteed in the presence of measurement error.

Despite these additional considerations, since DTRs can be effectively estimated through regression methods, and since measurement error has been well studied in regression frameworks, extant error correction methods offer a feasible solution. We investigated the use of regression calibration to correct for measurement error in the tailoring covariates of DTRs with a continuous outcome. We have demonstrated that applying regression calibration within the dWOLS framework is an effective technique for substantially reducing the bias present in an analysis that does not account for measurement error. Further, the estimates tended to behave well across a wide variety of settings, largely preserving the doubly robust property of dWOLS.

The multistage setting poses additional considerations due to the need to estimate the sequence of pseudo-outcomes. Even when the blip parameters are correctly estimated, there is residual bias when constructing the pseudo-outcome from any error-prone proxy. We argue that using the regression calibration correction to form the pseudo-outcomes can be justified, using MSE as a metric across the class of linear estimators for the true covariates. While this argument is largely heuristic, the simulations we conducted tend to confirm that this procedure often suffices to correct the parameter estimates. There is opportunity for further theoretical development of these concepts, such as determining which metrics are best suited for assessing pseudo-outcome estimates, both when covariates are subject to error and when they are not.

The m-out-of-n bootstrap with an adaptive choice of $m$, as proposed to handle non-regularity in error-free DTRs, seems promising as a mechanism for producing nominal, or at worst conservative, confidence intervals when using the proposed correction. This is in line with related theory in both the measurement error and DTR literatures, where the m-out-of-n bootstrap has been shown to be effective for dWOLS, and bootstrap methods are broadly useful with regression calibration. While there is substantial room for developing rigorous confidence intervals for dWOLS, in both the error-prone and error-free settings, the recommended m-out-of-n procedure appears well suited to the task.

While it may seem plausible to view the process of estimating a DTR to form a treatment rule for future patients as a question of prediction, removing the need for error correction methods, we have demonstrated that the naive estimates are more highly variable and tend to provide lower rates of optimal treatment as compared to a corrected analysis. Further, when there is the possibility of observing error-free patient information when making future treatment decisions, the gain from using regression calibration in the study is substantial, in terms of the proportion of optimally treated individuals.

Finally, we demonstrated our methods through an analysis of the STAR*D study data, where additional error-prone proxies were available in the self-reported QIDS scores, which have typically been ignored in prior analyses of this dataset. Accounting for measurement error alters the estimated optimal treatment rules. This may be attributable to the increased uncertainty that measurement error induces, but it suggests that the true optimal decision rules require further investigation to be correctly identified.

The resolution of measurement error for DTRs is a complex challenge. The approaches proposed in this work rely on a number of assumptions that may not hold in practice, such as the availability of unbiased instrumental data or the applicability of the error models explored. Nevertheless, we have demonstrated both the potential impact of measurement error on so-called naive analyses, and the ease with which considerable gains can be made through the application of comparatively straightforward analytical techniques. More complex methods (both from the DTR and measurement error literatures) may elicit further improvements, and we anticipate pursuing these in future work. Beyond this, errors in the treatment and outcome variables must also be considered, both of which may be investigated as natural extensions to the results presented here.

Acknowledgments

This work was funded by a Natural Sciences and Engineering Research Council (NSERC) Discovery Grant. The data for the STAR*D study are available from the National Institute of Mental Health (NIMH). Restrictions apply to the availability of these data, which were used under license for this study. Data are available through the NIMH Data Archive (NIMH NDA ID: 2148). We are grateful for the feedback from the reviewers and editors on an early draft of this manuscript. The provided comments contributed important developments to the methodologies presented.

Data Availability Statement

All of the R code used to run the simulation studies and sensitivity analyses is freely available online at: https://github.com/DylanSpicker/measurement-error-DTRs. The data for the STAR*D study are available from the National Institute of Mental Health (NIMH). Restrictions apply to the availability of these data, which were used under license for this study. Data are available through the NIMH Data Archive (NIMH NDA ID: 2148).

Appendix A Regression Calibration: Details

We first consider defining the optimal weights $\delta_{j}$. In our investigation, we have paid most attention to three separate sets of weights: (1) using $\delta_{j}=\frac{1}{k}$ for all $k$ proxy measurements, (2) viewing $X^{*}$ as an estimator for $X$ and minimizing the variance of that estimator, which gives $\delta_{j}=\operatorname{Tr}\left(M_{j}\right)^{-1}\left[\sum_{l=1}^{k}\operatorname{Tr}\left(M_{l}\right)^{-1}\right]^{-1}$, or (3) treating $\delta_{j}$ as parameters in the BLUP and solving, which gives the form $\delta_{j}=\operatorname{Tr}\left(\beta^{\prime}\beta M_{j}\right)^{-1}\left[\sum_{l=1}^{k}\operatorname{Tr}\left(\beta^{\prime}\beta M_{l}\right)^{-1}\right]^{-1}$, which we can solve numerically. Here $M_{j}$ refers to the $j$th matrix in the covariance term. Solving for the BLUP gives us Equation (5). To implement each of these, we advocate simple plug-in estimators for each of the corresponding quantities, most of which can be readily derived through ANOVA-style calculations. Unless otherwise stated, we use

[TABLE]
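For a univariate covariate with independent classical additive errors, the required variance components can be obtained from simple moment calculations: the covariance between any two proxies estimates $\operatorname{Var}(X)$, and the variance of proxy $j$ minus that covariance estimates its error variance. The Python sketch below implements this and returns weight options (1) and (2) from the list above, with $\operatorname{Tr}(M_{j})$ reducing to the scalar error variance. These moment estimators are our illustrative choice and need not coincide with the plug-in expressions used in the paper; the names are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import combinations
from statistics import fmean
from typing import List, Sequence


@dataclass(frozen=True)
class VarianceComponents:
    mu_x: float                          # mean of the true covariate
    var_x: float                         # variance of the true covariate
    var_u: List[float]                   # error variance of each proxy
    delta_equal: List[float]             # weight option (1): 1/k each
    delta_inverse_variance: List[float]  # weight option (2), univariate case


def sample_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def moment_estimates(proxies: List[Sequence[float]]) -> VarianceComponents:
    """Moment estimates from k >= 2 proxies of one covariate, assuming
    independent classical additive errors: Cov(X*_j, X*_l) = Var(X) for j != l."""
    k = len(proxies)
    if k < 2:
        raise ValueError("at least two proxies are required")
    mu_x = fmean(fmean(p) for p in proxies)
    var_x = fmean(sample_cov(a, b) for a, b in combinations(proxies, 2))
    var_u = [sample_cov(p, p) - var_x for p in proxies]
    if any(v <= 0.0 for v in var_u):
        raise ValueError("non-positive error variance estimate; assumptions may fail")
    precision = [1.0 / v for v in var_u]
    total = sum(precision)
    return VarianceComponents(
        mu_x=mu_x,
        var_x=var_x,
        var_u=var_u,
        delta_equal=[1.0 / k] * k,
        delta_inverse_variance=[w / total for w in precision],
    )


if __name__ == "__main__":
    rng = random.Random(7)
    x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    proxy1 = [xi + rng.gauss(0.0, 0.5) for xi in x]  # error variance 0.25
    proxy2 = [xi + rng.gauss(0.0, 1.0) for xi in x]  # error variance 1.0
    print(moment_estimates([proxy1, proxy2]))
```

The inverse-variance weights place more weight on the more precise proxy, which the equal weights of option (1) ignore.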

Appendix B Proofs of Result 1

Proof of Result 1.

Consider an arbitrary logistic regression, explaining $A$ with respect to some covariate $T$. We assume, without loss of generality, that $T$ is univariate to suppress vector notation, though the same argument holds for vector-valued covariates. Assume that the model is fit with a sample $(A_{1},t_{1}),\ldots,(A_{n},t_{n})$. Computing a maximum likelihood estimate for $P(A=1|T)$ results in setting

$$\sum_{i=1}^{n}\left\{A_{i}-\widehat{P}(A=1|t_{i})\right\}$$

and

$$\sum_{i=1}^{n}\left\{A_{i}-\widehat{P}(A=1|t_{i})\right\}t_{i}$$

equal to $0$. Solving these simultaneously results in

[TABLE]

and

[TABLE]

The above expressions simplify to:

[TABLE]

This of course still holds for $T=\widehat{X}$, since the exact form of $T$ did not factor into the expression. Then, if the weights are defined to be $|A-\widehat{P}(A=1|\widehat{X})|$, we can see that the ratio of the above two expressions gives us the required form for sample covariate balance. ∎
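Result 1 can also be checked numerically. The Python sketch below fits a logistic regression of $A$ on a scalar covariate $T$ by Newton-Raphson, forms the weights $|A-\widehat{P}(A=1|T)|$, and compares the weighted mean of $T$ among treated and untreated individuals. The simulated data and the function names are hypothetical. Because the argument does not depend on how $T$ was obtained, $T$ could equally be a regression-calibrated $\widehat{X}$.

```python
import math
import random
from typing import Sequence, Tuple


def expit(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(a: Sequence[int], t: Sequence[float], iters: int = 50) -> Tuple[float, float]:
    """Maximum likelihood fit of logit P(A=1 | T=t) = b0 + b1 * t by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ai, ti in zip(a, t):
            p = expit(b0 + b1 * ti)
            resid, v = ai - p, p * (1.0 - p)
            g0 += resid        # score for the intercept
            g1 += resid * ti   # score for the slope
            h00 += v
            h01 += v * ti
            h11 += v * ti * ti
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # inverse information times score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


def weighted_means(a: Sequence[int], t: Sequence[float], b0: float, b1: float) -> Tuple[float, float]:
    """Means of T among treated and untreated, weighted by |A - P_hat(A=1 | T)|."""
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for ai, ti in zip(a, t):
        w = abs(ai - expit(b0 + b1 * ti))
        num[ai] += w * ti
        den[ai] += w
    return num[1] / den[1], num[0] / den[0]


if __name__ == "__main__":
    rng = random.Random(42)
    t = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    a = [1 if rng.random() < expit(0.3 + 0.8 * ti) else 0 for ti in t]
    b0, b1 = fit_logistic(a, t)
    # The two weighted means should agree up to floating-point error.
    print(weighted_means(a, t, b0, b1))
```

The balance follows directly from the two score equations: at the maximum likelihood estimate, $\sum_{i}A_{i}(1-\hat{p}_{i})=\sum_{i}(1-A_{i})\hat{p}_{i}$ and the same identity holds with each term multiplied by $t_{i}$, so the weighted numerators and denominators coincide across treatment groups.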

Appendix C Multistage Simulation Results

To investigate the procedure in the multistage scenario, we consider a variety of related settings formed by varying different aspects of the model. We take $X_{1}\sim N(0,1)$, with $X^{*}_{11}\sim g_{1}(X_{1})$ and $X^{*}_{12}\sim g_{2}(X_{1})$, for error models $g_{1},g_{2}$. We assume that $P(A_{1}=1|X^{*}_{1}=x^{*}_{1})=h_{1}(x^{*}_{1};\alpha_{10},\alpha_{11})$, with treatment model $h_{1}$ and parameters $\alpha_{10},\alpha_{11}$. We also take $X_{2}\sim N(A_{1},1)$, with $X^{*}_{21}\sim g_{1}(X_{2})$ and $X^{*}_{22}\sim g_{2}(X_{2})$, and $P(A_{2}=1|X^{*}_{2}=x^{*}_{2})=h_{2}(x^{*}_{2};\alpha_{20},\alpha_{21})$. The outcome is given by $Y=f(X_{1})+(A_{1}^{\text{opt}}-A_{1})\left(1+\psi_{11}X_{1}\right)+(A_{2}^{\text{opt}}-A_{2})\left(1+\psi_{21}X_{2}\right)+\epsilon$, with $\epsilon\sim N(0,1)$, where $f(X_{1})$ is the treatment-free model. We consider five scenarios by altering the above parameters:

1. Considers 10 combinations of $(\alpha_{10},\alpha_{20})$, with values taken from $\{-2,-1,0,1,2\}$, holding the treatment-free model and both treatment models linear, the error models classical additive with $N(0,0.25)$ distribution, and $\psi_{11}=\psi_{21}=1$.

2. Considers 10 combinations of $(\psi_{11},\psi_{21})$, with values taken from $\{-1,-0.1,0,0.1,1\}$, holding $\alpha_{10}=\alpha_{20}=0$, the treatment-free model and both treatment models linear, and the error models classical additive with $N(0,0.25)$ distribution.

3. Considers 5 scenarios for the form of the treatment-free model, taking $f(X_{1})=X_{1}$ (linear), $f(X_{1})=X_{1}+X_{1}^{2}$ (quadratic), $f(X_{1})=X_{1}+X_{1}^{2}-X_{1}^{3}$ (cubic), $f(X_{1})=\exp(X_{1})-X_{1}^{3}$ (exponential), or $f(X_{1})=\exp(X_{1})I(X_{1}\geq-0.5)$ (complex). We hold both treatment models linear, $\alpha_{10}=\alpha_{20}=0$, $\psi_{11}=\psi_{21}=1$, and the error models classical additive with $N(0,0.25)$ distribution.

4. Considers 10 scenarios in which each treatment model is taken to be one of $h_{j}(x^{*}_{j})=\alpha_{j0}+\alpha_{j1}x^{*}_{j}$ (linear), $h_{j}(x^{*}_{j})=\alpha_{j0}+\alpha_{j1}x^{*}_{j}+(x^{*}_{j})^{2}$ (quadratic), $h_{j}(x^{*}_{j})=\alpha_{j0}+\alpha_{j1}x^{*}_{j}+\exp(x^{*}_{j})$ (exponential), or $h_{j}(x^{*}_{j})=\alpha_{j0}+\alpha_{j1}x^{*}_{j}+(x^{*}_{j})^{2}+\exp(x^{*}_{j})$ (mixed). We hold the treatment-free model linear, $\alpha_{10}=\alpha_{20}=0$, $\psi_{11}=\psi_{21}=1$, and the error models classical additive with $N(0,0.25)$ distribution.

5. Considers 10 scenarios for various error models, taking $g_{j}(X_{l})=X_{l}+N(0,0.25)$ (normal), $g_{j}(X_{l})=X_{l}+t_{10}$ (approximately normal), $g_{j}(X_{l})=X_{l}\cdot\text{Gamma}(1,1)$ (gamma), or $g_{j}(X_{l})=X_{l}\cdot\text{Unif}(0.5,1.5)$ (uniform). We hold the treatment-free model and both treatment models linear, $\alpha_{10}=\alpha_{20}=0$, and $\psi_{11}=\psi_{21}=1$.

All analyses are conducted with $(X^{*}_{1},X^{*}_{2})$ taken to be $(X^{*}_{11},X^{*}_{21})$, $(\overline{X^{*}_{1}},\overline{X^{*}_{2}})$, $(X^{*}_{11},\overline{X^{*}_{2}})$, or $(\overline{X^{*}_{1}},X^{*}_{21})$; that is, treatment at each stage depends either on the first naive proxy or on the mean of the two proxies. We take $n=10000$ and repeat each scenario $1000$ times. The results for a corrected analysis and a naive analysis are included in Tables 3-7.

Bibliography

[1] Margaret A. Hamburg and Francis S. Collins. The path to personalized medicine. New England Journal of Medicine, 363(4):301–304, 2010. PMID: 20551152.
[2] Christos Katsios and Dimitrios H. Roukos. Individual genomes and personalized medicine: life diversity and complexity. Personalized Medicine, 7(4):347–350, 2010. PMID: 29788639.
[3] Keiran S.M. Smalley and Vernon K. Sondak. Melanoma - an unlikely poster child for personalized cancer therapy. New England Journal of Medicine, 363(9):876–878, 2010. PMID: 20818849.
[4] Bibhas Chakraborty and Erica E.M. Moodie. Statistical Methods for Dynamic Treatment Regimes. Springer New York, 2013.
[5] Michael P. Wallace and Erica E.M. Moodie. Personalizing medicine: a review of adaptive treatment strategies. Pharmacoepidemiology and Drug Safety, 23(6):580–585, Apr. 2014.
[6] Grace Y. Yi. Statistical Analysis with Measurement Error or Misclassification. Springer Series in Statistics. Springer New York, New York, NY, 2017.
[7] Shawn Bauldry, Kenneth A. Bollen, and Linda S. Adair. Evaluating measurement error in readings of blood pressure for adolescents and young adults. Blood Pressure, 24(2):96–102, Dec. 2014.
[8] Bernard Rosner and Rebecca Gore. Measurement Error Correction in Nutritional Epidemiology based on Individual Foods, with Application to the Relation of Diet to Breast Cancer. American Journal of Epidemiology, 154(9):827–835, Nov. 2001.
