Estimating Individualized Treatment Regimes from Crossover Designs

Crystal T. Nguyen (1), Daniel J. Luckett (1), Anna R. Kahkoska (2), Grace E. Shearrer (2), Donna Spruijt-Metz (3), Jaimie N. Davis (4), and Michael R. Kosorok (1) ((1) Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, U.S.A.; (2) Department of Nutrition, University of North Carolina, Chapel Hill, U.S.A.; (3) Center of Economic and Social Research, University of Southern California, Los Angeles, California, U.S.A.; (4) Department of Nutrition, University of Texas, Austin, Texas, U.S.A.)

arXiv:1902.05499·stat.AP·February 15, 2019



TL;DR

This paper introduces a novel method for estimating optimal individualized treatment regimes using crossover study data, leveraging the design's ability to observe responses to multiple treatments per patient, and demonstrates its advantages over traditional parallel study methods.

Contribution

The paper develops a new approach for ITR estimation from crossover trial data, incorporating response differences to improve accuracy and consistency in treatment decision predictions.

Findings

1. The method is Fisher and globally consistent.

2. Numerical experiments show improved performance over standard methods.

3. Application to a feeding trial illustrates practical benefits.

Abstract

The field of precision medicine aims to tailor treatment based on patient-specific factors in a reproducible way. To this end, estimating an optimal individualized treatment regime (ITR) that recommends treatment decisions based on patient characteristics to maximize the mean of a pre-specified outcome is of particular interest. Several methods have been proposed for estimating an optimal ITR from clinical trial data in the parallel group setting where each subject is randomized to a single intervention. However, little work has been done in the area of estimating the optimal ITR from crossover study designs. Such designs naturally lend themselves to precision medicine, because they allow for observing the response to multiple treatments for each patient. In this paper, we introduce a method for estimating the optimal ITR using data from a 2x2 crossover study with or without carryover…

Tables

Table 1: The treatment-covariate interaction and carryover effects for the four simulation scenarios.

Scenario	$c(\boldsymbol{X})$	$\delta_{-1}(\boldsymbol{X})$	$\delta_{1}(\boldsymbol{X})$
1	$1.12(0.3-X_{1}-X_{2})$	$0$	$0$
2	$1.15(X_{1}-1.25X_{2}^{2})$	$0$	$0$
3	$1.12(0.3-X_{1}-X_{2})$	$\left|\{\mu(\boldsymbol{X})+c(\boldsymbol{X})\}/4\right|$	$\left|\{\mu(\boldsymbol{X})-c(\boldsymbol{X})\}/2\right|$
4	$1.15(X_{1}-1.25X_{2}^{2})$	$0.4X_{1}^{2}+0.3X_{2}$	$1-2X_{1}-X_{2}^{2}$

Table 2: Mean (sd) of 5-fold cross-validated estimated values for the feeding trial data, compared with the observed value from period 1.

Method	Fullness, mean (sd)	Hunger, mean (sd)
Ridge	3.00 (4.53)	5.60 (8.15)
OWL	3.07 (3.88)	5.45 (7.42)
GOWL	3.85 (4.97)	8.29 (7.93)
Crossover GOWL	6.39 (3.57)	10.50 (8.36)
Observed	0.96	4.66

Equations

Y=\mu(\boldsymbol{X})+A\,c(\boldsymbol{X})+\epsilon,

\mathcal{V}(D)=E\left[\frac{Y\,1\{A=D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right],

D_{0}=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{Y\,1\{A\neq D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right].

Y_{k}=\mu(\boldsymbol{X})+A_{k}c(\boldsymbol{X})+\delta_{A_{1}}(\boldsymbol{X})\,1\{k=2\}+\epsilon_{k}

E\left[\frac{R}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}=D(\boldsymbol{X})\}\right],

D_{0}=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{R}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}\neq D(\boldsymbol{X})\}\right].

\operatorname*{arg\,min}_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^{n}\frac{|R_{i}|}{P(A_{i,1}|\boldsymbol{X}_{i})}\,\psi\{R_{i},A_{i,1}f(\boldsymbol{X}_{i})\}+\lambda_{n}\|f\|^{2},

\widehat{f}^{*}_{n}=\operatorname*{arg\,min}_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^{n}\frac{|\widehat{R}_{i}|}{P(A_{i,1}|\boldsymbol{X}_{i})}\,\psi\{\widehat{R}_{i},A_{i,1}f(\boldsymbol{X}_{i})\}+\lambda_{n}\|f\|^{2},

D^{*}(\boldsymbol{X})=\mathrm{sign}\{f^{*}(\boldsymbol{X})\},\qquad f^{*}=\operatorname*{arg\,min}_{f\in\mathcal{F}}E\left[\frac{|R|}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}\right].

\mathcal{R}_{\psi}(f)=E\left[\frac{|R|}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}\right].

\lim_{n\to\infty}\mathcal{R}(\widehat{f}^{*}_{n})\to_{P}\mathcal{R}(f_{0}).

\widehat{\mathcal{V}}(D)=\frac{\mathbb{P}_{n_{\mathrm{test}}}\left[Y\,1\{A=D(\boldsymbol{X})\}/P(A_{1}|\boldsymbol{X})\right]}{\mathbb{P}_{n_{\mathrm{test}}}\left[1\{A=D(\boldsymbol{X})\}/P(A_{1}|\boldsymbol{X})\right]}.

\mathbb{P}_{n_{\mathrm{test}}}\left[\widehat{\delta}_{A_{i,1}}(\boldsymbol{X}_{i})-\delta_{A_{i,1}}(\boldsymbol{X}_{i})\right]^{2}.

Y_{k}=\mu(\boldsymbol{X})+A_{k}c(\boldsymbol{X})+\delta_{A_{1}}(\boldsymbol{X})\,1\{k=2\}+\epsilon_{k},

\begin{aligned}
E[Y^{*}\{D_{0}(\boldsymbol{X})\}-Y^{*}\{-D_{0}(\boldsymbol{X})\}]
&=2E[|c(\boldsymbol{X})|]\\
&\geq E[Y^{*}\{D(\boldsymbol{X})\}-Y^{*}\{-D(\boldsymbol{X})\}],
\end{aligned}

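\begin{aligned}
D_{0}
&=\operatorname*{arg\,max}_{D\in\mathcal{D}}E\left[\frac{1\{A_{1}=D(\boldsymbol{X})\}}{P(A_{1}|\boldsymbol{X})}\{Y_{1}-Y_{2}+\delta_{A_{1}}(\boldsymbol{X})\}+\frac{1\{A_{1}\neq D(\boldsymbol{X})\}}{P(A_{1}|\boldsymbol{X})}\{Y_{2}-\delta_{A_{1}}(\boldsymbol{X})-Y_{1}\}\right]\\
&=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{Y_{1}-Y_{2}+\delta_{A_{1}}(\boldsymbol{X})}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}\neq D(\boldsymbol{X})\}\right]\\
&=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{R}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}\neq D(\boldsymbol{X})\}\right],
\end{aligned}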

\mathcal{R}_{\psi}(f,\boldsymbol{x})=E\left[\frac{|{R}|}{P(A_{1}|\boldsymbol{X})}\psi\{{R},A_{1}f(\boldsymbol{X})\}\Big{|}\boldsymbol{X}=\boldsymbol{x}\right],


\begin{aligned}
\mathcal{R}_{\psi}(f,\boldsymbol{x})
&=E\left[R^{+}\max\{1-f(\boldsymbol{X}),0\}-R^{-}\max\{1+f(\boldsymbol{X}),0\}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]\\
&\quad+E\left[R^{+}\max\{1+f(\boldsymbol{X}),0\}-R^{-}\max\{1-f(\boldsymbol{X}),0\}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right],
\end{aligned}
\qquad\text{where } R^{+}=\max(R,0),\ R^{-}=\min(R,0).

\begin{aligned}
\mathcal{R}_{\psi}(f,\boldsymbol{x})
&=[1-f(\boldsymbol{x})]\left\{E\left[R^{+}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]-E\left[R^{-}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]\right\}\\
&\quad+[1+f(\boldsymbol{x})]\left\{E\left[R^{+}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]-E\left[R^{-}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]\right\}\\
&=E\left[|R|\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]+E\left[|R|\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]\\
&\quad+f(\boldsymbol{x})\left\{-E\left[R^{+}+R^{-}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]+E\left[R^{+}+R^{-}\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]\right\}\\
&=E\left[|R|\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]+E\left[|R|\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]\\
&\quad+f(\boldsymbol{x})\left\{E\left[R\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]-E\left[R\,\Big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]\right\},
\qquad |f(\boldsymbol{x})|\le 1.
\end{aligned}

L_{\psi}(f)=\frac{|R|}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}

L_{\psi}(f)=\frac{R}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}.

\mathbb{P}_{n}L_{\psi}(f_{n}^{*})+\lambda_{n}\|f_{n}^{*}\|^{2}\leq\mathbb{P}_{n}L_{\psi}(f)+\lambda_{n}\|f\|^{2},

\mathbb{P}_{n}L_{\psi}(f_{n}^{*})+\lambda_{n}\|f_{n}^{*}\|^{2}

=\mathbb{P}_{n}\left\{\frac{|R_{i}|}{P(A_{i,1}|\boldsymbol{X}_{i})}\right\}


Full text

Estimating Individualized Treatment Regimes from Crossover Designs

Crystal T. Nguyen1, Daniel J. Luckett1, Anna R. Kahkoska2,

Grace E. Shearrer2, Donna Spruijt-Metz3, Jaimie N. Davis4,

and Michael R. Kosorok1

1Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, U.S.A.

2Department of Nutrition, University of North Carolina, Chapel Hill, North Carolina, U.S.A.

3Center of Economic and Social Research, University of Southern California, Los Angeles, California, U.S.A.

4Department of Nutrition, University of Texas at Austin, Austin, Texas, U.S.A.

Abstract

The field of precision medicine aims to tailor treatment based on patient-specific factors in a reproducible way. To this end, estimating an optimal individualized treatment regime (ITR) that recommends treatment decisions based on patient characteristics to maximize the mean of a pre-specified outcome is of particular interest. Several methods have been proposed for estimating an optimal ITR from clinical trial data in the parallel group setting where each subject is randomized to a single intervention. However, little work has been done in the area of estimating the optimal ITR from crossover study designs. Such designs naturally lend themselves to precision medicine, because they allow for observing the response to multiple treatments for each patient. In this paper, we introduce a method for estimating the optimal ITR using data from a $2\times 2$ crossover study with or without carryover effects. The proposed method is similar to policy search methods such as outcome weighted learning (OWL); however, we take advantage of the crossover design by using the difference in responses under each treatment as the observed reward. We establish Fisher and global consistency, present numerical experiments, and analyze data from a feeding trial to demonstrate the improved performance of the proposed method compared to standard methods for a parallel study design.

Keywords: Crossover design; Individualized treatment regime; Machine learning; Outcome weighted learning; Personalized medicine; Precision medicine

Introduction

Personalized medicine is the practice of tailoring treatment to account for patient heterogeneity (Chakraborty and Murphy, 2014). Physicians and other health care providers have practiced personalized medicine for centuries by adjusting doses or prescriptions based on a patient’s medical history or demographics (Ashley, 2015; Zhao and Zeng, 2013). Precision medicine is an emerging field that aims to support personalized medicine decisions with reproducible research (Collins and Varmus, 2015). Such research is imperative, particularly when diseases are expressed with great heterogeneity across patients. A topic of interest in precision medicine is the individualized treatment regime (ITR): a set of decision rules for one or more decision time points that can be used to assign patients to treatments tailored to their patient-specific factors (Lavori and Dawson, 2014; Moodie et al., 2007; Petersen et al., 2007). One objective in precision medicine is to estimate the optimal ITR, that is, the ITR that maximizes the mean of some desirable outcome (Kosorok and Moodie, 2015; Laber et al., 2014). Crossover clinical trials are uniquely suited to precision medicine because they allow for observing responses to multiple treatments for each patient. This paper introduces a method to estimate optimal ITRs using data from a crossover study by extending generalized outcome weighted learning (GOWL) (Chen et al., 2018) to deal with correlated outcomes.

In a crossover study, patients are randomized to a sequence of treatments rather than a single treatment. Thus, multiple outcomes are observed for each subject, one per treatment period, and each subject acts as his or her own control, which reduces between-subject variability (Machin and Fayers, 2010; Turner, 2010; Wellek and Blettner, 2012). Therefore, crossover designs naturally lend themselves to precision medicine: when each subject receives every treatment, estimating the optimal ITR can use all counterfactual outcomes. In contrast, estimating the optimal ITR from a traditional parallel group design, where each patient is assigned to a single treatment, can use only the subset of counterfactual outcomes that is observed.

There have been many developments in machine learning methods for answering precision medicine questions from parallel study designs. For example, Qian and Murphy (2011) indirectly estimate the decision rule using L1 penalized least squares; Zhang et al. (2012a) maximize a doubly robust augmented inverse probability weighted estimator for the population mean outcome; Athey and Wager (2017) maximize a doubly robust score that may take into account instrumental variables; Kallus (2018) employs a weighting algorithm similar to inverse probability weighting but minimizes the worst-case mean squared error; Laber and Zhao (2015) propose the use of decision trees, which prove to be both flexible and easily interpretable; Zhao et al. (2012), Zhang et al. (2012b), Zhou et al. (2017), and Chen et al. (2018) directly estimate the decision rule by viewing the problem from a weighted classification standpoint.

However, little work has been done to develop precision medicine methods that handle correlated observations in the single-stage decision setting such as those that arise from crossover designs. Kulasekera and Siriwardhana (2018) propose a weighted ranking algorithm to estimate a decision rule that maximizes either the expected outcome or the probability of selecting the best treatment, but they assume that there are no carryover effects present. Because the intended effect of the washout period can be difficult to achieve in practice (Wellek and Blettner, 2012), it is imperative that methods for crossover designs can be applied when carryover effects are present. In this paper, we show that the difference in response to two treatments from a $2\times 2$ crossover trial can be used as the reward in the generalized outcome weighted learning (GOWL) objective function to estimate an optimal ITR. We introduce a plug-in estimator that can be used with the proposed method to account for carryover effects. Additionally, we show that using a crossover design with the proposed method results in improvements in misclassification rate and estimated value when compared to standard methods for a parallel design at the same sample size.

As a clinical example, consider nutritional recommendations surrounding the intake of dietary fiber for the purpose of weight loss. Although increased fiber is recommended across the population for a myriad of health benefits (Anderson et al., 1994; Anderson et al., 2009; Marlett et al., 2002; US Department of Agriculture, 2010), evidence of the impact of the consumption of dietary fiber for improved satiety and reduction in body weight is mixed (Halliday et al., 2018; Slavin, 2005). Heterogeneity in response to dietary fiber may be leveraged to develop targeted fiber interventions to promote feelings of satiety. We use data from a crossover study in which Hispanic and African American adolescents who are overweight or obese were fed breakfast and lunch under a typical Western high sugar diet and a high fiber diet. From these data, we estimate a decision rule with which clinical care providers can input patient characteristics, including demographics and clinical measures, and receive a recommendation to maximize the change in measures of perceived satiety from before breakfast to after lunch. This type of analysis could be useful in identifying a subgroup of at-risk adolescents for whom targeting specific dietary recommendations is expected to lead to an increase in patient-reported satiety, helping to decrease caloric intake in a population with great clinical need for effective weight loss strategies.

The rest of this paper is organized as follows. In Section 2, we review outcome weighted learning (OWL) (Zhao et al., 2012) and present the proposed method for estimating an optimal ITR from a crossover study regardless of the presence of carryover effects. Section 3 establishes Fisher and global consistency. Section 4 demonstrates the performance of the proposed method in simulation studies, with results on misclassification rate and estimated value. Section 5 reports on the analysis of data from a feeding trial with overweight and obese Latino and African American adolescents, and we conclude with a discussion in Section 6.

Methodology

In this section, we provide a brief overview of existing methods for estimating the optimal ITR using weighted classification. We then provide the justification and means to implement our proposed method, which we refer to hereafter as “crossover GOWL.”

Existing Methods

Consider a parallel, two-arm clinical trial in which we have i.i.d. observations $(\boldsymbol{X}_{i},A_{i},Y_{i})$ for $i=1,\ldots,n$ , where $A\in\mathcal{A}=\{-1,1\}$ is binary treatment assignment, $\boldsymbol{X}\in\mathcal{X}$ is a $p$ -dimensional vector of covariates, and $Y\in\mathbb{R}$ is a reward, bounded by $M_{0}<\infty$ , for which greater values are desired. Assume that $Y$ is of the form

$$Y=\mu(\boldsymbol{X})+A\,c(\boldsymbol{X})+\epsilon,$$

where $\mu(\boldsymbol{X})$ is the main effect of the covariates, $c(\boldsymbol{X})$ is the treatment-covariate interaction, and $\epsilon$ has mean 0 and variance $\sigma^{2}_{\epsilon}$. Denote $Y^{*}(a)$ as the counterfactual outcome under treatment $a$. We then make three causal assumptions (Rubin, 1978) to connect the counterfactual outcomes to the observed data: $P(A=a|\boldsymbol{X})>0$ for each $a\in\mathcal{A}$ with probability 1, $\{Y^{*}(1),Y^{*}(-1)\}\perp A|\boldsymbol{X}$, and $Y=Y^{*}(A)$. These are known as positivity, conditional exchangeability, and consistency, respectively.

An ITR, $D$ , comes from the set of all functions $\mathcal{D}$ that map the covariate space $\mathcal{X}$ to the treatment space $\mathcal{A}$ . Our objective is to estimate the optimal ITR, denoted $D_{0}$ , which maximizes the value function (Qian and Murphy, 2011),

$$\mathcal{V}(D)=E\left[\frac{Y\,1\{A=D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right], \tag{1}$$

where $P(A|\boldsymbol{X})$ denotes the propensity score $\mathrm{Pr}(A=a|\boldsymbol{X}=\boldsymbol{x})$ evaluated at the observed treatment $a=A$ and covariates $\boldsymbol{x}=\boldsymbol{X}$. Equivalently, $D_{0}$ may be defined as

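$$D_{0}=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{Y\,1\{A\neq D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right]. \tag{2}$$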
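The two definitions agree because $1\{A=D(\boldsymbol{X})\}+1\{A\neq D(\boldsymbol{X})\}=1$: the criterion in (1) and the criterion in (2) always sum to $E[Y/P(A|\boldsymbol{X})]$, which does not depend on $D$, so maximizing one is the same as minimizing the other. The following is a minimal NumPy check of that identity on simulated data; the simulation choices (1:1 randomization, four covariates, and the Scenario 1 interaction from Table 1) are ours for illustration only.

```python
import numpy as np

def ipw_value(Y, A, D, prop):
    """Empirical version of (1): mean of Y * 1{A = D(X)} / P(A | X)."""
    return np.mean(Y * (A == D) / prop)

def ipw_misclass(Y, A, D, prop):
    """Empirical version of the criterion in (2): mean of Y * 1{A != D(X)} / P(A | X)."""
    return np.mean(Y * (A != D) / prop)

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-1, 1, size=(n, 4))                  # 4 covariates for brevity
mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])                 # Scenario 1 interaction, Table 1
A = rng.choice([-1, 1], size=n)                      # 1:1 randomization
Y = mu + A * c + rng.normal(size=n)
prop = np.full(n, 0.5)                               # known propensity P(A | X)

rules = {
    "sign(c)": np.where(c > 0, 1, -1),               # optimal rule under this model
    "treat all": np.ones(n, dtype=int),
    "treat none": -np.ones(n, dtype=int),
}
results = {}
for name, D in rules.items():
    v, m = ipw_value(Y, A, D, prop), ipw_misclass(Y, A, D, prop)
    results[name] = (v, m)
    # v + m equals mean(Y / prop) for every rule, so argmax of v == argmin of m
    print(f"{name:10s} value={v:.3f} weighted misclass={m:.3f} sum={v + m:.3f}")
```

The "sum" column is identical across rules, which is exactly why (1) and (2) select the same $D_{0}$.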

Zhao et al. (2012) propose OWL to solve this problem: each misclassified observation is weighted by its observed outcome, $Y$ , and the hinge loss is used to bring the problem into the support vector machine framework (Cortes and Vapnik, 1995). Unfortunately, OWL assumes $Y$ is nonnegative; when negative values are observed, OWL shifts all outcomes to be nonnegative, since (2) is invariant to such a transformation. The objective function in OWL, however, does not have this property. Therefore, the estimated decision function in OWL depends on the chosen shift in the outcomes. Chen et al. (2018) propose GOWL, an extension of OWL, which handles negative rewards by modifying the hinge loss to be piecewise and weighting the misclassified observations by $|Y|$ . With GOWL, there is no need to shift rewards.
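A short derivation (our own, using only positivity) shows why (2) is unaffected by a constant shift $Y\mapsto Y+s$: conditional on $\boldsymbol{X}$, exactly one of the two treatments disagrees with $D(\boldsymbol{X})$, so the inverse-probability-weighted indicator has mean one.

$$
E\left[\frac{(Y+s)\,1\{A\neq D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right]
=E\left[\frac{Y\,1\{A\neq D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right]
+s\,E\left[\sum_{a\in\mathcal{A}}P(A=a|\boldsymbol{X})\,\frac{1\{a\neq D(\boldsymbol{X})\}}{P(A=a|\boldsymbol{X})}\right]
=E\left[\frac{Y\,1\{A\neq D(\boldsymbol{X})\}}{P(A|\boldsymbol{X})}\right]+s.
$$

The added $s$ does not depend on $D$, so the minimizer in (2) is unchanged. When the indicator is replaced by the hinge loss, the corresponding term becomes $s\,E[\max\{1-f(\boldsymbol{X}),0\}+\max\{1+f(\boldsymbol{X}),0\}]$, which exceeds $2s$ whenever $|f(\boldsymbol{X})|>1$ and therefore depends on $f$; this is why the OWL solution can change with the chosen shift.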

However, neither Zhao et al. (2012) nor Chen et al. (2018) considered correlated outcomes, such as those that arise from a crossover design setting. We now introduce crossover GOWL, a method that combines the observed treatment response difference with GOWL to estimate the optimal ITR from $2\times 2$ crossover data.

Crossover Generalized Outcome Weighted Learning

In a crossover design, patients are randomly assigned to a sequence of treatments rather than a single treatment. For the $2\times 2$ design, patients are randomized to receive either the $(-1,1)$ or the $(1,-1)$ sequence, with some prespecified washout period between treatments. The washout period is a break between treatments which serves to remove any carryover effects, or residual effects remaining from a previous treatment at the start of the next treatment. Keeping most of the notation from before, we now introduce sequential treatments and outcomes $A_{k}$ and $Y_{k}$ for periods $k=1,2$ , respectively, i.e., $Y_{k}$ is the observed outcome after receiving treatment $A_{k}$ in period $k$ . Furthermore, we assume the model

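$$Y_{k}=\mu(\boldsymbol{X})+A_{k}c(\boldsymbol{X})+\delta_{A_{1}}(\boldsymbol{X})\,1\{k=2\}+\epsilon_{k},\qquad k=1,2,$$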

where $\boldsymbol{\epsilon}=(\epsilon_{1},\epsilon_{2})^{\top}$ has mean $\boldsymbol{0}$ and a positive definite covariance matrix, $\Sigma_{\epsilon}$ , and $\delta_{A_{1}}(\boldsymbol{X})$ is the carryover effect which may depend on $A_{1}$ and $\boldsymbol{X}$ . Note that in a $2\times 2$ crossover study, the period effects, or temporal effects, are nonseparable from the carryover effects (Fleiss, 1989), so $\delta_{A_{1}}(\boldsymbol{X})$ encompasses both period and carryover effects.

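Let $R=Y_{1}-[Y_{2}-\delta_{A_{1}}(\boldsymbol{X})]$ denote the treatment response difference, that is, the period-1 response minus the carryover-adjusted period-2 response. Given the observed data $(\boldsymbol{X},A_{1},\boldsymbol{Y})$, where $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}$, we propose the following as a substitute for the value function to be maximized: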

$$E\left[\frac{R}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}=D(\boldsymbol{X})\}\right], \tag{3}$$

where $P(A_{1}|\boldsymbol{X})$ is the probability of being assigned to the sequence $(A_{1},-A_{1})$ conditional on $\boldsymbol{X}$. By Lemma 2.1, maximizing (3) is equivalent to maximizing (1); the proof is deferred to Appendix C.
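To see why the treatment response difference is a natural reward, substitute the crossover model into $R$. A minimal derivation, assuming $A_{2}=-A_{1}$ and $E[\epsilon_{1}-\epsilon_{2}|\boldsymbol{X},A_{1}]=0$:

$$
\begin{aligned}
R&=\{\mu(\boldsymbol{X})+A_{1}c(\boldsymbol{X})+\epsilon_{1}\}-\{\mu(\boldsymbol{X})-A_{1}c(\boldsymbol{X})+\epsilon_{2}\}=2A_{1}c(\boldsymbol{X})+\epsilon_{1}-\epsilon_{2},\\
E[R|\boldsymbol{X},A_{1}]&=2A_{1}c(\boldsymbol{X}).
\end{aligned}
$$

The main effect $\mu(\boldsymbol{X})$ cancels, and $E[R|\boldsymbol{X},A_{1}]>0$ exactly when $A_{1}=\mathrm{sign}\{c(\boldsymbol{X})\}$, the treatment favored by the optimal ITR. With the error settings used in the simulation studies ($\mathrm{Var}[\epsilon_{k}]=1$, $\mathrm{Cov}[\epsilon_{1},\epsilon_{2}]=0.5$), the noise in $R$ has variance $1+1-2(0.5)=1$.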

Lemma 2.1.

Under the given assumptions,

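$$D_{0}=\operatorname*{arg\,min}_{D\in\mathcal{D}}E\left[\frac{R}{P(A_{1}|\boldsymbol{X})}\,1\{A_{1}\neq D(\boldsymbol{X})\}\right].$$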
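A quick Monte Carlo illustration of Lemma 2.1, as a sketch under our own assumptions: NumPy, 1:1 randomization to the two sequences, four covariates instead of fifty, the Scenario 3 interaction and carryover effects from Table 1, and the true $\delta_{A_{1}}(\boldsymbol{X})$ treated as known. Under the model, (3) equals $E[2D(\boldsymbol{X})c(\boldsymbol{X})]$, so its empirical version should be largest for $D=\mathrm{sign}(c)$ and close to $2E|c(\boldsymbol{X})|$ there.

```python
import numpy as np

def simulate_crossover(n, rng):
    """Scenario 3 of Table 1 with 4 covariates; also returns the true carryover."""
    X = rng.uniform(-1, 1, size=(n, 4))
    mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
    c = 1.12 * (0.3 - X[:, 0] - X[:, 1])
    A1 = rng.choice([-1, 1], size=n)                     # sequence (A1, -A1)
    delta = np.where(A1 == -1, np.abs((mu + c) / 4), np.abs((mu - c) / 2))
    eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
    Y1 = mu + A1 * c + eps[:, 0]
    Y2 = mu - A1 * c + delta + eps[:, 1]                 # period 2 receives -A1
    return A1, Y1, Y2, c, delta

rng = np.random.default_rng(1)
A1, Y1, Y2, c, delta = simulate_crossover(5000, rng)
R = Y1 - (Y2 - delta)                                    # reward with true carryover removed
prop = 0.5                                               # P(A1 | X) under 1:1 randomization

def criterion(D):
    """Empirical version of (3): mean of R / P(A1 | X) * 1{A1 = D(X)}."""
    return np.mean(R / prop * (A1 == D))

scores = {
    "sign(c)": criterion(np.where(c > 0, 1, -1)),
    "treat all": criterion(np.ones_like(A1)),
    "treat none": criterion(-np.ones_like(A1)),
}
print(scores)                      # sign(c) should score highest
print(2 * np.mean(np.abs(c)))      # population target for the sign(c) rule
```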

Following (3), we use an approach similar to GOWL but weight misclassified observations by the magnitude of the treatment response difference, $|R|$, and we minimize the objective function (4) over $f$ in $\mathcal{F}$, a class of functions such as a reproducing kernel Hilbert space. Let $\psi(u,v)=\max\{1-\mathrm{sign}(u)v,0\}$, let $\lambda_{n}$ be a tuning parameter, and let $\|f\|$ be the $L_{2}$ norm of $f$. For details on solving the minimization problem in (4), we defer to Chen et al. (2018) and Kimeldorf and Wahba (1970).

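$$\operatorname*{arg\,min}_{f\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\frac{|R_{i}|}{P(A_{i,1}|\boldsymbol{X}_{i})}\,\psi\{R_{i},A_{i,1}f(\boldsymbol{X}_{i})\}+\lambda_{n}\|f\|^{2}. \tag{4}$$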
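Because $\psi(R,A_{1}f)=\max\{1-\mathrm{sign}(R)A_{1}f,0\}$, objective (4) is a weighted hinge loss with label $\mathrm{sign}(R)A_{1}$ and weight $|R|/P(A_{1}|\boldsymbol{X})$. The sketch below makes that concrete under simplifying assumptions that are ours, not the paper's: a linear $f(\boldsymbol{x})=b_{0}+\boldsymbol{x}^{\top}\beta$ in place of an RKHS, a penalty on $\beta$ only, and plain subgradient descent in place of the kernel solvers the paper defers to. Rewards are drawn directly as $R=2A_{1}c(\boldsymbol{X})+N(0,1)$, which matches Scenario 1 (no carryover) with the simulation's error covariance.

```python
import numpy as np

def psi(u, v):
    """psi(u, v) = max{1 - sign(u) v, 0} from (4)."""
    return np.maximum(1.0 - np.sign(u) * v, 0.0)

def objective(beta, b0, X, A1, R, prop, lam):
    """Empirical objective (4) for a linear f(x) = b0 + x'beta (penalty on beta only)."""
    f = b0 + X @ beta
    return np.mean(np.abs(R) / prop * psi(R, A1 * f)) + lam * (beta @ beta)

def fit_linear(X, A1, R, prop, lam=0.01, n_iter=3000, step0=0.5):
    """Minimize the objective by subgradient descent with step size step0 / sqrt(t)."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    w = np.abs(R) / prop            # weights |R| / P(A1 | X)
    y = np.sign(R) * A1             # psi(R, A1 f) is the hinge loss of y * f
    for t in range(1, n_iter + 1):
        active = y * (b0 + X @ beta) < 1          # points with nonzero hinge loss
        g_beta = -((w * y * active) @ X) / n + 2 * lam * beta
        g_b0 = -np.sum(w * y * active) / n
        step = step0 / np.sqrt(t)
        beta = beta - step * g_beta
        b0 = b0 - step * g_b0
    return beta, b0

rng = np.random.default_rng(2)
n, prop = 400, 0.5
X = rng.uniform(-1, 1, size=(n, 4))
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])          # Scenario 1 interaction
A1 = rng.choice([-1, 1], size=n)
R = 2 * A1 * c + rng.normal(size=n)           # R = 2 A1 c(X) + (eps1 - eps2), variance 1
beta, b0 = fit_linear(X, A1, R, prop)

X_test = rng.uniform(-1, 1, size=(10000, 4))
c_test = 1.12 * (0.3 - X_test[:, 0] - X_test[:, 1])
D_hat = np.where(b0 + X_test @ beta > 0, 1, -1)
misclass = np.mean(D_hat != np.where(c_test > 0, 1, -1))
print(f"test misclassification vs sign(c): {misclass:.3f}")
```

Subgradient descent is chosen here only because it fits in a few lines; for kernel classes, a dual quadratic-programming formulation is the usual route.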

In practice, the true value of $\delta_{A_{1}}(\boldsymbol{X})$ is unknown. In traditional analyses, one tests the null hypothesis that $\delta_{-1}(\boldsymbol{X})=\delta_{1}(\boldsymbol{X})$. Here, we are instead interested in whether either treatment has a nonzero carryover effect. Investigators may determine whether carryover effects are present in any number of ways, including two-sample $t$-tests of the null hypotheses $H_{0,1}{:}\ E[\delta_{1}(\boldsymbol{X})]=0$ and $H_{0,-1}{:}\ E[\delta_{-1}(\boldsymbol{X})]=0$ that compare mean responses to each treatment at each time point. An estimator of $\delta_{A_{1}}(\boldsymbol{X})$, denoted $\widehat{\delta}_{A_{1}}(\boldsymbol{X})$, can be computed using Algorithm 1.

In short, one model is fit to predict what would have been observed in period 2 in the absence of carryover effects, and another model is fit to predict the residual from the first model. While any regression technique may be used here, we use reinforcement learning trees (RLT) in our implementation. RLT is a nonparametric tree-based machine learning method that considers future splits or branches in the model when determining the best split at any node (Zhu et al., 2015).
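The two-sample $t$-tests for carryover mentioned earlier in this section compare the same treatment across periods. Subjects on sequence $(-1,1)$ receive treatment 1 in period 2, and subjects on sequence $(1,-1)$ receive it in period 1, so under the model the difference in these means estimates $E[\delta_{-1}(\boldsymbol{X})]$ (and symmetrically for $\delta_{1}$). The sketch below uses the Welch unequal-variance form of the $t$ statistic and a hypothetical constant carryover of 1 after treatment $-1$ with none after treatment 1; both choices are ours, for illustration only.

```python
import numpy as np

def welch_t(a, b):
    """Welch two-sample t statistic for H0: E[a] = E[b]."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

rng = np.random.default_rng(3)
n = 2000
X = rng.uniform(-1, 1, size=(n, 4))
mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])
A1 = rng.choice([-1, 1], size=n)
delta = np.where(A1 == -1, 1.0, 0.0)            # hypothetical: carryover only after -1
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
Y1 = mu + A1 * c + eps[:, 0]
Y2 = mu - A1 * c + delta + eps[:, 1]

# Treatment 1 in period 2 (after -1) vs. treatment 1 in period 1: targets E[delta_{-1}(X)]
est_minus1 = Y2[A1 == -1].mean() - Y1[A1 == 1].mean()
t_minus1 = welch_t(Y2[A1 == -1], Y1[A1 == 1])
# Treatment -1 in period 2 (after 1) vs. treatment -1 in period 1: targets E[delta_{1}(X)]
t_plus1 = welch_t(Y2[A1 == 1], Y1[A1 == -1])
print(f"delta_-1: estimate {est_minus1:.2f}, t = {t_minus1:.1f}; delta_1: t = {t_plus1:.1f}")
```

Each comparison uses two disjoint groups of subjects, so the two-sample form is appropriate. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic.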

We can now correct the observed reward with the estimated carryover effects. Letting $\widehat{R}=Y_{1}-\left[Y_{2}-\widehat{\delta}_{A_{1}}(\boldsymbol{X})\right]$, the estimated decision function is

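$$\widehat{f}^{*}_{n}=\operatorname*{arg\,min}_{f\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\frac{|\widehat{R}_{i}|}{P(A_{i,1}|\boldsymbol{X}_{i})}\,\psi\{\widehat{R}_{i},A_{i,1}f(\boldsymbol{X}_{i})\}+\lambda_{n}\|f\|^{2},$$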

and our proposed estimator of the optimal ITR is $\widehat{D}^{*}(\boldsymbol{X})=\mathrm{sign}\left\{\widehat{f}^{*}_{n}(\boldsymbol{X})\right\}$ , where

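$$D^{*}(\boldsymbol{X})=\mathrm{sign}\{f^{*}(\boldsymbol{X})\},\qquad f^{*}=\operatorname*{arg\,min}_{f\in\mathcal{F}}E\left[\frac{|R|}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}\right].$$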
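Algorithm 1 itself is not reproduced in this section, so the following is only a hypothetical sketch of the two-model idea described above, with ordinary least squares standing in for RLT. Our reading (an assumption) is that the first model is trained on period-1 data, which carry no carryover, and evaluated at $(\boldsymbol{X},-A_{1})$ to predict the period-2 response without carryover; the second model regresses the period-2 residual on $\boldsymbol{X}$ separately within each $A_{1}$ group. Because period and carryover effects are not separable in a $2\times 2$ design, that residual absorbs both. The sketch then forms $\widehat{R}$ and reports how often $\mathrm{sign}(\widehat{R})$ disagrees with $\mathrm{sign}(R)$, the quantity controlled in Theorem 3.2.

```python
import numpy as np

def ols(F, y):
    """Least-squares coefficients; a linear stand-in for the paper's RLT fits."""
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return coef

def period_features(X, A):
    # intercept, covariates, treatment, treatment-by-covariate interactions
    return np.column_stack([np.ones(len(A)), X, A, A[:, None] * X])

def estimate_carryover(X, A1, Y1, Y2):
    """Hypothetical two-model estimate of delta_{A1}(X) for each subject."""
    # Model 1: fit Y1 on (X, A1), then predict period 2 under A2 = -A1 without carryover.
    coef1 = ols(period_features(X, A1), Y1)
    resid = Y2 - period_features(X, -A1) @ coef1
    # Model 2: regress the period-2 residual on X within each first-period treatment.
    F = np.column_stack([np.ones(len(A1)), X])
    delta_hat = np.empty(len(A1))
    for a in (-1, 1):
        idx = A1 == a
        delta_hat[idx] = F[idx] @ ols(F[idx], resid[idx])
    return delta_hat

rng = np.random.default_rng(4)
n = 2000
X = rng.uniform(-1, 1, size=(n, 4))
mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])                      # Scenario 3, Table 1
A1 = rng.choice([-1, 1], size=n)
delta = np.where(A1 == -1, np.abs((mu + c) / 4), np.abs((mu - c) / 2))
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
Y1 = mu + A1 * c + eps[:, 0]
Y2 = mu - A1 * c + delta + eps[:, 1]

delta_hat = estimate_carryover(X, A1, Y1, Y2)
R = Y1 - (Y2 - delta)             # oracle reward (true carryover)
R_hat = Y1 - (Y2 - delta_hat)     # plug-in reward used in the estimated decision function
mse = np.mean((delta_hat - delta) ** 2)
sign_disagree = np.mean(np.sign(R_hat) != np.sign(R))
print(f"carryover MSE {mse:.3f}; sign(R_hat) != sign(R) for {sign_disagree:.1%} of subjects")
```

A linear second model cannot capture the absolute values in Scenario 3 exactly; a flexible learner such as RLT is meant to close that gap.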

Theoretical Results

In this section, we establish both Fisher and global consistency. First, define the risk under 0-1 loss to be $\mathcal{R}(f)=E[Y\ 1\{A\neq\mathrm{sign}[f(\boldsymbol{X})]\}/P(A|\boldsymbol{X})].$ The risk under the modified loss function with the reward defined as the treatment response difference is then

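$$\mathcal{R}_{\psi}(f)=E\left[\frac{|R|}{P(A_{1}|\boldsymbol{X})}\,\psi\{R,A_{1}f(\boldsymbol{X})\}\right].$$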

Let $f^{*}(\boldsymbol{X})=\operatorname*{\arg\!\min}_{f\in\mathcal{F}}\mathcal{R}_{\psi}(f)$, so that the corresponding ITR under the modified loss for the treatment response difference is ${D}^{*}(\boldsymbol{X})=\mathrm{sign}\{f^{*}(\boldsymbol{X})\}$. Theorem 3.1 establishes Fisher consistency of $D^{*}(\boldsymbol{X})$.

Theorem 3.1.

Under the given assumptions, ${D}^{*}(\boldsymbol{X})={D}_{0}(\boldsymbol{X})$ .
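A sketch of why Theorem 3.1 holds (our summary under the crossover model with $E[\epsilon_{1}-\epsilon_{2}|\boldsymbol{X},A_{1}]=0$, not the Appendix C proof): substituting the model into $R$ gives $E[R|\boldsymbol{X},A_{1}]=2A_{1}c(\boldsymbol{X})$, and $D_{0}(\boldsymbol{x})=\mathrm{sign}\{c(\boldsymbol{x})\}$ because $E[Y^{*}(a)|\boldsymbol{X}]=\mu(\boldsymbol{X})+a\,c(\boldsymbol{X})$. Plugging the first identity into the conditional risk from Appendix C and minimizing pointwise over all functions $f$:

$$
\begin{aligned}
\mathcal{R}_{\psi}(f,\boldsymbol{x})
&=E\left[|R|\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]+E\left[|R|\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]\\
&\quad+f(\boldsymbol{x})\left\{E\left[R\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]-E\left[R\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]\right\}\\
&=E\left[|R|\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]+E\left[|R|\,\big|\,\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]-4\,c(\boldsymbol{x})\,f(\boldsymbol{x}),
\qquad |f(\boldsymbol{x})|\le 1.
\end{aligned}
$$

The first two terms do not involve $f$, so on $|f(\boldsymbol{x})|\le 1$ the conditional risk is minimized at $f(\boldsymbol{x})=\mathrm{sign}\{c(\boldsymbol{x})\}$. For $|f(\boldsymbol{x})|>1$, one of the two hinge terms in each conditional expectation vanishes and the other grows with $|f(\boldsymbol{x})|$, so the risk cannot decrease there. Hence the pointwise minimizer has the same sign as $D_{0}(\boldsymbol{x})$ wherever $c(\boldsymbol{x})\neq 0$.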

Suppose that $\mathcal{F}=\{k(\cdot,\boldsymbol{x}):\boldsymbol{x}\in\mathcal{X}\}$ for some kernel function $k$, and let $\overline{\mathcal{F}}$ be the closure of $\mathcal{F}$. Define $f_{0}$ to be the minimizer of $\mathcal{R}(f)$ over all functions $f$, and define $f_{0}^{*}$ to be the minimizer of $\mathcal{R}_{\psi}(f)$ over all functions $f$.

Theorem 3.2.

Let $\lambda_{n}>0$ be a sequence such that $\lambda_{n}\to 0$ and $\lambda_{n}n\to\infty$ with probability going to 1 as $n\to\infty$. Assume $\exists\ M_{1}<\infty$ such that $P\left(|\widehat{\delta}_{A_{1}}(\boldsymbol{X})|<M_{1}\right)\to 1$ as $n\to\infty$ and $|\delta_{A_{1}}(\boldsymbol{X})|<M_{1}$ almost surely. If $\mathbb{P}\left[1\{\mathrm{sign}[\widehat{R}]\neq\mathrm{sign}[R]\}\right]=o_{P}(\lambda_{n})$, then, for any distribution $P$ of $(\boldsymbol{X},A_{1},\boldsymbol{Y})$, $\lim_{n\to\infty}\mathcal{R}_{\psi}(\widehat{f}^{*}_{n})\to_{P}\mathcal{R}_{\psi}\left(f^{*}\right)$. Furthermore, if $f^{*}_{0}\in\overline{\mathcal{F}}$,

$$\lim_{n\to\infty}\mathcal{R}(\widehat{f}^{*}_{n})\to_{P}\mathcal{R}(f_{0}).$$

Proofs of Theorems 3.1 and 3.2 may be found in Appendix C.

Simulation Studies

To illustrate the benefits of using crossover GOWL, we present simulation studies with comparisons to standard methods used in parallel group clinical trials. Simulated data sets were generated as follows. The covariates, $\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{50}$ , are i.i.d. variables drawn from a $U(-1,1)$ distribution. Subjects were randomized to treatment $-1$ or $1$ for the parallel design or to sequence $(-1,1)$ or $(1,-1)$ for the crossover design with equal probability. The response for the parallel design, $Y,$ is normally distributed with a mean of $\mu(\boldsymbol{X})+c(\boldsymbol{X})A$ and a variance of 1. For the crossover design, responses were simulated per the model $Y_{k}=\mu(\boldsymbol{X})+A_{k}c(\boldsymbol{X})+\delta_{A_{1}}(\boldsymbol{X})\ 1\{k=2\}+\epsilon_{k}$ , for $k=1,2$ , where $\boldsymbol{\epsilon}$ was drawn from a multivariate normal distribution with mean $\boldsymbol{0},\ \mathrm{Var}[\epsilon_{1}]=\mathrm{Var}[\epsilon_{2}]=1$ , and $\mathrm{Cov}[\epsilon_{1},\epsilon_{2}]=0.5$ . $\mu(\boldsymbol{X})$ was fixed to be $1+\boldsymbol{X}_{1}+2\boldsymbol{X}_{2}+0.5\boldsymbol{X}_{3}+\boldsymbol{X}_{4}$ for all simulation scenarios. Table 1 describes choices of $c(\boldsymbol{X})$ and $\delta_{A_{1}}(\boldsymbol{X})$ defining four scenarios.
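The data-generating process above can be sketched in NumPy as follows. The functions `interaction` and `carryover` encode Table 1 ($\delta_{-1}$ applies after $A_{1}=-1$, $\delta_{1}$ after $A_{1}=1$); drawing the parallel and crossover designs on the same covariates is a convenience of this sketch, and the paper does not say whether the two designs share covariate draws.

```python
import numpy as np

COV = np.array([[1.0, 0.5], [0.5, 1.0]])   # Var[eps_k] = 1, Cov[eps_1, eps_2] = 0.5

def interaction(X, scenario):
    """c(X) from Table 1."""
    if scenario in (1, 3):
        return 1.12 * (0.3 - X[:, 0] - X[:, 1])
    return 1.15 * (X[:, 0] - 1.25 * X[:, 1] ** 2)

def carryover(X, A1, mu, c, scenario):
    """delta_{A1}(X) from Table 1 (delta_{-1} after A1 = -1, delta_1 after A1 = 1)."""
    if scenario in (1, 2):
        return np.zeros(len(A1))
    if scenario == 3:
        return np.where(A1 == -1, np.abs((mu + c) / 4), np.abs((mu - c) / 2))
    return np.where(A1 == -1, 0.4 * X[:, 0] ** 2 + 0.3 * X[:, 1],
                    1 - 2 * X[:, 0] - X[:, 1] ** 2)

def simulate(n, scenario, rng, p=50):
    X = rng.uniform(-1, 1, size=(n, p))
    mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
    c = interaction(X, scenario)
    # Parallel design: one treatment per subject, Y ~ N(mu + c A, 1).
    A = rng.choice([-1, 1], size=n)
    Y = rng.normal(mu + c * A, 1.0)
    # 2x2 crossover: sequence (A1, -A1), correlated errors, carryover in period 2.
    A1 = rng.choice([-1, 1], size=n)
    delta = carryover(X, A1, mu, c, scenario)
    eps = rng.multivariate_normal(np.zeros(2), COV, size=n)
    Y1 = mu + A1 * c + eps[:, 0]
    Y2 = mu - A1 * c + delta + eps[:, 1]
    return {"X": X, "A": A, "Y": Y, "A1": A1, "Y1": Y1, "Y2": Y2, "c": c, "delta": delta}

rng = np.random.default_rng(5)
train = simulate(150, scenario=4, rng=rng)
```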

Scenarios 1 and 3 are linear in $\boldsymbol{X}$ , whereas Scenarios 2 and 4 are nonlinear. Note that scenario pairs $(1,3)$ and $(2,4)$ are similar, but Scenarios 3 and 4 include carryover effects. The optimal ITR was estimated via crossover GOWL, using a Gaussian kernel. The penalty parameter, $\lambda_{n},$ and the Gaussian kernel bandwidth parameter, $\sigma_{n},$ were selected using 5-fold cross-validation on the grids $\{0.1,0.5,1,5,10,50,100,500\}/n$ and $(0.1,0.2,\ldots,5.0)$ , respectively. In scenarios where carryover effects are present, RLT (Zhu et al., 2015) was used to fit both models to estimate $\widehat{\delta}_{A_{1}}(\boldsymbol{X})$ using Algorithm 1.
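The tuning grids above are easy to construct explicitly. The sketch below also builds a Gaussian kernel Gram matrix and a random 5-fold split. The kernel parameterization $k(\boldsymbol{x},\boldsymbol{x}')=\exp\{-\|\boldsymbol{x}-\boldsymbol{x}'\|^{2}/(2\sigma_{n}^{2})\}$ is an assumption on our part, since software packages differ in how the bandwidth enters.

```python
import numpy as np

def tuning_grids(n):
    """lambda_n grid {0.1, 0.5, 1, 5, 10, 50, 100, 500}/n and sigma_n grid 0.1, 0.2, ..., 5.0."""
    lambdas = np.array([0.1, 0.5, 1, 5, 10, 50, 100, 500]) / n
    sigmas = np.round(np.arange(1, 51) * 0.1, 1)
    return lambdas, sigmas

def gaussian_kernel(X1, X2, sigma):
    """Gram matrix with entries exp(-||x - x'||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None] + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def five_fold_ids(n, rng):
    """Randomly assign n subjects to folds 0-4 of (nearly) equal size."""
    return rng.permutation(np.arange(n) % 5)

n = 150
lambdas, sigmas = tuning_grids(n)
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=(n, 50))
K = gaussian_kernel(X, X, sigma=sigmas[9])     # sigma_n = 1.0
folds = five_fold_ids(n, rng)
# Cross-validation would loop over all 8 x 50 (lambda, sigma) pairs, fit on the
# subjects with folds != k, and score the held-out fold k.
```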

A testing data set of size $n_{\mathrm{test}}=10,000$ was generated similarly with period 1 data only. The misclassification rate, or $\mathbb{P}_{n_{\mathrm{test}}}1\left\{\widehat{{D}}^{*}(\boldsymbol{X})\neq{D}_{0}(\boldsymbol{X})\right\}$ , of the estimated ITR applied to the testing set was calculated, where $\mathbb{P}_{n_{\mathrm{test}}}$ is the empirical mean in the test set. We also calculated the estimated value of the estimated ITR, $\widehat{\mathcal{V}}\left(\widehat{D}^{*}\right)$ (Qian and Murphy, 2011), where

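$$\widehat{\mathcal{V}}(D)=\frac{\mathbb{P}_{n_{\mathrm{test}}}\left[Y\,1\{A=D(\boldsymbol{X})\}/P(A_{1}|\boldsymbol{X})\right]}{\mathbb{P}_{n_{\mathrm{test}}}\left[1\{A=D(\boldsymbol{X})\}/P(A_{1}|\boldsymbol{X})\right]}.$$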

Note that $P(A_{1}|\boldsymbol{X})=0.5$ is constant here. The estimated value is the average reward observed under the estimated optimal ITR when applied to the testing set. Figure 3 in Appendix A displays the mean square error from estimating the carryover effects with RLT for Scenarios 3 and 4.
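Both evaluation metrics are short computations on the period-1 test set. In this sketch the test data follow Scenario 1 with four covariates (our simplification), and a naive "treat everyone with 1" rule is included only as a comparison point. The ratio form of $\widehat{\mathcal{V}}$ divides by the summed inverse-probability weights, so it is a weighted mean of $Y$ among subjects whose assigned treatment matches the rule.

```python
import numpy as np

def misclassification_rate(D_hat, D0):
    """P_ntest 1{D_hat(X) != D0(X)}."""
    return np.mean(D_hat != D0)

def estimated_value(Y, A, D, prop):
    """Ratio-form IPW value: weighted mean of Y among subjects with A = D(X)."""
    w = (A == D) / prop
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(8)
n_test = 10_000
X = rng.uniform(-1, 1, size=(n_test, 4))
mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])          # Scenario 1
A = rng.choice([-1, 1], size=n_test)          # period-1 treatment only
Y = mu + A * c + rng.normal(size=n_test)      # period-1 response, Var[eps_1] = 1
prop = np.full(n_test, 0.5)

D0 = np.where(c > 0, 1, -1)
D_all = np.ones(n_test, dtype=int)            # comparison rule: always treatment 1
print(misclassification_rate(D_all, D0),
      estimated_value(Y, A, D0, prop),
      estimated_value(Y, A, D_all, prop))
```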

Simulations were repeated 1,000 times at training set sample sizes of 30, 75, 150, 300, and 600. Comparisons to various methods in the parallel setting at the same sample size are presented in Figures 1 and 2. These methods include OWL, GOWL, and ridge regression. For OWL and GOWL, a Gaussian kernel was used, and the aforementioned grids for $\lambda_{n}$ and $\sigma_{n}$ are considered in 5-fold cross-validation. For ridge regression, the model includes all covariates and treatment-covariate interactions without any higher order terms or between-covariate interactions. 5-fold cross-validation was used to determine a value for the ridge penalty parameter, where the same values for $\lambda_{n}$ in the OWL methods are considered. All simulations were performed with R version 3.4.3 (R Core Team, 2017). RLT was implemented with the RLT package, version 3.2.1 (Zhu, 2017), and all OWL methods were implemented with the DynTxRegime package, version 3.2 (Holloway et al., 2018). While the DynTxRegime package does not currently support GOWL, the inputs for OWL can be recoded to implement GOWL. Ridge regression was carried out with the glmnet package (Friedman et al., 2010).
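The ridge comparator recommends the treatment whose fitted mean is larger, i.e., the sign of the fitted contrast $\widehat{E}[Y|\boldsymbol{X},A=1]-\widehat{E}[Y|\boldsymbol{X},A=-1]=2(\widehat{\gamma}_{A}+\boldsymbol{X}^{\top}\widehat{\gamma}_{AX})$. This closed-form NumPy sketch is not glmnet: the $\lambda n$ scaling of the penalty mirrors glmnet's ridge objective $\frac{1}{2n}\mathrm{RSS}+\frac{\lambda}{2}\|\beta\|^{2}$, but covariate standardization and cross-validation are skipped, and the data use four covariates from Scenario 1 (our simplifications).

```python
import numpy as np

def design(X, A):
    # intercept, covariates, treatment, and treatment-by-covariate interactions only
    return np.column_stack([np.ones(len(A)), X, A, A[:, None] * X])

def ridge_fit(F, y, lam):
    """Closed-form ridge solution; the intercept (column 0) is left unpenalized."""
    penalty = lam * len(y) * np.eye(F.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(F.T @ F + penalty, F.T @ y)

def ridge_rule(coef, X):
    """Recommend A = 1 when the fitted contrast E[Y|X,A=1] - E[Y|X,A=-1] is positive."""
    p = X.shape[1]
    gamma_a, gamma_ax = coef[p + 1], coef[p + 2:]
    return np.where(gamma_a + X @ gamma_ax > 0, 1, -1)

rng = np.random.default_rng(9)
n, p = 300, 4
X = rng.uniform(-1, 1, size=(n, p))
mu = 1 + X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3]
c = 1.12 * (0.3 - X[:, 0] - X[:, 1])            # Scenario 1 (linear)
A = rng.choice([-1, 1], size=n)
Y = mu + A * c + rng.normal(size=n)
coef = ridge_fit(design(X, A), Y, lam=1.0 / n)  # 1/n is one point on the lambda grid

X_test = rng.uniform(-1, 1, size=(10_000, p))
c_test = 1.12 * (0.3 - X_test[:, 0] - X_test[:, 1])
misclass = np.mean(ridge_rule(coef, X_test) != np.where(c_test > 0, 1, -1))
print(f"ridge test misclassification: {misclass:.3f}")
```

Because this model is correctly specified for the linear scenarios, it is a strong comparator there, which is consistent with ridge regression becoming competitive at large $n$.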

Figure 1 displays the average misclassification rates across all sample sizes, methods, and scenarios. Figure 2 displays the mean square error of the estimated value from the true value, i.e., $\mathbb{P}_{n_{\mathrm{test}}}\left\{\left[\widehat{\mathcal{V}}\left(\widehat{D}^{*}\right)-\widehat{\mathcal{V}}\left(D_{0}\right)\right]^{2}\right\}$ from period 1 data. On average, crossover GOWL yields lower misclassification rates and higher estimated values at smaller sample sizes across all scenarios. Crossover GOWL shows marked improvement in both misclassification and estimated value for small $n$. When $n$ is large, ridge regression yields results competitive with those from crossover GOWL, but crossover GOWL still appears to have marginal gains. Although GOWL in the parallel setting does not perform as well as OWL in any of the presented scenarios, Chen et al. (2018) discuss scenarios where improvements in misclassification and estimated value are observed when using GOWL as opposed to OWL.

FAME Feeding Trial Data Analysis

We present the application of crossover GOWL to data from the Food, Adolescents, Mood and Exercise (FAME) crossover feeding trial, conducted at the University of Southern California (USC) (O’Reilly et al., 2015). The FAME trial included African American and Latino adolescents who were overweight or obese. African American and Latino adolescents are disproportionately affected by overweight and obesity outcomes compared to their non-Hispanic counterparts (Ogden et al., 2014; O’Reilly et al., 2015; Taveras et al., 2013). Dietary intake is a major modifiable risk factor and represents a key intervention point in improving weight loss (Bleich et al., 2017; Kipping et al., 2008). One promising approach is to modify dietary components to improve satiety to indirectly reduce caloric intake (Anderson et al., 2009). In epidemiologic studies of adults in the US, fiber intake is inversely associated with body weight and body fat (Slavin, 2005), even after adjusting for confounding factors such as dietary fat intake. However, results from intervention studies are mixed: increased dietary fiber intake has been shown to have varied effects on body weight among adults who are overweight or obese, with limited research in pediatric or adolescent populations (Rössner et al., 1987; Ryttig et al., 1989; Slavin, 2005; Thompson et al., 2005; Tucker and Thomas, 2009). Given the heterogeneity in the effects of dietary fiber intake on body weight, it is essential to identify the subgroups of overweight and obese adolescents who may benefit from tailored clinical advice to increase fiber intake. We estimate a decision rule to identify a subgroup of adolescents who are overweight or obese that experiences larger increases in patient-reported satiety from a high fiber diet as opposed to the more common high sugar diet.

This study was conducted at the USC Health Sciences campus in Los Angeles, California, from 2008 to 2011. Eighty-six Latino and African American adolescents (aged 14 to 17 years) who were overweight or obese (BMI $>85$th percentile) were recruited. Race was self-reported, and subjects were included if all four grandparents were Latino or African American. Subjects were excluded if they had type 2 diabetes, had been in a weight loss program within the past 6 months, or used medications that influence insulin or body composition. Written informed parental consent and participant assent were obtained before all testing procedures. The Institutional Review Board of USC approved all study procedures.

Participants received either a high sugar/low fiber (HSLF) or a high fiber/low sugar (HFLS) meal plan for breakfast and lunch on two separate visit days. Participants were randomized with equal probability to the HSLF/HFLS or HFLS/HSLF sequence, with a washout period of at least 2 weeks between visits. The meals were isocaloric and matched for macronutrients except sugar and fiber content. Participants first attended a baseline visit at the Clinical Trials Unit at the USC University Hospital, where insulin sensitivity, Tanner stage (assessed by a medical professional), BMI percentile for age, sex, ethnicity, waist circumference, and hemoglobin A1c (HbA1c) were collected. Insulin sensitivity was assessed via a frequently sampled intravenous glucose tolerance test (FSIVGTT) and calculated using the minimal model (Bergman et al., 1979; Yang et al., 1987). At each subsequent test meal visit, participants received an HSLF or HFLS breakfast after a 10-hour overnight fast. At noon, participants received lunch under the same meal condition. Participants rated their hunger and fullness on a 100-mm visual analog scale (VAS) before breakfast and 45 minutes after the start of lunch (300 minutes after breakfast). Participants were provided with age-appropriate activities between meals (e.g., video games, crafts, books).

Two satiety outcomes were defined over the interval from 8:00 AM to 1:00 PM (before breakfast to after lunch): the negative change in hunger, since lower hunger is desired, and the change in fullness. Because of the nature of these outcomes, the required 10-hour overnight fast, and the minimum 2-week washout period, we assumed that no carryover effects were present. Of the 86 subjects who completed the study, 20 were removed for missing outcomes and 1 for missing insulin sensitivity. Participants who did not return within 5 weeks were also removed, leaving $n=54$ participants. We compared crossover GOWL with OWL, GOWL, and ridge regression, the latter three fit using data from period 1 only. Methods were implemented as described in Section 4. We obtained 5-fold cross-validated value estimates, but rather than using Equation (6), which uses only period 1 data, we computed the value for each observation $i=1,\ldots,n_{m}$ in the testing set of the $m$th fold as $Y_{i,1}1\left\{A_{i,1}=\widehat{D}_{0}(\boldsymbol{X}_{i})\right\}+Y_{i,2}1\left\{A_{i,2}=\widehat{D}_{0}(\boldsymbol{X}_{i})\right\}$, where $n_{m}$ is the size of the $m$th fold for $m=1,\ldots,5$. Because $A_{i,2}=-A_{i,1}$ in a $2\times 2$ crossover, exactly one indicator equals 1, so this is the outcome observed in the period in which subject $i$ received the recommended treatment. Although OWL, GOWL, and ridge regression were trained on period 1 data, data from both periods were used in the value estimate to improve its accuracy, because the testing set for each fold is quite small.
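The cross-validated value computation above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes treatments coded as $-1/+1$, uses simple random fold assignment, and relies on a hypothetical `fit_rule` callback that stands in for any of the compared methods and returns a function mapping covariates to recommended treatments.

```python
import numpy as np


def crossover_value(y1, y2, a1, a2, d_hat):
    """Per-subject value: the outcome observed in whichever period the
    subject received the treatment recommended by d_hat (all coded -1/+1).
    In a 2x2 crossover a2 == -a1, so exactly one indicator is 1."""
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    a1, a2, d_hat = np.asarray(a1), np.asarray(a2), np.asarray(d_hat)
    return y1 * (a1 == d_hat) + y2 * (a2 == d_hat)


def cv_value(y1, y2, a1, a2, x, fit_rule, n_folds=5, seed=0):
    """K-fold cross-validated value estimate, averaged across folds.

    fit_rule(train_idx) is a hypothetical interface: it fits a method on the
    training indices and returns a function mapping covariates to -1/+1.
    """
    y1, y2, a1, a2, x = map(np.asarray, (y1, y2, a1, a2, x))
    n = len(y1)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    fold_values = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        rule = fit_rule(train_idx)
        d_hat = rule(x[test_idx])
        v = crossover_value(y1[test_idx], y2[test_idx],
                            a1[test_idx], a2[test_idx], d_hat)
        fold_values.append(v.mean())
    return float(np.mean(fold_values)), fold_values
```

Averaging per-fold means matches the description of values averaged across folds; with unequal fold sizes this differs slightly from pooling all test observations.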

The resulting estimated values, averaged across folds, are presented in Table 2 along with the mean outcome observed in period 1. For both outcomes, all methods improve the estimated value relative to randomization, and crossover GOWL yields the largest improvement. For self-reported fullness, crossover GOWL also yields the smallest standard deviation. When crossover GOWL is trained on the full dataset, 92% (51%) of participants are assigned to the HFLS diet to maximize the change in fullness (hunger). The distribution of features across the groups assigned to HFLS and HSLF by crossover GOWL for both outcomes is presented in Figure 4 in Appendix B. Dietary fiber is recommended to improve overall health in the general population (Marlett et al., 2002); however, the ITRs estimated for hunger and fullness may inform the development of tailored dietary advice for subgroups of at-risk adolescents.

Discussion

Precision medicine is an emerging field with rapid developments in analytical methods; however, these developments typically center on parallel designs. This paper proposes combining crossover designs with generalized outcome weighted learning to estimate optimal ITRs. The proposed method addresses a key gap in the literature: little to no work has been done to involve crossover designs in precision medicine, despite how naturally crossover studies lend themselves to the field. Kulasekera and Siriwardhana (2018) propose a ranking method to estimate the optimal ITR from a crossover study but provide no recommendations for handling carryover effects. In contrast, crossover GOWL can handle such effects. Furthermore, with or without carryover effects, the proposed method improves on standard methods for the parallel group design in both estimated value and misclassification rate, especially at the smaller sample sizes typical of crossover designs.

An alternative to GOWL is residual weighted learning (RWL) (Zhou et al., 2017), an extension of OWL that weights the misclassification error by residuals from a model fit to the outcome rather than by the observed rewards themselves. Unlike GOWL, RWL uses a non-convex loss function, so its minimization is not guaranteed to reach a global minimum (Tao et al., 2005). The proposed method does not need residuals in the weight: the main effect of the covariates, which the residuals are meant to remove, cancels when taking the difference between each subject's responses to the two treatments. Thus, the proposed method avoids specifying a model for the main effect of the covariates.
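The cancellation argument can be checked with a short simulation. This sketch assumes a hypothetical no-carryover outcome model $Y_k=h(\boldsymbol{X})+c(\boldsymbol{X})A_k+\epsilon_k$ with $A_2=-A_1$; the model is an illustrative assumption, not necessarily the one in Assumption (4). It shows that the within-subject difference $Y_1-Y_2$ is unchanged when the main effect $h$ is replaced by an arbitrary function.

```python
import numpy as np


def simulate_difference(h, c, n=200, seed=1):
    """Simulate a 2x2 crossover without carryover under the hypothetical
    model Y_k = h(X) + c(X) * A_k + eps_k and return Y_1 - Y_2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=n)
    a1 = rng.choice([-1, 1], size=n)   # randomized treatment sequence
    a2 = -a1                           # other treatment in period 2
    eps = rng.normal(scale=0.5, size=(n, 2))
    y1 = h(x) + c(x) * a1 + eps[:, 0]
    y2 = h(x) + c(x) * a2 + eps[:, 1]
    return y1 - y2, x, a1, eps


def c(x):
    """Hypothetical treatment-covariate interaction."""
    return x


# Same seed, so x, a1, and the noise are identical; only h differs.
r_flat, x, a1, eps = simulate_difference(lambda x: np.zeros_like(x), c)
r_wild, *_ = simulate_difference(lambda x: 10 * np.sin(5 * x) + x ** 3, c)

# The main effect h(X) cancels in the difference.
assert np.allclose(r_flat, r_wild)
# The difference is 2 c(X) A_1 plus a noise difference.
assert np.allclose(r_flat, 2 * c(x) * a1 + eps[:, 0] - eps[:, 1])
```

With carryover, the period 2 outcome also depends on $\delta_{A_{1}}(\boldsymbol{X})$, so the difference no longer reduces to $2c(\boldsymbol{X})A_1$ plus noise; that case is not covered by this sketch.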

We note that when the distribution of $\widetilde{A}_{1}=\mathrm{sign}\{R\}A_{1}$ is highly imbalanced, the cross-validation mechanism for estimating $\lambda_{n}$ and $\sigma_{n}^{2}$ may fail, because a training set may contain no observations with $\widetilde{A}_{1}=1$ or none with $\widetilde{A}_{1}=-1$. If there is prior knowledge of the distribution of $\mathrm{sign}\{R\}$, investigators could adjust the randomization probabilities for the treatment sequences accordingly. Lastly, the test of $H_{0}:E[\delta_{A_{1}}(\boldsymbol{X})]=0$ may have low power at smaller sample sizes.
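The failure mode above can be quantified and detected with a few lines of Python. Under the simplifying assumption (ours, not the paper's) that the labels $\widetilde{A}_{1}$ are i.i.d. with $P(\widetilde{A}_{1}=1)=p$, a training set of size $m$ contains a single class with probability $p^{m}+(1-p)^{m}$. The hypothetical helper `folds_missing_a_class` flags cross-validation folds whose training set lacks one of the two labels.

```python
import numpy as np


def prob_single_class(m, p):
    """Probability that m i.i.d. labels, each +1 with probability p,
    all share the same sign (an illustrative i.i.d. assumption)."""
    return p ** m + (1 - p) ** m


def folds_missing_a_class(labels, n_folds=5, seed=0):
    """Return indices of CV folds whose training set lacks label +1 or -1.

    labels: array of sign(R) * A_1 values coded -1/+1 (hypothetical input).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(labels)), n_folds)
    bad = []
    for k, test_idx in enumerate(folds):
        train = np.delete(labels, test_idx)   # training labels for fold k
        if not (np.any(train == 1) and np.any(train == -1)):
            bad.append(k)
    return bad
```

Stratifying folds on $\widetilde{A}_{1}$ is one common remedy when both labels are present in the full sample.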

Several extensions of estimating the optimal ITR from crossover data remain to be explored. For example, only the $2\times 2$ design was studied in this paper. For larger designs, the proposed method could be implemented as a series of binary classifiers, as in Dietterich and Bakiri (1994). Alternatively, one could extend crossover GOWL to multi-category classification; there have been several developments in multi-category SVM (Lee et al., 2004; Zhu et al., 2004), and, more recently, Liang et al. (2018) proposed an outcome weighted deep learning method to estimate the optimal ITR for multiple treatments. Another possible extension is to use as the observed reward the residual from a model of the treatment response difference. Fu et al. (2016) and Zhou et al. (2017) report favorable results using residual weights, and further improvements may come from combining residual weights with the piecewise hinge loss of GOWL. Finally, the proposed method could incorporate variable selection. For example, an $L_{1}$ penalty could be imposed during optimization to simultaneously restrict model complexity and perform variable selection, as suggested by Chen et al. (2018), Song et al. (2015), Xu et al. (2015), and Zhou et al. (2017).

Acknowledgments

The authors were supported in part by NCI P01 CA142538 and NCMHD P60 MD002254-01.

Appendix

Appendix A

Figure 3 visualizes the performance of reinforcement learning trees (RLT) (Zhu et al., 2015) in simulation Scenarios 3 and 4. It displays the mean square prediction error of the estimated carryover relative to the true carryover on the testing set, i.e.,

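$$\mathbb{P}_{n_{\mathrm{test}}}\left\{\left[\widehat{\delta}_{A_{1}}(\boldsymbol{X})-\delta_{A_{1}}(\boldsymbol{X})\right]^{2}\right\}.$$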

Although the carryover estimates can have a high mean square error, the crossover design still outperforms its parallel design counterparts in the presence of carryover effects, as Figures 1 and 2 show.

Appendix B

Crossover GOWL assigned 92% of study participants ($n=49$) to the HFLS diet to maximize the change in fullness from baseline. To characterize the subgroup that, on average, experiences a larger increase in patient-reported fullness, Figure 4 displays the distribution of continuous features across the estimated subgroups. Those assigned to the HFLS diet tend to be older and to have higher HbA1c. Because the HSLF group is small ($n=4$), two-sample $t$-tests would not be appropriate for testing differences between groups, and the trends observed in Figure 4 should be confirmed in future studies. Categorical features were compared using Fisher’s exact tests: sex ($p=0.1131$), ethnicity ($p=1$), and Tanner stage ($p=0.4427$). None of these tests was significant at the 0.05 level.
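The following is a minimal, self-contained sketch of the two-sided Fisher's exact test for a $2\times 2$ table, computed by summing the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. It is illustrative only and uses no FAME data. Tanner stage has more than two levels, which requires the $r\times c$ generalization (available, e.g., in R's `fisher.test`) and is not covered here; for $2\times 2$ tables, `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For the classic tea-tasting table `[[3, 1], [1, 3]]`, this returns $34/70\approx 0.486$.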

Figure 4 also displays the distribution of continuous features across the subgroups estimated to maximize the reduction in hunger from baseline. Under this rule, 51% ($n=27$) of participants were assigned to HFLS. Those assigned to HFLS tend to be older, but differences in other covariates are not apparent. Although the HSLF group is larger when hunger is the outcome, both groups are still rather small, so two-sample $t$-tests remain inappropriate. Fisher’s exact tests again showed no significant differences in sex ($p=0.5857$), ethnicity ($p=1$), or Tanner stage ($p=0.7040$).

In conclusion, crossover GOWL appears to be effective for estimating the optimal ITR to maximize the change in satiety. Future research should confirm these subgroups in larger samples to better compare differences across features. If verified, recommendations for minority adolescents could be tailored by age and HbA1c level to improve weight loss. Studies of minority adolescents who are overweight or obese are still needed to investigate alternative interventions for those who report feeling more satiated on the typical Western diet (HSLF).

Appendix C

The following assumptions are made for the theory behind the method proposed in the main paper.

1. Positivity: $P(A_{1}=a\mid\boldsymbol{X}=\boldsymbol{x})\geq\pi_{0}>0$ with probability 1.

2. Conditional exchangeability: $\{Y^{*}(-1),Y^{*}(1)\}\perp A_{1}\mid\boldsymbol{X}$.

3. Consistency: $Y_{k}=Y^{*}(A_{k})-\delta_{A_{1}}(\boldsymbol{X})1\{k=2\}$.

4. Outcomes follow the model

[TABLE]

for periods $k=1,2$, where $\boldsymbol{\epsilon}=(\epsilon_{1},\epsilon_{2})^{\top}$ has a positive definite covariance matrix $\Sigma_{\epsilon}$.

5. There exist $M_{0},M_{1}<\infty$ such that $|Y_{k}|<M_{0}$ almost surely, $|\delta_{A_{1}}(\boldsymbol{X})|<M_{1}$ almost surely, and $P\left(|\widehat{\delta}_{A_{1}}(\boldsymbol{X})|<M_{1}\right)\to 1$ as $n\to\infty$.

6. $\mathbb{P}\left[1\{\mathrm{sign}[\widehat{R}]\neq\mathrm{sign}[R]\}\right]=o_{P}(\lambda_{n})$.

Proof of Lemma 1. The optimal ITR is $D_{0}=\operatorname*{\arg\!\max}_{D\in\mathcal{D}}E[Y^{*}\{D(\boldsymbol{X})\}]$ . Note that, under Assumption (4), $D_{0}(\boldsymbol{X})=\mathrm{sign}\{c(\boldsymbol{X})\}.$ The expected treatment response difference between treating according to $D_{0}$ and treating opposite to $D_{0}$ is

[TABLE]

for all $D\in\mathcal{D}$ . Thus, the optimal ITR also maximizes the treatment-response difference, or $D_{0}=\operatorname*{\arg\!\max}_{D\in\mathcal{D}}E[Y^{*}\{{D}(\boldsymbol{X})\}-Y^{*}\{-{D}(\boldsymbol{X})\}]$ . Therefore, it can be seen that

[TABLE]

where the second equality follows from Assumption (3). This proves the result.

Proof of Theorem 1. This proof follows from Lemma 1 and the results of Lin (2002). Recall that $\psi(u,v)=\max\{1-\mathrm{sign}(u)v,0\}$. Minimizing the risk $\mathcal{R}_{\psi}(f)$ is equivalent to minimizing the conditional risk

[TABLE]

for every fixed $\boldsymbol{x}\in\mathcal{X}$ . Let $R^{+}=R1\{R\geq 0\}$ and $R^{-}=R1\{R<0\}.$ By the law of total expectation, the conditional risk becomes

[TABLE]

Next, note that $\mathcal{R}_{\psi}\{\mathrm{sign}(f),\boldsymbol{x}\}<\mathcal{R}_{\psi}(f,\boldsymbol{x})$ whenever $f(\boldsymbol{x})\not\in[-1,1].$ For example, when $f(\boldsymbol{x})<-1$ , the conditional risk reduces to

[TABLE]

which increases monotonically as $f(\boldsymbol{x})$ decreases toward $-\infty$. A similar argument applies when $f(\boldsymbol{x})>1$. Thus, we restrict our search to $f(\boldsymbol{x})\in[-1,1]$. Then,

[TABLE]

If $f^{*}(\boldsymbol{x})$ minimizes the conditional risk, then $f^{*}(\boldsymbol{x})$ must have the sign opposite to that of $E\left[R\Big|\boldsymbol{X}=\boldsymbol{x},A_{1}=-1\right]-E\left[R\Big|\boldsymbol{X}=\boldsymbol{x},A_{1}=1\right]$, and thus $D_{0}(\boldsymbol{X})=\mathrm{sign}\{f^{*}(\boldsymbol{X})\}$.
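The structure of this argument (restrict $f(\boldsymbol{x})$ to $[-1,1]$, after which the conditional risk is linear in $f(\boldsymbol{x})$ and is minimized at $\pm 1$) can be checked numerically. In this sketch, `w_pos` and `w_neg` are hypothetical stand-ins for the weighted mass on labels $+1$ and $-1$ at a fixed $\boldsymbol{x}$; they are not the paper's exact expressions. The code also checks two properties of $\psi$ used in these proofs: clipping $v$ to $[-1,1]$ never increases $\psi$, and $\psi$ is Lipschitz (with constant 1) in $v$.

```python
import numpy as np


def psi(u, v):
    """Hinge-type loss psi(u, v) = max(1 - sign(u) * v, 0)."""
    return np.maximum(1 - np.sign(u) * v, 0)


def conditional_risk(v, w_pos, w_neg):
    """Weighted hinge risk at a fixed x, with hypothetical weighted mass
    w_pos on label +1 and w_neg on label -1."""
    return w_pos * psi(1.0, v) + w_neg * psi(-1.0, v)


grid = np.linspace(-3, 3, 601)


def minimizer(w_pos, w_neg):
    """Grid minimizer of the conditional risk over candidate values f(x)."""
    return grid[np.argmin(conditional_risk(grid, w_pos, w_neg))]


# Clipping f(x) to [-1, 1] never increases psi, as used to restrict the search.
for u in (-1.0, 1.0):
    assert np.all(psi(u, np.clip(grid, -1, 1)) <= psi(u, grid))

# psi changes by at most |v - v'| between neighboring grid points.
assert np.all(np.abs(np.diff(psi(1.0, grid))) <= np.diff(grid) + 1e-12)
```

For example, `minimizer(3, 1)` is (up to floating-point error) $1$ and `minimizer(1, 2)` is $-1$: the minimizer takes the sign of the label carrying more weight, as in the argument of Lin (2002).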

Proof of Theorem 2. First, define the loss functions

[TABLE]

and

[TABLE]

Next, we show that $\lambda_{n}||\widehat{f}^{*}_{n}||^{2}$ is bounded. For any $f\in\overline{\mathcal{F}}$,

[TABLE]

by definition of $\widehat{f}^{*}_{n}.$ If we choose $f=0$ , then, for all $n$ large enough,

[TABLE]

where the last inequality holds because of Assumptions (1), (5), and (6). Define $M=\pi_{0}^{-1}(2M_{0}+M_{1})<\infty.$ Then, because $\mathbb{P}_{n}\widehat{L}_{\psi}(\widehat{f}^{*}_{n})\geq 0,$ we have that $\lambda_{n}||\widehat{f}^{*}_{n}||^{2}\leq M.$

For any bounded $f$, such as $\widehat{f}^{*}_{n}$, we may show that $\left|\mathbb{P}_{n}\left\{L_{\psi}(f)-\widehat{L}_{\psi}(f)\right\}\right|=o_{P}(1)$:

[TABLE]

Next, we have

[TABLE]

Taking the $\limsup$ on both sides, we find

[TABLE]

Thus, it suffices to show that $\mathbb{P}_{n}L_{\psi}(\widehat{f}^{*}_{n})-\mathbb{P}L_{\psi}(\widehat{f}^{*}_{n})\to_{P}0$. Because $\lambda_{n}||\widehat{f}^{*}_{n}||^{2}$ is bounded by $M$, $\{\sqrt{\lambda_{n}}f:||\sqrt{\lambda_{n}}f||\leq\sqrt{M}\}$ is contained in a Donsker class. Note that $\psi(u,v)$ is Lipschitz continuous with respect to $v$, and, thus, $L_{\psi}(f)$ is Lipschitz continuous with respect to $f$. Therefore, $\{\sqrt{\lambda_{n}}L_{\psi}(f):||\sqrt{\lambda_{n}}f||\leq\sqrt{M}\}$ is also Donsker. This gives us $\sqrt{n\lambda_{n}}\{\mathbb{P}_{n}-\mathbb{P}\}L_{\psi}(\widehat{f}^{*}_{n})=O_{P}(1)$, which, provided $n\lambda_{n}\to\infty$, implies $\{\mathbb{P}_{n}-\mathbb{P}\}L_{\psi}(\widehat{f}^{*}_{n})=o_{P}(1)$. We finally arrive at $\left|\mathcal{R}_{\psi}(f^{*})-\mathcal{R}_{\psi}(\widehat{f}^{*}_{n})\right|=o_{P}(1)$. Furthermore, when $f^{*}_{0}\in\overline{\mathcal{F}}$, $f^{*}_{0}=f^{*}$, and

[TABLE]

where the first inequality follows from Bartlett et al. (2006).

Bibliography

Anderson, J. W., P. Baird, R. H. Davis, S. Ferreri, M. Knudtson, A. Koraym, V. Waters, and C. L. Williams (2009). Health benefits of dietary fiber. Nutrition Reviews 67(4), 188–205.

Anderson, J. W., B. M. Smith, and N. J. Gustafson (1994). Health benefits and practical aspects of high-fiber diets. The American Journal of Clinical Nutrition 59(5), 1242S–1247S.

Ashley, E. A. (2015). The precision medicine initiative: a new national effort. Journal of the American Medical Association 313(21), 2119–2120.

Athey, S. and S. Wager (2017). Efficient policy learning. https://arxiv.org/abs/1702.02896.

Bartlett, P. L., M. I. Jordan, and J. D. McAuliffe (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association 101(473), 138–156.

Bergman, R. N., Y. Z. Ider, C. R. Bowden, and C. Cobelli (1979). Quantitative estimation of insulin sensitivity. American Journal of Physiology-Endocrinology and Metabolism 236(6), E667.

Bleich, S. N., K. A. Vercammen, L. Y. Zatz, J. M. Frelier, C. B. Ebbeling, and A. Peeters (2017). Interventions to prevent global childhood overweight and obesity: A systematic review. The Lancet Diabetes & Endocrinology 6(4).

Chakraborty, B. and S. A. Murphy (2014). Dynamic treatment regimes. Annual Review of Statistics and Its Application 1, 447–464.
