Sharp bounds on the relative treatment effect for ordinal outcomes

Jiannan Lu; Yunshu Zhang; Peng Ding

arXiv:1907.10287·stat.ME·September 6, 2019


TL;DR

This paper derives sharp bounds for the relative treatment effect in ordinal outcomes, allowing for arbitrary dependence between potential outcomes, which enhances interpretability when the average treatment effect is ill-defined.

Contribution

It provides the first derivation of sharp bounds on the relative treatment effect for ordinal outcomes without assuming independence of potential outcomes.

Findings

01

Derived sharp bounds on the relative treatment effect for ordinal outcomes.

02

Bounds are identifiable from observed data and accommodate arbitrary dependence.

03

Enhances interpretability of treatment effects in ordinal outcome studies.

Abstract

For ordinal outcomes, the average treatment effect is often ill-defined and hard to interpret. Echoing Agresti and Kateri (2017), we argue that the relative treatment effect can be a useful measure, especially for ordinal outcomes. It is defined as $\gamma=\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}-\mathrm{pr}\{Y_{i}(1)<Y_{i}(0)\}$, with $Y_{i}(1)$ and $Y_{i}(0)$ being the potential outcomes of unit $i$ under treatment and control, respectively. Given the marginal distributions of the potential outcomes, we derive the sharp bounds on $\gamma$, which are identifiable parameters based on the observed data. Agresti and Kateri (2017) focused on modeling strategies under the assumption of independent potential outcomes, but we allow for arbitrary dependence.


Equations

$$\gamma=\mathop{\sum\sum}_{k>l}p_{kl}-\mathop{\sum\sum}_{k<l}p_{kl}.$$

$$\gamma_{U}=\max_{\bm{P}}\gamma\quad\text{subject to}\quad\sum_{l^{\prime}=0}^{J-1}p_{kl^{\prime}}=p_{k+}\ (k=0,\ldots,J-1);\quad\sum_{k^{\prime}=0}^{J-1}p_{k^{\prime}l}=p_{+l}\ (l=0,\ldots,J-1);\quad p_{kl}\geq 0\ (k,l=0,\ldots,J-1).$$

$$\delta_{jm}=\sum_{k=j}^{J-1}p_{k+}+\sum_{k=j+m}^{J-1}p_{k+}+\sum_{l=0}^{j-2}p_{+l}-\sum_{l=j+m-1}^{J-1}p_{+l}.$$

$$\begin{aligned}
\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}&\leq\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)=0\}+\mathrm{pr}\{Y_{i}(1)\geq 2,Y_{i}(0)\geq 1\}\\
&\leq\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)=0\}+\mathrm{pr}\{Y_{i}(1)\geq 2\}\\
&=\mathrm{pr}\{Y_{i}(1)\geq 1\}-\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)\geq 1\}+\mathrm{pr}\{Y_{i}(1)\geq 2\}
\end{aligned}$$

$$\begin{aligned}
\mathrm{pr}\{Y_{i}(1)<Y_{i}(0)\}&\geq\mathrm{pr}\{Y_{i}(0)\geq 1,Y_{i}(1)=0\}\\
&=\mathrm{pr}\{Y_{i}(0)\geq 1\}-\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)\geq 1\},
\end{aligned}$$

$$\gamma\leq\sum_{k=1}^{J-1}p_{k+}+\sum_{k=2}^{J-1}p_{k+}-\sum_{l=1}^{J-1}p_{+l}=\delta_{11}.$$

$$\gamma_{U}=\min_{1\leq j\leq J-1}\ \min_{1\leq m\leq J-j}\delta_{jm}.$$

$$\gamma_{L}=\max_{1\leq j\leq J-1}\ \max_{1\leq m\leq J-j}\xi_{jm},$$

$$\xi_{jm}=\sum_{k=j+m-1}^{J-1}p_{k+}-\sum_{l=j}^{J-1}p_{+l}-\sum_{l=j+m}^{J-1}p_{+l}-\sum_{k=0}^{j-2}p_{k+},$$

$$\widehat{p}_{k+}=\frac{\sum_{i=1}^{N}Z_{i}\bm{1}_{\{Y_{i}=k\}}}{\sum_{i=1}^{N}Z_{i}},\quad\widehat{p}_{+l}=\frac{\sum_{i=1}^{N}(1-Z_{i})\bm{1}_{\{Y_{i}=l\}}}{\sum_{i=1}^{N}(1-Z_{i})}.$$

$$\widehat{p}_{k+}=\sum_{i=1}^{N}\frac{Z_{i}\bm{1}_{\{Y_{i}=k\}}}{\widehat{e}(\bm{X}_{i})}\Big/\sum_{i=1}^{N}\frac{Z_{i}}{\widehat{e}(\bm{X}_{i})},\quad\widehat{p}_{+l}=\sum_{i=1}^{N}\frac{(1-Z_{i})\bm{1}_{\{Y_{i}=l\}}}{1-\widehat{e}(\bm{X}_{i})}\Big/\sum_{i=1}^{N}\frac{1-Z_{i}}{1-\widehat{e}(\bm{X}_{i})},$$

$$\Delta_{j}=\mathrm{pr}\{Y_{i}(1)\geq j\}-\mathrm{pr}\{Y_{i}(0)\geq j\}=\sum_{k\geq j}p_{k+}-\sum_{l\geq j}p_{+l}\quad(j=1,\ldots,J-1),$$

$$\delta_{jm}=\Delta_{j}+\sum_{l=0}^{j-2}p_{+l}+\sum_{k=j+m}^{J-1}p_{k+}+\sum_{l=j}^{j+m-2}p_{+l}$$

$$\xi_{jm}=\Delta_{j}-\sum_{k=0}^{j-2}p_{k+}-\sum_{l=j+m}^{J-1}p_{+l}-\sum_{k=j}^{j+m-2}p_{k+},$$

$$\sum_{l^{\prime}=0}^{n-1}a_{kl^{\prime}}\leq x_{k},\quad\sum_{k^{\prime}=0}^{n-1}a_{k^{\prime}l}=y_{l}\quad(k,l=0,\ldots,n-1).$$

$$\sum_{l^{\prime}=0}^{n-1}b_{kl^{\prime}}=x_{k},\quad\sum_{k^{\prime}=0}^{n-1}b_{k^{\prime}l}\leq y_{l}\quad(k,l=0,\ldots,n-1).$$

$$\sum_{l=j}^{j+m-1}p_{+l}=\sum_{l=j}^{j+m-2}p_{+l}+p_{+,j+m-1},\quad\sum_{k=j+m+1}^{J-1}p_{k+}=\sum_{k=j+m}^{J-1}p_{k+}-p_{j+m,+}.$$

$$\begin{aligned}
\delta_{j,m+1}&=\Delta_{j}+\sum_{l=0}^{j-2}p_{+l}+\sum_{k=j+m}^{J-1}p_{k+}-p_{j+m,+}+\sum_{l=j}^{j+m-2}p_{+l}+p_{+,j+m-1}\\
&=\delta_{jm}+p_{+,j+m-1}-p_{j+m,+}.
\end{aligned}$$

$$\sum_{l=0}^{j-2}p_{+l}=\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1},\quad\sum_{l=j}^{j+m-1}p_{+l}=\sum_{l=j+1}^{j+m-1}p_{+l}+p_{+j},$$

$$\begin{aligned}
\delta_{j,m+1}&=\Delta_{j}+\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1}+\sum_{k=j+m+1}^{J-1}p_{k+}+\sum_{l=j+1}^{j+m-1}p_{+l}+p_{+j}\\
&=\Delta_{j+1}+p_{j+}+\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1}+\sum_{k=j+m+1}^{J-1}p_{k+}+\sum_{l=j+1}^{j+m-1}p_{+l}\\
&=\delta_{j+1,m}+p_{j+}-p_{+,j-1}.
\end{aligned}$$

$$\delta_{j+1,J-1-j}=\delta_{j,J-j}+(p_{+,j-1}-p_{j+})=\cdots=\delta_{1,J-1}+\sum_{l=0}^{j-1}p_{+l}-\sum_{k=1}^{j}p_{k+}.$$

$$\delta_{1,J-1}=\delta_{1,J-2}+(p_{+,J-2}-p_{J-1,+})=\cdots=\delta_{1j}+\sum_{l=j}^{J-2}p_{+l}-\sum_{k=j+1}^{J-1}p_{k+}.$$


Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life · Advanced Causal Inference Techniques · Statistical Methods and Inference

Full text

Sharp bounds on the relative treatment effect for ordinal outcomes

Jiannan Lu, Yunshu Zhang and Peng Ding

Jiannan Lu is Senior Data Scientist (E-mail: [email protected]), Analysis and Experimentation, Microsoft Corporation, Redmond, WA 98052, U.S.A. Yunshu Zhang is Doctoral Student (E-mail: [email protected]), Department of Statistics, North Carolina State University, Raleigh, NC 27695, U.S.A. Peng Ding is Assistant Professor (E-mail: [email protected]), Department of Statistics, University of California, Berkeley, CA 94720, U.S.A.

Abstract

For ordinal outcomes, the average treatment effect is often ill-defined and hard to interpret. Echoing Agresti and Kateri (2017), we argue that the relative treatment effect can be a useful measure, especially for ordinal outcomes. It is defined as $\gamma=\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}-\mathrm{pr}\{Y_{i}(1)<Y_{i}(0)\}$, with $Y_{i}(1)$ and $Y_{i}(0)$ being the potential outcomes of unit $i$ under treatment and control, respectively. Given the marginal distributions of the potential outcomes, we derive the sharp bounds on $\gamma$, which are identifiable parameters based on the observed data. Agresti and Kateri (2017) focused on modeling strategies under the assumption of independent potential outcomes, but we allow for arbitrary dependence.

Keywords: Causal inference; partial identification; potential outcomes

1 Causal inference with ordinal outcomes

Ordinal outcomes are very common in empirical research (e.g., Whitehead et al., 2001; Scharfstein et al., 2004; Huang et al., 2017; Liu and Zhang, 2018). Consider a binary treatment and an ordinal outcome with labels $0,\ldots,J-1$ , where 0 and $J-1$ denote the worst and best categories, respectively. Define $\left\{Y_{i}(1),Y_{i}(0)\right\}$ as the potential outcomes of unit $i\in\{1,\ldots,N\}$ under treatment and control, respectively. For all $k,l=0,\ldots,J-1,$ let $p_{kl}=\mathrm{pr}\left\{Y_{i}(1)=k,Y_{i}(0)=l\right\}$ denote the probability that the potential outcome is $k$ under treatment and $l$ under control, respectively. The probability matrix $\bm{P}=(p_{kl})_{0\leq k,l\leq J-1}$ characterizes the joint distribution of the potential outcomes. Let $p_{k+}=\sum_{l^{\prime}=0}^{J-1}p_{kl^{\prime}}$ and $p_{+l}=\sum_{k^{\prime}=0}^{J-1}p_{k^{\prime}l}$ be the marginal distributions of the potential outcomes under treatment and control, respectively. We let $\bm{p}_{1}=(p_{0+},\ldots,p_{J-1,+})^{\textrm{T}}$ and $\bm{p}_{0}=(p_{+0},\ldots,p_{+,J-1})^{\textrm{T}}$ denote the marginal probability vectors.

For ordinal outcomes, the average treatment effect $E\{Y_{i}(1)-Y_{i}(0)\}$ is often hard to interpret if there is no clear definition of “distance” between different categories. In contrast, the parameters $\tau=\mathrm{pr}\{Y_{i}(1)\geq Y_{i}(0)\}$ and $\eta=\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}$ have clear interpretations as the probabilities that the treatment is beneficial and strictly beneficial for the outcome (Newcombe, 2006a, 2006b; Zhou, 2008; Huang et al., 2017; Lu et al., 2018). Recently, Agresti and Kateri (2017) used the relative treatment effect for ordinal outcomes (Agresti, 2010), defined as

$$\gamma=\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}-\mathrm{pr}\{Y_{i}(1)<Y_{i}(0)\}=\mathop{\sum\sum}_{k>l}p_{kl}-\mathop{\sum\sum}_{k<l}p_{kl}.\tag{1}$$

We can verify that $\gamma=\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}-[1-\mathrm{pr}\{Y_{i}(1)\geq Y_{i}(0)\}]=\tau+\eta-1.$ The parameters $\tau$, $\eta$ and $\gamma$ are closely related to the classic Wilcoxon–Mann–Whitney statistic for testing equality of two distributions (Kruskal, 1952, 1957; Klotz, 1966; Vargha and Delaney, 1998; Chung and Romano, 2016; Divine et al., 2018). They depend on the joint distribution of the potential outcomes and are not identifiable based on the observed data (Hand, 1992; Demidenko, 2016; Huang et al., 2017; Lu et al., 2018; Greenland et al., 2019). Huang et al. (2017) obtained numerical bounds on $\tau$ and $\eta$, and Lu et al. (2018) derived explicit formulas for these bounds. Agresti and Kateri (2017) and Cheng (2009) discussed $\gamma$ assuming independent potential outcomes implicitly and explicitly. Chiba (2018) proposed a Bayesian approach to infer $\gamma$, which requires imposing a prior on the joint distribution of the potential outcomes. Fay et al. (2018) and Fay and Malinovsky (2018) pointed out the non-identifiability of $\gamma$ and proposed a non-sharp bound on $\gamma$ given the marginal distributions of $Y_{i}(1)$ and $Y_{i}(0)$, based on the bounds on $\tau$ and $\eta$ in Lu et al. (2018).
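The identity $\gamma=\tau+\eta-1$ can be checked numerically on any joint probability matrix. The sketch below uses a hypothetical $3\times 3$ matrix `P` (not taken from the paper) and exact fractions; the function name `tau_eta_gamma` is ours.

```python
from fractions import Fraction as F

def tau_eta_gamma(P):
    """Return (tau, eta, gamma) for a joint probability matrix P[k][l]."""
    J = len(P)
    eta = sum(P[k][l] for k in range(J) for l in range(J) if k > l)    # pr{Y(1) > Y(0)}
    ties = sum(P[k][k] for k in range(J))                              # pr{Y(1) = Y(0)}
    worse = sum(P[k][l] for k in range(J) for l in range(J) if k < l)  # pr{Y(1) < Y(0)}
    return eta + ties, eta, eta - worse

# Hypothetical joint distribution for J = 3; rows index Y(1), columns index Y(0).
P = [[F(2, 10), F(1, 10), F(0)],
     [F(2, 10), F(1, 10), F(1, 10)],
     [F(1, 10), F(1, 10), F(1, 10)]]

tau, eta, gamma = tau_eta_gamma(P)
print(tau, eta, gamma)            # 4/5 2/5 1/5
print(gamma == tau + eta - 1)     # True
```

In practice `P` is never observed, which is why the remainder of the paper works with the marginals only.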

For $J=2$ (i.e., when $Y$ is binary), the relative treatment effect reduces to $\gamma=p_{1+}-p_{+1}=E\{Y_{i}(1)-Y_{i}(0)\}=E\{Y_{i}(1)\}-E\{Y_{i}(0)\},$ which is actually the average treatment effect. Because the average treatment effect depends only on the marginal distributions of the potential outcomes, $\gamma$ is identifiable from the observed data with $J=2$. However, $\gamma$ becomes unidentifiable when $J\geq 3$, because it depends on the joint distribution of the treated and control potential outcomes. We adopt the partial identification strategy (cf. Manski, 2003; Richardson et al., 2014) and focus on the sharp bounds on $\gamma$. We compute the maximum and minimum values of $\gamma$ that are compatible with the marginal distributions of the potential outcomes. As a theoretical starting point, we assume that the marginal probabilities $\bm{p}_{1}$ and $\bm{p}_{0}$ are known, and later we will incorporate sampling variability. The sharp upper bound $\gamma_{U}$ is the solution of the following linear programming problem:

$$\gamma_{U}=\max_{\bm{P}}\gamma\quad\text{subject to}\quad\sum_{l^{\prime}=0}^{J-1}p_{kl^{\prime}}=p_{k+}\ (k=0,\ldots,J-1);\quad\sum_{k^{\prime}=0}^{J-1}p_{k^{\prime}l}=p_{+l}\ (l=0,\ldots,J-1);\quad p_{kl}\geq 0\ (k,l=0,\ldots,J-1).$$

The sharp lower bound $\gamma_{L}$ is the corresponding minimum value subject to the same set of constraints. By definition, the sharp upper and lower bounds are functions of the marginal probabilities $\bm{p}_{1}$ and $\bm{p}_{0}$, although the relative treatment effect $\gamma$ itself is a function of the joint probability matrix $\bm{P}.$ Balke and Pearl (1997) and Huang et al. (2017) used linear programming to obtain bounds on different causal parameters for ordinal and more general outcomes. Numerically, we can easily obtain the values of $\gamma_{U}$ and $\gamma_{L}$ for given values of $\bm{p}_{1}$ and $\bm{p}_{0}$. However, our goal here is to derive explicit formulas, as in Balke and Pearl (1997) and Lu et al. (2018), which give more transparent interpretations and allow for convenient estimation and inference.
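For small problems, the numerical route can be illustrated without an LP solver. The sketch below is a brute-force stand-in, not the linear programming code used by the authors: it assumes the marginals are integer counts over a common total (here hypothetical counts for $J=3$) and enumerates every integer joint table with those margins. Because the constraint matrix of such a transportation problem is totally unimodular, the LP optimum is attained at an integer table, so the enumeration recovers the same extremes.

```python
from itertools import product

def gamma_numerator(T):
    """Sum of T[k][l] over k > l minus the sum over k < l (gamma times the total count)."""
    J = len(T)
    return sum(T[k][l] * ((k > l) - (k < l)) for k in range(J) for l in range(J))

def tables(rows, cols):
    """Yield every non-negative integer table with the given row and column sums."""
    if len(rows) == 1:
        yield [list(cols)]          # the last row is forced by the remaining column sums
        return
    for first in product(*(range(min(rows[0], c) + 1) for c in cols)):
        if sum(first) == rows[0]:
            rest_cols = [c - f for c, f in zip(cols, first)]
            for rest in tables(rows[1:], rest_cols):
                yield [list(first)] + rest

rows = (2, 3, 5)   # hypothetical counts behind p_1 = (0.2, 0.3, 0.5)
cols = (4, 4, 2)   # hypothetical counts behind p_0 = (0.4, 0.4, 0.2)
N = sum(rows)
values = [gamma_numerator(T) for T in tables(rows, cols)]
gamma_L, gamma_U = min(values) / N, max(values) / N
print(gamma_L, gamma_U)
```

For these marginals the enumeration gives $\gamma_{L}=0.1$ and $\gamma_{U}=0.6$. The cost grows quickly with the counts and with $J$, which is one practical reason to prefer explicit formulas.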

The rest of the paper is organized as follows. Section 2 derives the sharp bounds on the relative treatment effect. Section 3 discusses the statistical inference based on the derived bounds under different scenarios such as completely randomized experiments and observational studies. Section 4 presents two examples to illustrate our proposed method. We relegate all technical details to the supplementary material.

2 Main results: Sharp bounds on $\gamma$

2.1 Notation

We introduce a few quantities that are needed to express the sharp bounds on $\gamma$ . For each fixed $j=1,\ldots,J-1$ and $m=1,\ldots,J-j,$ let

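$$\delta_{jm}=\sum_{k=j}^{J-1}p_{k+}+\sum_{k=j+m}^{J-1}p_{k+}+\sum_{l=0}^{j-2}p_{+l}-\sum_{l=j+m-1}^{J-1}p_{+l}.\tag{2}$$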

Define the summation to be zero when the range is empty, e.g., $\sum_{l=0}^{j-2}p_{+l}=0$ if $j=1.$ Importantly, the $\delta_{jm}$ ’s depend only on the marginal probabilities $\bm{p}_{1}$ and $\bm{p}_{0}.$ Before moving forward, we provide insights on the important roles the $\delta_{jm}$ ’s play in deriving the sharp bounds on the relative treatment effect $\gamma.$ For example, by taking the difference between

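$$\begin{aligned}
\mathrm{pr}\{Y_{i}(1)>Y_{i}(0)\}&\leq\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)=0\}+\mathrm{pr}\{Y_{i}(1)\geq 2,Y_{i}(0)\geq 1\}\\
&\leq\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)=0\}+\mathrm{pr}\{Y_{i}(1)\geq 2\}\\
&=\mathrm{pr}\{Y_{i}(1)\geq 1\}-\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)\geq 1\}+\mathrm{pr}\{Y_{i}(1)\geq 2\}
\end{aligned}$$

and

$$\begin{aligned}
\mathrm{pr}\{Y_{i}(1)<Y_{i}(0)\}&\geq\mathrm{pr}\{Y_{i}(0)\geq 1,Y_{i}(1)=0\}\\
&=\mathrm{pr}\{Y_{i}(0)\geq 1\}-\mathrm{pr}\{Y_{i}(1)\geq 1,Y_{i}(0)\geq 1\},
\end{aligned}$$

we obtain

$$\gamma\leq\sum_{k=1}^{J-1}p_{k+}+\sum_{k=2}^{J-1}p_{k+}-\sum_{l=1}^{J-1}p_{+l}=\delta_{11}.$$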

In other words, $\delta_{11}$ is a loose upper bound on $\gamma.$ Similarly, we can prove that other $\delta_{jm}$ ’s are also loose upper bounds on $\gamma.$ Interestingly, in the next subsection we will show that the $\delta_{jm}$ ’s together can sharply bound $\gamma$ .

2.2 Main theorem, corollaries and remarks

We now present the main result of this paper.

Theorem 1.

When $J\geq 3$ , the sharp upper bound on the relative treatment effect $\gamma$ is

$$\gamma_{U}=\min_{1\leq j\leq J-1}\ \min_{1\leq m\leq J-j}\delta_{jm}.$$

In the supplementary material we provide a proof of Theorem 1, which consists of two parts. First, as previously mentioned, we show that $\gamma_{U}\leq\delta_{jm}$ for $j=1,\ldots,J-1$ and $m=1,\ldots,J-j.$ Second, we prove the sharpness of $\gamma_{U}$ by directly constructing a probability matrix $\bm{P}$ attaining the bound given the marginal distributions. Although not affecting the proof, it is worth noting that the probability matrix attaining $\gamma_{U}$ might not be unique in general.

By switching the labels of the treatment and control potential outcomes, it is straightforward to obtain the sharp lower bound on the relative treatment effect $\gamma.$

Corollary 1.

When $J\geq 3$ , the sharp lower bound on the relative treatment effect $\gamma$ is

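$$\gamma_{L}=\max_{1\leq j\leq J-1}\ \max_{1\leq m\leq J-j}\xi_{jm},$$

where for all $j=1,\ldots,J-1$ and $m=1,\ldots,J-j,$

$$\xi_{jm}=\sum_{k=j+m-1}^{J-1}p_{k+}-\sum_{l=j}^{J-1}p_{+l}-\sum_{l=j+m}^{J-1}p_{+l}-\sum_{k=0}^{j-2}p_{k+},\tag{5}$$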

with summations being zero if the range is empty.

Remark 1.

For $J=3$, we can verify that $\delta_{11}=p_{1+}+2p_{2+}-p_{+1}-p_{+2},$ $\delta_{12}=p_{1+}+p_{2+}-p_{+2},$ and $\delta_{21}=p_{2+}-p_{+2}+p_{+0},$ and that $\xi_{11}=p_{1+}+p_{2+}-p_{+1}-2p_{+2},$ $\xi_{12}=p_{2+}-p_{+1}-p_{+2},$ and $\xi_{21}=p_{2+}-p_{+2}-p_{0+}.$ Consequently, the sharp upper bound in Theorem 1 reduces to $\gamma_{U}=\min\left(p_{1+}+2p_{2+}-p_{+1}-p_{+2},p_{1+}+p_{2+}-p_{+2},p_{2+}-p_{+2}+p_{+0}\right),$ and the sharp lower bound in Corollary 1 reduces to $\gamma_{L}=\max\left(p_{1+}+p_{2+}-p_{+1}-2p_{+2},p_{2+}-p_{+1}-p_{+2},p_{2+}-p_{+2}-p_{0+}\right).$
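The closed-form bounds are straightforward to evaluate. The sketch below codes $\delta_{jm}$ and $\xi_{jm}$ from their definitions (sums over empty ranges are zero) and checks the $J=3$ expressions obtained by substituting into those definitions. The marginals are hypothetical and the function names are ours.

```python
from fractions import Fraction as F

def delta(p1, p0, j, m):
    """delta_{jm} computed from the marginals p1 = (p_{k+}) and p0 = (p_{+l})."""
    return sum(p1[j:]) + sum(p1[j + m:]) + sum(p0[:j - 1]) - sum(p0[j + m - 1:])

def xi(p1, p0, j, m):
    """xi_{jm}: the mirror image of delta_{jm} with treatment and control switched."""
    return sum(p1[j + m - 1:]) - sum(p0[j:]) - sum(p0[j + m:]) - sum(p1[:j - 1])

def sharp_bounds(p1, p0):
    """Return (gamma_L, gamma_U) as the max of the xi's and the min of the delta's."""
    J = len(p1)
    pairs = [(j, m) for j in range(1, J) for m in range(1, J - j + 1)]
    return (max(xi(p1, p0, j, m) for j, m in pairs),
            min(delta(p1, p0, j, m) for j, m in pairs))

# Hypothetical marginals for J = 3.
p1 = [F(2, 10), F(3, 10), F(5, 10)]
p0 = [F(4, 10), F(4, 10), F(2, 10)]

# J = 3 expressions obtained by substituting into the definitions.
assert delta(p1, p0, 1, 1) == p1[1] + 2 * p1[2] - p0[1] - p0[2]
assert delta(p1, p0, 1, 2) == p1[1] + p1[2] - p0[2]
assert delta(p1, p0, 2, 1) == p1[2] - p0[2] + p0[0]
assert xi(p1, p0, 1, 1) == p1[1] + p1[2] - p0[1] - 2 * p0[2]
assert xi(p1, p0, 1, 2) == p1[2] - p0[1] - p0[2]
assert xi(p1, p0, 2, 1) == p1[2] - p0[2] - p1[0]

gamma_L, gamma_U = sharp_bounds(p1, p0)
print(gamma_L, gamma_U)   # 1/10 3/5
```

The list slices mirror the summation limits directly, so an empty slice plays the role of an empty sum.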

Intuitively, $\gamma_{U}$ and $\gamma_{L}$ correspond to “extremely” positive and negative associations between potential outcomes $Y_{i}(1)$ and $Y_{i}(0).$ In practice, because they are characteristics of the same unit, it is plausible to rule out the scenarios with negatively associated potential outcomes (Ding and Dasgupta, 2016; Lu et al., 2018). Therefore, we can use the previous result with independent potential outcomes as a lower bound (Cheng, 2009; Agresti, 2010; Agresti and Kateri, 2017).

Corollary 2.

With independent potential outcomes, i.e., $p_{kl}=p_{k+}p_{+l}$ , the relative treatment effect can be identified as $\gamma_{I}=\mathop{\sum\sum}_{k>l}p_{k+}p_{+l}-\mathop{\sum\sum}_{k<l}p_{k+}p_{+l}.$

We suggest using $[\gamma_{I},\gamma_{U}]$ as the bounds on $\gamma$ as in the examples in Section 4.

3 Statistical modeling and inference

3.1 Point estimation

To estimate the sharp bounds on the relative treatment effect $\gamma,$ we first estimate the marginal probabilities of the potential outcomes. Let $Z_{i}$ be the binary treatment indicator, with $Z_{i}=1$ if unit $i$ receives treatment and $Z_{i}=0$ if unit $i$ receives control. The observed outcome is therefore $Y_{i}=Z_{i}Y_{i}(1)+(1-Z_{i})Y_{i}(0)$. In some studies, we also have pretreatment covariates $\bm{X}_{i}$. We assume that the observations $\{Z_{i},\bm{X}_{i},Y_{i}(1),Y_{i}(0)\}_{i=1}^{N}$ are independent and identically distributed draws from a super population. Following Lu et al. (2018), we consider the following two scenarios:

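1. Completely randomized experiment with $Z_{i}\perp\!\!\!\perp\{Y_{i}(1),Y_{i}(0)\}$. We can therefore estimate the marginal probabilities by their sample analogues

$$\widehat{p}_{k+}=\frac{\sum_{i=1}^{N}Z_{i}\bm{1}_{\{Y_{i}=k\}}}{\sum_{i=1}^{N}Z_{i}},\quad\widehat{p}_{+l}=\frac{\sum_{i=1}^{N}(1-Z_{i})\bm{1}_{\{Y_{i}=l\}}}{\sum_{i=1}^{N}(1-Z_{i})}.$$

2. Unconfounded observational study with $Z_{i}\perp\!\!\!\perp\{Y_{i}(1),Y_{i}(0)\}\mid\bm{X}_{i}$. For illustration, we focus on the propensity score weighting and outcome modeling approaches. First, we can estimate the marginal probabilities by inverse propensity score weighting:

$$\widehat{p}_{k+}=\sum_{i=1}^{N}\frac{Z_{i}\bm{1}_{\{Y_{i}=k\}}}{\widehat{e}(\bm{X}_{i})}\Big/\sum_{i=1}^{N}\frac{Z_{i}}{\widehat{e}(\bm{X}_{i})},\quad\widehat{p}_{+l}=\sum_{i=1}^{N}\frac{(1-Z_{i})\bm{1}_{\{Y_{i}=l\}}}{1-\widehat{e}(\bm{X}_{i})}\Big/\sum_{i=1}^{N}\frac{1-Z_{i}}{1-\widehat{e}(\bm{X}_{i})},$$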

where $\widehat{e}(\bm{X}_{i})$ is the fitted value of the propensity score $e(\bm{X}_{i})=\mathrm{pr}(Z_{i}=1\mid\bm{X}_{i})$, for example, via a logistic regression of the treatment indicator on the covariates. Second, we can fit two outcome models $\mathrm{pr}(Y_{i}\mid Z_{i}=1,\bm{X}_{i})$ and $\mathrm{pr}(Y_{i}\mid Z_{i}=0,\bm{X}_{i})$ using the data under treatment and control, respectively. A canonical choice for ordinal outcomes is the proportional odds model (cf. Agresti, 2010). We then obtain the fitted values $\widehat{p}_{k+}(\bm{X}_{i})=\widehat{\mathrm{pr}}(Y_{i}=k\mid Z_{i}=1,\bm{X}_{i})$ and $\widehat{p}_{+l}(\bm{X}_{i})=\widehat{\mathrm{pr}}(Y_{i}=l\mid Z_{i}=0,\bm{X}_{i})$ for all units. The final outcome-regression estimators for the marginal probabilities are $\widehat{p}_{k+}=\sum_{i=1}^{N}\widehat{p}_{k+}(\bm{X}_{i})/N$ and $\widehat{p}_{+l}=\sum_{i=1}^{N}\widehat{p}_{+l}(\bm{X}_{i})/N.$ We can estimate the bounds $[\gamma_{I},\gamma_{U}]$ using a plug-in approach after obtaining the $\widehat{p}_{k+}$’s and $\widehat{p}_{+l}$’s.
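The two weighting estimators of the marginal probabilities can be sketched in a few lines. The data below are a hypothetical toy sample, and the fitted propensity scores `e_hat` are simply supplied as a list (fitting a logistic regression is outside this sketch). The function names are ours.

```python
def sample_marginals(Z, Y, J):
    """Sample-analogue estimates of (p_{k+}) and (p_{+l}) under complete randomization."""
    n1 = sum(Z)
    n0 = len(Z) - n1
    p1 = [sum(z * (y == k) for z, y in zip(Z, Y)) / n1 for k in range(J)]
    p0 = [sum((1 - z) * (y == l) for z, y in zip(Z, Y)) / n0 for l in range(J)]
    return p1, p0

def ipw_marginals(Z, Y, e_hat, J):
    """Normalized inverse-propensity-weighted estimates of the same marginals."""
    w1 = [z / e for z, e in zip(Z, e_hat)]              # weights for treated units
    w0 = [(1 - z) / (1 - e) for z, e in zip(Z, e_hat)]  # weights for control units
    p1 = [sum(w * (y == k) for w, y in zip(w1, Y)) / sum(w1) for k in range(J)]
    p0 = [sum(w * (y == l) for w, y in zip(w0, Y)) / sum(w0) for l in range(J)]
    return p1, p0

# Hypothetical toy data: 4 treated and 4 control units, J = 3 categories.
Z = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [2, 2, 1, 0, 1, 0, 0, 2]

p1_hat, p0_hat = sample_marginals(Z, Y, 3)
p1_ipw, p0_ipw = ipw_marginals(Z, Y, [0.5] * len(Z), 3)
print(p1_hat, p0_hat)
```

With a constant propensity score the weights cancel, so the weighted estimates coincide with the sample analogues; this gives a quick sanity check for an implementation.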

3.2 Sharpening bounds using covariates

The covariate adjustment strategy of Agresti and Kateri (2017) differs slightly from the one discussed in Section 3.1. Agresti and Kateri (2017) first estimated the conditional relative treatment effect given covariates, and then averaged over the empirical distribution of covariates. This is similar to the strategy of using covariates to sharpen the bounds (Grilli and Mealli, 2008; Lee, 2009; Long and Hudgens, 2013; Lu et al., 2018). In particular, we can first estimate the conditional bounds given covariates $\widehat{\gamma}_{I}(\bm{X}_{i})=\mathop{\sum\sum}_{k>l}\widehat{p}_{k+}(\bm{X}_{i})\widehat{p}_{+l}(\bm{X}_{i})-\mathop{\sum\sum}_{k<l}\widehat{p}_{k+}(\bm{X}_{i})\widehat{p}_{+l}(\bm{X}_{i})$ and $\widehat{\gamma}_{U}(\bm{X}_{i})=\min_{1\leq j\leq J-1}\min_{1\leq m\leq J-j}\widehat{\delta}_{jm}(\bm{X}_{i}),$ where $\widehat{\delta}_{jm}(\bm{X}_{i})$ is $\delta_{jm}$ evaluated at the fitted probabilities $\widehat{p}_{k+}(\bm{X}_{i})$ and $\widehat{p}_{+l}(\bm{X}_{i})$, and then estimate the bounds by $\widehat{\gamma}_{I}=\sum_{i=1}^{N}\widehat{\gamma}_{I}(\bm{X}_{i})/N$ and $\widehat{\gamma}_{U}=\sum_{i=1}^{N}\widehat{\gamma}_{U}(\bm{X}_{i})/N.$
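A minimal sketch of this averaging step follows. It assumes the unit-level fitted probabilities are already available from some outcome model; the three `fitted` pairs below are hypothetical numbers, not output from the paper's analysis, and the function names are ours.

```python
def delta(p1, p0, j, m):
    """delta_{jm} evaluated at a pair of marginal probability vectors."""
    return sum(p1[j:]) + sum(p1[j + m:]) + sum(p0[:j - 1]) - sum(p0[j + m - 1:])

def gamma_upper(p1, p0):
    """Sharp upper bound: the smallest delta_{jm}."""
    J = len(p1)
    return min(delta(p1, p0, j, m) for j in range(1, J) for m in range(1, J - j + 1))

def gamma_indep(p1, p0):
    """Relative treatment effect under independent potential outcomes."""
    J = len(p1)
    return sum(p1[k] * p0[l] * ((k > l) - (k < l)) for k in range(J) for l in range(J))

# Hypothetical fitted probabilities (p_hat_{k+}(X_i), p_hat_{+l}(X_i)) for three units.
fitted = [([0.2, 0.3, 0.5], [0.4, 0.4, 0.2]),
          ([0.1, 0.4, 0.5], [0.3, 0.3, 0.4]),
          ([0.5, 0.3, 0.2], [0.6, 0.2, 0.2])]
n = len(fitted)

# Average the conditional bounds over units.
gamma_I_hat = sum(gamma_indep(a, b) for a, b in fitted) / n
gamma_U_hat = sum(gamma_upper(a, b) for a, b in fitted) / n

# For comparison: the upper bound evaluated at the averaged marginals.
p1_bar = [sum(a[k] for a, _ in fitted) / n for k in range(3)]
p0_bar = [sum(b[l] for _, b in fitted) / n for l in range(3)]
gamma_U_marginal = gamma_upper(p1_bar, p0_bar)
print(gamma_I_hat, gamma_U_hat, gamma_U_marginal)
```

Each $\delta_{jm}$ is linear in the marginals, so the average of the conditional minima cannot exceed the minimum evaluated at the averaged marginals. This is the sense in which conditioning on covariates can only tighten the upper bound.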

3.3 Confidence intervals

Following the existing literature on statistical inference for partially identified parameters (Cheng and Small, 2006; Yang and Small, 2016), we construct a $(1-\alpha)$-level confidence interval for the sharp bounds $(\gamma_{I},\gamma_{U}),$ which automatically covers $\gamma$ at least $100(1-\alpha)\%$ of the time. However, as pointed out by Hirano and Porter (2012), delicate issues arise in this case, especially the trade-off between simplicity of implementation and uniformity of the coverage properties of confidence intervals. For the empirical examples in Section 4, we employ the non-parametric bootstrap interval of Horowitz and Manski (2000), $[\widehat{\gamma}_{I}-z^{*}_{\alpha},\widehat{\gamma}_{U}+z^{*}_{\alpha}],$ where we obtain the threshold $z^{*}_{\alpha}$ by solving the equation $\mathrm{pr}_{B}\{\widehat{\gamma}_{I}^{*}-z^{*}_{\alpha}\leq\widehat{\gamma}_{I},\widehat{\gamma}_{U}\leq\widehat{\gamma}_{U}^{*}+z^{*}_{\alpha}\}=1-\alpha,$ where $\widehat{\gamma}_{I}^{*}$ and $\widehat{\gamma}_{U}^{*}$ are drawn from the bootstrap distribution $\mathrm{pr}_{B}.$ While more sophisticated methods (e.g., Romano and Shaikh, 2010; Chernozhukov et al., 2013; Jiang and Ding, 2018) may be more rigorous theoretically, previous discussions (e.g., Lu et al., 2018) showed that the interval of Horowitz and Manski (2000) achieved similar finite-sample performance, at least in the context of ordinal outcomes.
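The defining equation for $z^{*}_{\alpha}$ is equivalent to requiring that $\max(\widehat{\gamma}_{I}^{*}-\widehat{\gamma}_{I},\widehat{\gamma}_{U}-\widehat{\gamma}_{U}^{*})\leq z^{*}_{\alpha}$ with bootstrap probability $1-\alpha$, so $z^{*}_{\alpha}$ can be approximated by an empirical quantile of that maximum. The sketch below does this for a randomized experiment. Resampling within each arm, the number of replicates `B`, the seed, and the toy data are all assumptions of this illustration, not details given in the paper.

```python
import math
import random

def bounds(y1, y0, J):
    """Plug-in (gamma_I, gamma_U) from treated outcomes y1 and control outcomes y0."""
    p1 = [y1.count(k) / len(y1) for k in range(J)]
    p0 = [y0.count(l) / len(y0) for l in range(J)]
    g_I = sum(p1[k] * p0[l] * ((k > l) - (k < l)) for k in range(J) for l in range(J))
    g_U = min(sum(p1[j:]) + sum(p1[j + m:]) + sum(p0[:j - 1]) - sum(p0[j + m - 1:])
              for j in range(1, J) for m in range(1, J - j + 1))
    return g_I, g_U

def bootstrap_interval(y1, y0, J, alpha=0.05, B=500, seed=0):
    """Interval [g_I - z, g_U + z], with z a bootstrap quantile of the joint excess."""
    rng = random.Random(seed)
    g_I, g_U = bounds(y1, y0, J)
    excess = []
    for _ in range(B):
        b1 = rng.choices(y1, k=len(y1))   # resample within the treated arm (assumption)
        b0 = rng.choices(y0, k=len(y0))   # resample within the control arm (assumption)
        s_I, s_U = bounds(b1, b0, J)
        excess.append(max(s_I - g_I, g_U - s_U))
    excess.sort()
    z = excess[min(B - 1, math.ceil((1 - alpha) * B) - 1)]   # empirical (1 - alpha) quantile
    return g_I - z, g_U + z

# Hypothetical toy data with J = 3 categories.
y1 = [0] * 10 + [1] * 15 + [2] * 25
y0 = [0] * 20 + [1] * 20 + [2] * 10
g_I, g_U = bounds(y1, y0, 3)
lo, hi = bootstrap_interval(y1, y0, 3)
print((g_I, g_U), (lo, hi))
```

A single threshold is used for both ends, which keeps the interval symmetric around the estimated bounds as in the displayed formula.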

4 Applications

4.1 A randomized experiment

We illustrate our theory and method using the Sexual Assault Resistance Education Trial (Senn et al., 2015), previously analyzed by Lu et al. (2018). In this randomized experiment, the treatment is the enhanced Assess, Acknowledge and Act program, which aims at preventing sexual assaults. The outcome of interest has six categories from “complete rape” to “no reporting of any non-consensual sexual contact,” labelled as 0–5. The numbers of units are $(23,15,48,67,121,177)$ in the treatment arm (451 in total) and $(42,40,62,103,184,11)$ in the control arm (442 in total), corresponding to the outcome categories $(0,1,2,3,4,5)$. Based on these data, we estimate the bounds on $\gamma$ as $[\widehat{\gamma}_{I},\widehat{\gamma}_{U}]=[0.387,0.900],$ and the corresponding 95% bootstrap confidence interval is $[0.315,0.972].$ The results imply that the program is beneficial, which corroborates the recommendations by Senn et al. (2015) and Lu et al. (2018).
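The point estimates can be reproduced from the reported counts with a short plug-in calculation. The sketch below takes the six category counts per arm (which sum to 451 and 442) and evaluates $\gamma_{I}$ from Corollary 2 and $\gamma_{U}$ as the smallest $\delta_{jm}$; the variable names are ours.

```python
treated = [23, 15, 48, 67, 121, 177]   # categories 0..5, 451 units in total
control = [42, 40, 62, 103, 184, 11]   # categories 0..5, 442 units in total
J = len(treated)
p1 = [n / sum(treated) for n in treated]   # estimated p_{k+}
p0 = [n / sum(control) for n in control]   # estimated p_{+l}

# Lower end: relative treatment effect under independent potential outcomes.
gamma_I = sum(p1[k] * p0[l] * ((k > l) - (k < l)) for k in range(J) for l in range(J))

def delta(j, m):
    """delta_{jm} at the estimated marginals; empty slices act as empty sums."""
    return sum(p1[j:]) + sum(p1[j + m:]) + sum(p0[:j - 1]) - sum(p0[j + m - 1:])

deltas = {(j, m): delta(j, m) for j in range(1, J) for m in range(1, J - j + 1)}
gamma_U = min(deltas.values())
print(round(gamma_I, 3), round(gamma_U, 3))   # 0.387 0.9
```

The minimum is attained at $(j,m)=(1,4)$, and the two values agree with the reported $[0.387,0.900]$.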

4.2 An observational study

We illustrate our theory and method using an observational study from the Karolinska Institute in Stockholm, Sweden, which was previously analyzed by Rubin (2008). The data have 158 cardia cancer patients diagnosed between 1988 and 1995. The treatment is whether the patient is diagnosed in a high volume hospital, defined as treating more than 10 patients with cardia cancer during that period. The outcome is the survival time of the patient after the diagnosis, with three categories ordered as “one year,” “between two and four years” and “longer than five years”. For patients diagnosed in a high volume hospital, 51 survived for one year, 18 survived between two and four years, and 10 survived longer than five years. For patients diagnosed in a low volume hospital, the numbers are 50, 21 and 8. Pre-treatment covariates include the age at diagnosis, an indicator of male, and an indicator of whether the patient is from a rural area. The last covariate is an important confounder in this example, because patients from rural areas would be more likely to attend low volume hospitals ($p$-value 0.0001).

We assume that the treatment is unconfounded given the observed pre-treatment covariates. We first fit two separate proportional odds models for the outcomes under treatment and control, respectively. We then obtain the fitted probabilities for each individual under both treatment and control. We finally use the strategy in Section 3.2 to obtain bounds on the relative treatment effect as $[\widehat{\gamma}_{I},\widehat{\gamma}_{U}]=[0.055,0.183]$ with the 95% bootstrap confidence interval $[-0.137,0.375].$ The lower confidence limit, corresponding to independent potential outcomes as in Agresti and Kateri (2017), is smaller than 0 although the point estimate of the lower bound is positive.

Acknowledgements

The authors thank the co-Editor and a reviewer for their constructive comments. This work is motivated by an open question in Lu’s doctoral thesis, and he gratefully acknowledges his advisors, Professors Tirthankar Dasgupta, Joseph Blitzstein and Luke Miratrix. Zhang thanks Professor Ke Deng for valuable suggestions. Ding is partially supported by Institute of Education Sciences Grant R305D150040 and National Science Foundation Grant DMS-1713152.

Supporting Information

Web Appendices referenced in Section 1, R code, and data are available with this paper at the Biometrics website on Wiley Online Library.

Supporting Information for “Sharp bounds on the relative treatment effect for ordinal outcomes”

by Jiannan Lu, Yunshu Zhang and Peng Ding

S1 Overview and notation

The supplementary materials are organized in the following way. Section S2 gives several lemmas that are useful for proving the main results. Section S3 gives a proof of Theorem 1, and Section S4 gives a proof of Corollary 1.

To simplify the proofs, we need the distributional causal effects

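$$\Delta_{j}=\mathrm{pr}\{Y_{i}(1)\geq j\}-\mathrm{pr}\{Y_{i}(0)\geq j\}=\sum_{k\geq j}p_{k+}-\sum_{l\geq j}p_{+l}\quad(j=1,\ldots,J-1),\tag{S1}$$

which compare the marginal distribution functions of the potential outcomes. By (2), (5) and (S1), for all $j=1,\ldots,J-1$ and $m=1,\ldots,J-j,$

$$\delta_{jm}=\Delta_{j}+\sum_{l=0}^{j-2}p_{+l}+\sum_{k=j+m}^{J-1}p_{k+}+\sum_{l=j}^{j+m-2}p_{+l}\tag{S2}$$

and

$$\xi_{jm}=\Delta_{j}-\sum_{k=0}^{j-2}p_{k+}-\sum_{l=j+m}^{J-1}p_{+l}-\sum_{k=j}^{j+m-2}p_{k+}.\tag{S3}$$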

Again, we follow the convention in the main text to define the summation as zero when the range is empty, e.g., $\sum_{l=j}^{j+m-2}p_{+l}=0$ if $m=1.$

S2 Lemmas and their proofs

In this section, we introduce three lemmas, which play instrumental roles in deriving the sharp bounds on the relative treatment effect $\gamma.$

S2.1 Lemma 1 from Lu et al. (2018)

Lemma 1.

Assume that $\left(x_{0},\ldots,x_{n-1}\right)$ and $\left(y_{0},\ldots,y_{n-1}\right)$ are non-negative constants.

(a)

If $\sum_{r=s}^{n-1}x_{r}\geq\sum_{r=s}^{n-1}y_{r}$ for all $s=0,\ldots,n-1,$ there exists an $n\times n$ lower triangular matrix $\bm{A}_{n}=(a_{kl})_{0\leq k,l\leq n-1}$ with non-negative elements such that

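$$\sum_{l^{\prime}=0}^{n-1}a_{kl^{\prime}}\leq x_{k},\quad\sum_{k^{\prime}=0}^{n-1}a_{k^{\prime}l}=y_{l}\quad(k,l=0,\ldots,n-1).$$

(b)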

If $\sum_{r=0}^{s}x_{r}\leq\sum_{r=0}^{s}y_{r}$ for all $s=0,\ldots,n-1,$ there exists an $n\times n$ lower triangular matrix $\bm{B}_{n}=(b_{kl})_{0\leq k,l\leq n-1}$ with non-negative elements such that

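$$\sum_{l^{\prime}=0}^{n-1}b_{kl^{\prime}}=x_{k},\quad\sum_{k^{\prime}=0}^{n-1}b_{k^{\prime}l}\leq y_{l}\quad(k,l=0,\ldots,n-1).$$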
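To make part (a) concrete, the sketch below builds one such matrix with a greedy fill. This is an illustrative construction of our own, not the argument in Lu et al. (2018): it serves the columns from last to first and places each column's mass in rows $k\geq l$ with spare capacity. The tail-sum condition guarantees that enough capacity remains at every step. The inputs are hypothetical integers.

```python
def lower_triangular_fill(x, y):
    """Greedy sketch for part (a): returns A with A[k][l] = 0 for k < l,
    row sums at most x[k], and column sums equal to y[l]."""
    n = len(x)
    cap = list(x)                            # remaining capacity of each row
    A = [[0] * n for _ in range(n)]
    for l in range(n - 1, -1, -1):           # serve the last column first
        need = y[l]
        for k in range(n - 1, l - 1, -1):    # only rows k >= l are allowed
            put = min(cap[k], need)
            A[k][l] = put
            cap[k] -= put
            need -= put
        if need != 0:
            raise ValueError("tail-sum condition of part (a) is violated")
    return A

# Hypothetical inputs: tail sums of x (4, 6, 7) dominate those of y (3, 4, 6).
x = [1, 2, 4]
y = [2, 1, 3]
A = lower_triangular_fill(x, y)
print(A)   # [[0, 0, 0], [2, 0, 0], [0, 1, 3]]
```

Part (b) can be handled symmetrically by serving rows from first to last and placing each row's mass in columns $l\leq k$.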

S2.2 Lemma 2 and its proof

The second lemma establishes various relationships among the $\delta_{jm}$ ’s defined in (S2).

Lemma 2.

For fixed $j=1,\ldots,J-2,$

(a)

$\delta_{jm}+p_{+,j+m-1}-p_{j+m,+}=\delta_{j,m+1}$ for $m=1,\ldots,J-1-j;$

(b)

$\delta_{j+1,m}+p_{j+}-p_{+,j-1}=\delta_{j,m+1}$ for $m=1,\ldots,J-1-j;$

(c)

$\delta_{j+1,J-1-j}+p_{+,J-1}-p_{0+}=\delta_{1j}.$

Proof of Lemma 2(a).

Notice that

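$$\sum_{l=j}^{j+m-1}p_{+l}=\sum_{l=j}^{j+m-2}p_{+l}+p_{+,j+m-1},\quad\sum_{k=j+m+1}^{J-1}p_{k+}=\sum_{k=j+m}^{J-1}p_{k+}-p_{j+m,+}.$$

Therefore,

$$\begin{aligned}
\delta_{j,m+1}&=\Delta_{j}+\sum_{l=0}^{j-2}p_{+l}+\sum_{k=j+m}^{J-1}p_{k+}-p_{j+m,+}+\sum_{l=j}^{j+m-2}p_{+l}+p_{+,j+m-1}\\
&=\delta_{jm}+p_{+,j+m-1}-p_{j+m,+}.
\end{aligned}$$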

The proof is complete. ∎

Proof of Lemma 2(b).

Notice that

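$$\sum_{l=0}^{j-2}p_{+l}=\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1},\quad\sum_{l=j}^{j+m-1}p_{+l}=\sum_{l=j+1}^{j+m-1}p_{+l}+p_{+j},$$

and that ${\Delta_{j}}={\Delta_{j+1}}+{p_{j+}}-{p_{+j}}$ by (S1). Therefore,

$$\begin{aligned}
\delta_{j,m+1}&=\Delta_{j}+\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1}+\sum_{k=j+m+1}^{J-1}p_{k+}+\sum_{l=j+1}^{j+m-1}p_{+l}+p_{+j}\\
&=\Delta_{j+1}+p_{j+}+\sum_{l=0}^{j-1}p_{+l}-p_{+,j-1}+\sum_{k=j+m+1}^{J-1}p_{k+}+\sum_{l=j+1}^{j+m-1}p_{+l}\\
&=\delta_{j+1,m}+p_{j+}-p_{+,j-1}.
\end{aligned}$$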

The proof is complete. ∎

Proof of Lemma 2(c).

By repeatedly utilizing Lemma 2(b), we have

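$$\delta_{j+1,J-1-j}=\delta_{j,J-j}+(p_{+,j-1}-p_{j+})=\cdots=\delta_{1,J-1}+\sum_{l=0}^{j-1}p_{+l}-\sum_{k=1}^{j}p_{k+}.$$

Moreover, by repeatedly utilizing Lemma 2(a), we have

$$\delta_{1,J-1}=\delta_{1,J-2}+(p_{+,J-2}-p_{J-1,+})=\cdots=\delta_{1j}+\sum_{l=j}^{J-2}p_{+l}-\sum_{k=j+1}^{J-1}p_{k+}.$$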

Combining the above two equations, we have

[TABLE]

which completes the proof. ∎
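The three identities of Lemma 2 can also be confirmed numerically. The sketch below evaluates $\delta_{jm}$ from its definition with exact fractions, using hypothetical marginals for $J=5$, and checks parts (a) to (c) for every admissible $(j,m)$. Part (c) relies on both marginal vectors summing to one.

```python
from fractions import Fraction as F

def delta(p1, p0, j, m):
    """delta_{jm} as defined in the main text; empty slices act as empty sums."""
    return sum(p1[j:]) + sum(p1[j + m:]) + sum(p0[:j - 1]) - sum(p0[j + m - 1:])

J = 5
p1 = [F(n, 20) for n in (2, 3, 4, 5, 6)]   # hypothetical treated marginals
p0 = [F(n, 20) for n in (6, 5, 4, 3, 2)]   # hypothetical control marginals

checked = 0
for j in range(1, J - 1):                  # j = 1, ..., J-2
    for m in range(1, J - j):              # m = 1, ..., J-1-j
        target = delta(p1, p0, j, m + 1)
        assert delta(p1, p0, j, m) + p0[j + m - 1] - p1[j + m] == target   # part (a)
        assert delta(p1, p0, j + 1, m) + p1[j] - p0[j - 1] == target       # part (b)
        checked += 2
    lhs = delta(p1, p0, j + 1, J - 1 - j) + p0[J - 1] - p1[0]
    assert lhs == delta(p1, p0, 1, j)                                      # part (c)
    checked += 1
print(checked)   # 15
```

Exact rational arithmetic avoids any floating-point tolerance in the equality checks.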

S2.3 Lemma 3 and its proof

Lemma 3 bridges the first two lemmas by repeatedly utilizing Lemma 2 to find subsets of the marginal probabilities which meet the conditions of Lemma 1. When proving the main theorem, we utilize Lemma 3 to construct a probability matrix attaining the upper bound $\gamma_{U}.$

Lemma 3.

Let $\Omega=\left\{(1,1),\ldots,(1,J-1);(2,1),\ldots,(2,J-2);\ldots;(J-1,1)\right\}$ denote the lexicographically ordered set of the 2-tuples $(j,m)$ ’s, where for each $j=1,\ldots,J-1,$ the corresponding $m$ takes values between $1$ and $J-j.$ Let

[TABLE]

be the first 2-tuple attaining the minimum value of $\delta_{jm}$ , and

[TABLE]

The following results hold.

(a)

If $j_{1}>1,$ let $\Omega_{1}=\{1,\ldots,j_{1}-1\}$ and

[TABLE]

Then

[TABLE]

(b)

If $m_{1}>1,$ let $\Omega_{2}=\{j_{1}+1,\ldots,j_{1}+m_{1}-1\}$ and

[TABLE]

Then

[TABLE]

(c)

If $j_{1}+m_{1}<J,$ let $\Omega_{3}=\{j_{1}+m_{1}-1,\ldots,J-2\}$ and

[TABLE]

Then

[TABLE]

Proof of Lemma 3(a).

The starting point of the proof is that $\delta_{j_{1},m_{1}}$ is the smallest among all the $\delta_{jm}$ ’s. Then, the key idea is to use Lemma 2 to transform $\{\delta_{j_{1},m_{1}}\leq\delta_{j,m}:j=1,\ldots,J-1;m=1,\ldots,J-j\}$ into inequalities regarding certain subsets of the marginal probabilities. To be specific, if $j_{1}>1,$ we repeatedly utilize Lemma 2(b) and obtain

[TABLE]

By (S4), $\delta_{j_{1},m_{1}}\leq\delta_{n,j_{1}+m_{1}-n}$ for $n=1,\ldots,j_{1}-1,$ implying

[TABLE]

Moreover, by repeatedly utilizing Lemma 2(a), we have

[TABLE]

By combining (S12) and (S14), we have

[TABLE]

Similarly, because $\delta_{j_{1},m_{1}}\leq\delta_{n,j_{1}-n}$ for all $n=1,\ldots,j_{1}-1,$

[TABLE]

The proof is thus complete because (S7) holds by (S13) and (S15). ∎

Proof of Lemma 3(b).

If $m_{1}>1,$ we first repeatedly utilize Lemma 2(a) and obtain

[TABLE]

Because $\delta_{j_{1},m_{1}}\leq\delta_{j_{1},n}$ for $n=1,\ldots,m_{1}-1,$

[TABLE]

Moreover, by repeatedly utilizing Lemma 2(b), we have

[TABLE]

Because $\delta_{j_{1},m_{1}}\leq\delta_{n,j_{1}+m_{1}-n}$ for $n=j_{1}+1,\ldots,j_{1}+m_{1}-1$

[TABLE]

or equivalently, by the definition of $\lambda_{1}$ in (S5),

[TABLE]

or equivalently

[TABLE]

The proof is thus complete because (S9) holds by (S16) and (S17). ∎

Proof of Lemma 3(c).

If $j_{1}+m_{1}<J,$ we first repeatedly utilize Lemma 2(a) and obtain

[TABLE]

Because $\delta_{j_{1},m_{1}}\leq\delta_{j_{1},n}$ for $n=m_{1}+1,\ldots,J-j_{1},$

[TABLE]

Moreover, by (S12) for all $j_{1}=1,\ldots,J-1,$

[TABLE]

In addition, by Lemma 2(c)

[TABLE]

By combining (S19) and (S20), and then repeatedly utilizing Lemma 2(a), we have

[TABLE]

Because $\delta_{j_{1},m_{1}}\leq\delta_{{j_{1}}+{m_{1}},n}$ for $n=1,\ldots,J-j_{1}-m_{1},$

[TABLE]

By the definition of $\lambda_{1}$ in (S5), we can re-write the above inequalities as

[TABLE]

or equivalently

[TABLE]

The proof is thus complete because (S11) holds by (S18) and (S22). ∎

S3 Proof of Theorem 1

We prove Theorem 1 in two steps. First, we show $\gamma_{U}$ is indeed an upper bound. Second, we show the sharpness of $\gamma_{U},$ by constructing a probability matrix $\bm{P}$ attaining it. As mentioned previously, in general there can be multiple probability matrices attaining $\gamma_{U}.$

S3.1 Step 1: Proving the upper bound

For a fixed $j\in\{1,\ldots,J-1\},$

[TABLE]

By switching the labels of treatment and control, we obtain from the above identity that

[TABLE]

Therefore, by the definition of $\gamma$ in (1),

[TABLE]

Below we deal with the three terms in (S3.1), namely $T_{1}-T_{4}$ , $T_{2}$ and $T_{3}$ separately. First,

[TABLE]

Second, for fixed $m\in\{1,\ldots,J-j\},$

[TABLE]

Third,

[TABLE]

Therefore, by (S2) and (S3.1)–(S26) we have proved that $\gamma\leq\delta_{jm}.$

S3.2 Step 2: Proving the sharpness

This step consists of two parts. First, using the definition of $(j_{1},m_{1})$ in (S4) and Lemmas 1–3, we construct a $J\times J$ matrix $\bm{P}=(p_{kl})_{0\leq k,l\leq J-1}.$ Second, we prove that $\bm{P}$ is a well-defined probability matrix attaining the upper bound $\gamma_{U},$ i.e., that it has non-negative entries, that its row and column sums are $\bm{p}_{1}$ and $\bm{p}_{0}$ respectively, and that its corresponding relative treatment effect $\gamma$ is indeed $\delta_{j_{1},m_{1}}.$

S3.2.1 Construction of the probability matrix

For initialization, we let $p_{kl}=0$ for all $k,l=0,\ldots,J-1.$ Then, we use Lemma 3 to update certain entries of $\bm{P},$ based on the values of $j_{1}$ and $m_{1}.$

(I)

If $j_{1}>1,$ by (S6) and (S7), we apply Lemma 1(a) to

[TABLE]

and update the sub-matrix $(p_{kl})_{1\leq k\leq j_{1}-1,0\leq l\leq j_{1}-2}$ with non-negative entries such that it remains lower-triangular and satisfies

[TABLE]

(II)

If $m_{1}>1,$ by (S8) and (S9), we apply Lemma 1(a) to

[TABLE]

and update the sub-matrix $(p_{kl})_{j_{1}+1\leq k\leq j_{1}+m_{1}-1,j_{1}\leq l\leq j_{1}+m_{1}-2}$ with non-negative entries such that it remains lower-triangular and satisfies

[TABLE]

(III)

If $j_{1}+m_{1}<J,$ by (S10) and (S11), we apply Lemma 1(b) to

[TABLE]

and update the sub-matrix $(p_{kl})_{j_{1}+m_{1}\leq k\leq J-1,j_{1}+m_{1}-1\leq l\leq J-2}$ with non-negative entries such that it remains lower-triangular and satisfies

[TABLE]

We further update $\bm{P}$ in the following sequential fashion.

(IV)

Let

[TABLE]

(V)

For each $k=j_{1},\ldots,j_{1}+m_{1}-1,$ let

[TABLE]

(VI)

Let

[TABLE]

(VII)

For all $k=0,\ldots,j_{1}-1$ and $l=j_{1}+m_{1}-1,\ldots,J-1,$ let

[TABLE]

To summarize, our construction procedure is defined by steps (I)–(VII), or more specifically by equations (S27)–(S36). Figure 1 illustrates the construction of the probability matrix for $J=8,$ $j_{1}=3$ and $m_{1}=2.$
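As a reading aid for the configuration $J=8,$ $j_{1}=3,$ $m_{1}=2,$ the sketch below lists which cells $(k,l)$ each step touches, using only the index ranges spelled out in the prose of steps (I)–(III) and (V)–(VII). Step (IV) is left out because the sketch relies only on ranges stated in words. The helper names `block_layout` and `lower_triangle` are hypothetical, and "lower-triangular" is assumed to include the diagonal of the sub-matrix itself.

```python
def block_layout(J, j1, m1):
    """Cells (k, l) touched by steps (I)-(III) and (V)-(VII), read off the prose."""
    layout = {}
    if j1 > 1:  # step (I): rows 1..j1-1, columns 0..j1-2
        layout["I"] = [(k, l) for k in range(1, j1) for l in range(0, j1 - 1)]
    if m1 > 1:  # step (II): rows j1+1..j1+m1-1, columns j1..j1+m1-2
        layout["II"] = [(k, l) for k in range(j1 + 1, j1 + m1)
                        for l in range(j1, j1 + m1 - 1)]
    if j1 + m1 < J:  # step (III): rows j1+m1..J-1, columns j1+m1-1..J-2
        layout["III"] = [(k, l) for k in range(j1 + m1, J)
                         for l in range(j1 + m1 - 1, J - 1)]
    layout["V"] = [(k, j1 - 1) for k in range(j1, j1 + m1)]   # column j1-1
    layout["VI"] = [(j1 - 1, j1 - 1)]                         # one diagonal cell
    layout["VII"] = [(k, l) for k in range(0, j1)
                     for l in range(j1 + m1 - 1, J)]          # upper-right block
    return layout

def lower_triangle(cells):
    """Cells on or below the diagonal of the sub-matrix (local row >= local column)."""
    k0 = min(k for k, _ in cells)
    l0 = min(l for _, l in cells)
    return [(k, l) for k, l in cells if l - l0 <= k - k0]

layout = block_layout(J=8, j1=3, m1=2)
for step in ("I", "II", "III"):
    cells = lower_triangle(layout[step])
    # Every lower-triangular cell of these sub-matrices lies strictly below
    # the main diagonal of the full matrix, i.e. it has k > l.
    print(step, cells, all(k > l for k, l in cells))
```

Under these assumptions, the sub-matrices of steps (I)–(III) only place mass on cells with $k>l,$ while the block of step (VII) lies entirely in the region $k<l.$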

S3.2.2 Validation of the probability matrix

Non-negative entries

We verify that all entries of the probability matrix $\bm{P},$ defined by steps (I)–(VII), are non-negative.

1.

All entries defined in steps (I)–(IV) are non-negative by definition.

2.

For entries defined in step (V), i.e., $p_{k,j_{1}-1}$ for all $k=j_{1},\ldots,j_{1}+m_{1}-1,$ we discuss two cases. First, if $m_{1}=1,$ by (S5) and (S33) we have $p_{j_{1},j_{1}}=\max\left(0,p_{j_{1},+}-p_{+,j_{1}-1}\right)\leq p_{j_{1},+},$ which implies that $p_{j_{1},j_{1}-1}\geq 0.$ Second, if $m_{1}>1,$ by (S29) and the definitions of the $q_{k+}$'s in (S8), $p_{k,j_{1}-1}\geq 0$ for all $k=j_{1},\ldots,j_{1}+m_{1}-2.$ Therefore, we only need to prove that

[TABLE]

This is guaranteed by (S8) and (S29), because

[TABLE]

3.

For $p_{j_{1}-1,j_{1}-1}$ defined in step (VI), by (S5), (S30), (S34) and (S35),

[TABLE]

4.

To prove all entries defined by step (VII) are non-negative, we will prove that

[TABLE]

(a)

First, we prove (S38). By the fact that

[TABLE]

the definitions of $q_{1+},\ldots,q_{j_{1}-1,+}$ in (S6), and (S27), it is straightforward to verify that (S38) holds for all $k\in\{0,\ldots,j_{1}-1\}\backslash\{j_{1}-1\}.$ To further prove that

[TABLE]

we discuss two cases:

i.

If $j_{1}>1,$ by (S6), (S27), and (3),

[TABLE]

ii.

If $j_{1}=1,$ by (S5) and (3) we only need to prove

[TABLE]

If $m_{1}=J-1,$ the left side

[TABLE]

If $m_{1}<J-1,$ its equivalent form $\sum_{k=m_{1}+1}^{J-1}p_{k+}\leq\sum_{l=m_{1}}^{J-1}p_{+l}$ holds by (S18).

(b)

Second, we prove (S39). By the fact that

[TABLE]

the definitions of $q_{+,j_{1}+m_{1}-1},\ldots,q_{+,J-2}$ in (S10), and (S32), it is straightforward to verify that (S39) holds for $l\in\{j_{1}+m_{1}-1,\ldots,J-1\}\backslash\{j_{1}+m_{1}-1\}.$ To further prove that

[TABLE]

we discuss two cases:

i.

If $j_{1}+m_{1}<J,$ by (S10) and (S32),

[TABLE]

ii.

If $j_{1}+m_{1}=J,$ by (S5) and (S33), we only need to prove that

[TABLE]

If $j_{1}=1,$ note that the left side

[TABLE]

If $j_{1}>1,$ its equivalent form $\sum_{l=0}^{j_{1}-2}p_{+l}\leq\sum_{k=0}^{j_{1}-1}p_{k+}$ holds by (S13).

Correct row and column sums

To verify the column and row sums, note that by (S28), (S30), (S35) and (S36), the column sums of $\bm{P}$ are $p_{+0},\ldots,p_{+,J-1},$ respectively. Similarly, by (S31), (S34) and (S36), the row sums of $\bm{P}$ are $p_{0+},\ldots,p_{J-1,+},$ respectively.
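The two properties verified so far (non-negative entries, and row and column sums equal to the marginal distributions) can be checked mechanically for any candidate matrix. The sketch below is a generic checker under the convention used here, with rows indexed by $Y_{i}(1)$ and columns by $Y_{i}(0)$. The function name `is_valid_coupling` and the example matrix are assumptions for illustration, not the matrix constructed in steps (I)–(VII).

```python
from fractions import Fraction as F

def is_valid_coupling(P, p1, p0):
    """True if P is non-negative with row sums p1 and column sums p0."""
    J = len(P)
    nonneg = all(P[k][l] >= 0 for k in range(J) for l in range(J))
    rows_ok = all(sum(P[k]) == p1[k] for k in range(J))
    cols_ok = all(sum(P[k][l] for k in range(J)) == p0[l] for l in range(J))
    return nonneg and rows_ok and cols_ok

# Hypothetical margins for J = 3; exact fractions avoid rounding issues.
p1 = [F(1, 4), F(1, 4), F(1, 2)]  # distribution of Y(1): row sums
p0 = [F(1, 2), F(1, 4), F(1, 4)]  # distribution of Y(0): column sums

P = [[F(1, 4), F(0), F(0)],
     [F(1, 4), F(0), F(0)],
     [F(0), F(1, 4), F(1, 4)]]

print(is_valid_coupling(P, p1, p0))  # prints "True"
```

Swapping the roles of `p1` and `p0` in the call makes the check fail for this example, which is a quick way to catch a transposed matrix.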

The relative treatment effect $\gamma$ of the constructed $\bm{P}$ attains the upper bound $\gamma_{U}$

To prove that the relative treatment effect of $\bm{P}$ is indeed $\delta_{j_{1},m_{1}},$ note that $\bm{P}$ is initialized by all zeros, and that the sub-matrices constructed in steps (I)–(III) are all lower-triangular, which means that:

[TABLE]

By (S28) and (S31)

[TABLE]

By (S34) and the initialization with zeros,

[TABLE]

By (S30) and (S34)–(S36),

[TABLE]

Consequently, by (S40)–(S3.2.2),

[TABLE]

Proof of Corollary 1

By switching the labels of the treatment and control, the relative treatment effect becomes $\gamma^{\prime}=\textrm{pr}\left\{{{Y_{i}}\left(0\right)>{Y_{i}}\left(1\right)}\right\}-\textrm{pr}\left\{{{Y_{i}}\left(0\right)<{Y_{i}}\left(1\right)}\right\}=-\gamma.$ Because $\Delta_{j}^{\prime}=\textrm{pr}\left\{{Y_{i}}\left(0\right)\geq j\right\}-\textrm{pr}\left\{{Y_{i}}\left(1\right)\geq j\right\}=-{\Delta_{j}},$ using (S3) we obtain

[TABLE]

Using Theorem 1, we obtain that the sharp upper bound on $\gamma^{\prime}=-\gamma$ is $\min_{1\leq j\leq J-1}\,\min_{1\leq m\leq J-j}(-\xi_{jm}),$ which completes the proof.
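The label-switching step can be checked numerically: exchanging treatment and control transposes the joint probability matrix, which flips the sign of $\gamma$ and of every $\Delta_{j}.$ The sketch below assumes a hypothetical $3\times 3$ matrix, and the helper names `gamma`, `delta` and `transpose` are illustrative.

```python
from fractions import Fraction as F

def gamma(P):
    """pr{Y(1) > Y(0)} - pr{Y(1) < Y(0)}; rows index Y(1), columns index Y(0)."""
    J = len(P)
    return sum(P[k][l] * ((k > l) - (k < l)) for k in range(J) for l in range(J))

def delta(P, j):
    """Delta_j = pr{Y(1) >= j} - pr{Y(0) >= j}."""
    J = len(P)
    treated = sum(P[k][l] for k in range(j, J) for l in range(J))
    control = sum(P[k][l] for k in range(J) for l in range(j, J))
    return treated - control

def transpose(P):
    """Joint matrix after switching the treatment and control labels."""
    return [list(col) for col in zip(*P)]

# Hypothetical joint distribution for J = 3 (assumed for illustration).
P = [[F(0), F(0), F(1, 4)],
     [F(1, 4), F(0), F(0)],
     [F(1, 4), F(1, 4), F(0)]]

print(gamma(P), gamma(transpose(P)))        # prints "1/2 -1/2"
print(delta(P, 1), delta(transpose(P), 1))  # prints "1/4 -1/4"
```

Because the sign flip holds for every joint matrix, an upper bound on $\gamma^{\prime}$ translates directly into a lower bound on $\gamma.$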

Bibliography

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd Edition. Hoboken, New Jersey: John Wiley and Sons.
Agresti, A. and Kateri, M. (2017). Ordinal probability effect measures for group comparisons in multinomial cumulative link models. Biometrics, 73:214–219.
Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. J. Am. Statist. Assoc., 92:1171–1176.
Cheng, J. (2009). Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics, 65:96–103.
Cheng, J. and Small, D. S. (2006). Bounds on causal effects in three-arm trials with non-compliance. J. R. Statist. Soc. B, 68:815–836.
Chernozhukov, V., Lee, S., and Rosen, A. (2013). Intersection bounds: Estimation and inference. Econometrica, 81:667–737.
Chiba, Y. (2018). Bayesian inference of causal effects for an ordinal outcome in randomized trials. J. Causal Infer., 6:1–12.
Chung, E. and Romano, J. P. (2016). Asymptotically valid and exact permutation tests based on two-sample U-statistics. J. Stat. Plan. Infer., 168:97–105.
