Simulating and reporting frequentist operating characteristics of clinical trials that borrow external information

Annette Kopp-Schneider; Manuel Wiesenfarth; Leonhard Held; Silvia Calderazzo

arXiv:2302.12651·stat.ME·February 27, 2023

TL;DR

This paper presents a method to evaluate and compare the frequentist operating characteristics, such as type I error and power, of clinical trial analyses that incorporate external data, ensuring fair assessment of borrowing strategies.

Contribution

It introduces a procedure to investigate and report frequentist properties of external data borrowing methods, enabling fair comparison with traditional analyses.

Findings

1. Provides a calibration method for type I error rates with external borrowing
2. Enables fair comparison of power between borrowing and non-borrowing analyses
3. Addresses the impact of external data borrowing on trial error rates

Abstract

Borrowing of information from historical or external data to inform inference in a current trial is an expanding field in the era of precision medicine, where trials are often performed in small patient cohorts for practical or ethical reasons. Many approaches for borrowing from external data have been proposed. Even though these methods are mainly based on Bayesian approaches by incorporating external information into the prior for the current analysis, frequentist operating characteristics of the analysis strategy are of interest. In particular, type I error and power at a prespecified point alternative are in the focus. It is well-known that borrowing from external information may lead to the alteration of type I error rate. We propose a procedure to investigate and report the frequentist operating characteristics in this context. The approach evaluates type I error rate of the test…

Taxonomy

Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Statistical Methods and Inference

Full text

**Supplementary material**

*Simulating and reporting frequentist operating characteristics of clinical trials that borrow external information*

*Annette Kopp-Schneider, Manuel Wiesenfarth, Leonhard Held and Silvia Calderazzo*

Extreme borrowing can lead to a test that is not UMP and consequently to power loss

To illustrate that dynamic borrowing can lead to a test that is not UMP, and hence that borrowing can result in power loss, we consider an extreme and artificial setting. For a one-arm trial with a normally distributed endpoint with known variance $\sigma^{2}=1$ (i.e., current data $D_{i}\sim N(\theta,1),\ i=1,...,n$), we test $H_{0}:\theta\leq\theta_{0}=0\text{ vs. }H_{1}:\theta>0$ and evaluate power at $\theta_{1}=0.5$. The sample size for the current data is $n=25$. We use the Empirical Bayes power prior approach to borrow from $n_{E}=1000$ external observations $D_{E,j}\sim N(\theta_{E},1),\ j=1,...,n_{E}$.
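For orientation, the quantities used in this setting can be written in closed form. The expressions below are not stated in this supplement; they are a sketch that assumes the usual Empirical Bayes power prior construction for a normal mean with known variance $\sigma^{2}=1$ and a flat initial prior, where the weight maximizes the marginal likelihood of the current mean $\bar{d}$ over $\delta\in[0,1]$:

$$
\hat{\delta}(d;d_{E})=\frac{\sigma^{2}/n_{E}}{\max\left\{(\bar{d}-\bar{d}_{E})^{2}-\sigma^{2}/n,\ \sigma^{2}/n_{E}\right\}},
\qquad
\theta\,|\,d,d_{E}\sim N\!\left(\frac{n\bar{d}+\hat{\delta}\,n_{E}\,\bar{d}_{E}}{n+\hat{\delta}\,n_{E}},\ \frac{\sigma^{2}}{n+\hat{\delta}\,n_{E}}\right).
$$

Under this assumption, $\hat{\delta}=1$ (full borrowing) whenever $(\bar{d}-\bar{d}_{E})^{2}\leq\sigma^{2}/n+\sigma^{2}/n_{E}$, which for $n=25$ and $n_{E}=1000$ means $|\bar{d}-\bar{d}_{E}|\leq\sqrt{0.041}\approx 0.2025$. Outside this band the weight drops quickly, which is what makes the posterior probability non-monotone in $\bar{d}$.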

Figures S1 to S11 show the estimated Empirical Bayes weight parameter $\hat{\delta}(d;d_{E})$ (top) and the posterior probability $\textrm{P}(\theta>0\,|\,D=d;D_{E}=d_{E})$ (bottom) for varying current data mean $\bar{d}$, borrowing from external data with mean $\bar{d}_{E}$ ranging from $0.0$ to $0.21$. Note that these plots differ from Figure 2 of the main paper, where the external mean $\bar{d}_{E}$ varies along the horizontal axis.

The blue line at $0.975$ in the lower plot indicates the separation between the acceptance and the rejection region of the test: for current observed means with posterior probability $\textrm{P}(\theta>0\,|\,D=d;D_{E}=d_{E})\leq 0.975$, the test accepts $H_{0}$, and it rejects if the posterior probability exceeds $0.975$ (cf. equation (4) in the main manuscript). Starting at $\bar{d}_{E}=0.01$, the posterior probability shows non-monotone behavior. Up to $\bar{d}_{E}=0.05$ and for $\bar{d}_{E}\geq 0.15$, this is without consequence for the rejection region of the test, since the non-monotonicity occurs in a range of values with all $\textrm{P}(\theta>0\,|\,D=d;D_{E}=d_{E})\leq 0.975$ (for $\bar{d}_{E}\leq 0.05$) or with all $\textrm{P}(\theta>0\,|\,D=d;D_{E}=d_{E})>0.975$ (for $\bar{d}_{E}\geq 0.15$). For values of $\bar{d}_{E}$ between $0.06$ and $0.14$, shown in Figures S4 to S8, the rejection region is no longer a single interval but splits into two disjoint intervals. In comparison to the UMP test for the one-sided one-arm situation (calibrated to borrowing from $\bar{d}_{E}$), this leads to a power loss, as observed in Figure 2c in the main manuscript.
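The shape of the rejection region can be explored numerically. The following Python sketch is illustrative only: the closed-form weight in `eb_weight` is an assumption (the standard Empirical Bayes power prior solution for a normal mean with known variance and a flat initial prior), and the function names, grid limits, and grid resolution are hypothetical choices that do not come from the paper.

```python
import numpy as np
from scipy.stats import norm

N, N_E, SIGMA2 = 25, 1000, 1.0  # sample sizes and variance from the setting above


def eb_weight(dbar, dbar_e, n=N, n_e=N_E, sigma2=SIGMA2):
    """Assumed closed-form EB power prior weight, restricted to [0, 1]."""
    excess = (dbar - dbar_e) ** 2 - sigma2 / n
    return (sigma2 / n_e) / np.maximum(excess, sigma2 / n_e)


def posterior_prob(dbar, dbar_e, n=N, n_e=N_E, sigma2=SIGMA2):
    """P(theta > 0 | current mean dbar, external mean dbar_e) under the assumed model."""
    delta = eb_weight(dbar, dbar_e, n, n_e, sigma2)
    eff_n = n + delta * n_e                       # effective posterior sample size
    post_mean = (n * dbar + delta * n_e * dbar_e) / eff_n
    post_sd = np.sqrt(sigma2 / eff_n)
    return norm.cdf(post_mean / post_sd)


def rejection_intervals(dbar_e, threshold=0.975, grid=None):
    """Return the maximal runs of grid points where the test rejects H0.

    The last interval is truncated at the upper end of the grid.
    """
    grid = np.linspace(-1.0, 1.5, 25001) if grid is None else grid
    reject = posterior_prob(grid, dbar_e) > threshold
    padded = np.concatenate(([False], reject, [False]))
    change = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = change[::2], change[1::2] - 1  # first and last rejecting index per run
    return [(float(grid[s]), float(grid[e])) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    for dbar_e in (0.0, 0.10):
        print(dbar_e, rejection_intervals(dbar_e))
```

Under these assumptions, `rejection_intervals(0.10)` returns two separate intervals while `rejection_intervals(0.0)` returns one, in line with the behavior described for Figures S4 to S8.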

Each lower plot also reports two integrals over the current data: in red, the type I error rate $\alpha(d_{E})=\textrm{E}_{\theta=0}[\varphi_{\text{B}}(D;d_{E})]$, and in green, the power with borrowing, $\text{power}_{\text{wEB}}=\textrm{E}_{\theta_{1}=0.5}[\varphi_{\text{B}}(D;d_{E})]$. These numbers are points on the red and green lines, respectively, in Figure 2c in the main manuscript.
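The two expectations can be approximated by integrating the test decision against the sampling distribution of the current mean, $\bar{D}\sim N(\theta,\sigma^{2}/n)$. The sketch below is a hedged illustration rather than the authors' code: it assumes the standard closed-form Empirical Bayes power prior weight for a normal mean with known variance and a flat initial prior, and the names `rejects` and `rejection_probability`, as well as the integration limits and number of cells, are hypothetical.

```python
import numpy as np
from scipy.stats import norm

N, N_E, SIGMA2 = 25, 1000, 1.0  # setting described above


def rejects(dbar, dbar_e, threshold=0.975):
    """Test decision phi_B: True where P(theta > 0 | data) exceeds the threshold.

    Uses an assumed closed-form EB power prior weight (not given in the source).
    """
    excess = (dbar - dbar_e) ** 2 - SIGMA2 / N
    delta = (SIGMA2 / N_E) / np.maximum(excess, SIGMA2 / N_E)
    eff_n = N + delta * N_E
    post_mean = (N * dbar + delta * N_E * dbar_e) / eff_n
    return norm.cdf(post_mean * np.sqrt(eff_n / SIGMA2)) > threshold


def rejection_probability(theta, dbar_e, lo=-3.0, hi=4.0, cells=70000):
    """Approximate E_theta[phi_B(D; d_E)] by summing the sampling mass of rejecting cells."""
    edges = np.linspace(lo, hi, cells + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Probability that the current mean falls in each cell when the true mean is theta
    mass = np.diff(norm.cdf(edges, loc=theta, scale=np.sqrt(SIGMA2 / N)))
    return float(np.sum(mass[rejects(mids, dbar_e)]))


if __name__ == "__main__":
    dbar_e = 0.10
    alpha = rejection_probability(0.0, dbar_e)   # type I error rate at theta = 0
    power = rejection_probability(0.5, dbar_e)   # power at theta_1 = 0.5
    print(f"alpha(d_E) = {alpha:.4f}, power with borrowing = {power:.4f}")
```

Evaluating the decision at cell midpoints keeps the calculation valid when the rejection region consists of more than one interval, which a single critical value for $\bar{d}$ would not handle.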