Simulating and reporting frequentist operating characteristics of clinical trials that borrow external information
Annette Kopp-Schneider, Manuel Wiesenfarth, Leonhard Held, Silvia Calderazzo

TL;DR
This paper presents a method to evaluate and compare the frequentist operating characteristics, such as type I error and power, of clinical trial analyses that incorporate external data, ensuring fair assessment of borrowing strategies.
Contribution
It introduces a procedure to investigate and report frequentist properties of external data borrowing methods, enabling fair comparison with traditional analyses.
Findings
Provides a calibration method for type I error rates with external borrowing
Enables fair comparison of power between borrowing and non-borrowing analyses
Addresses the impact of external data borrowing on trial error rates
Abstract
Borrowing of information from historical or external data to inform inference in a current trial is an expanding field in the era of precision medicine, where trials are often performed in small patient cohorts for practical or ethical reasons. Many approaches for borrowing from external data have been proposed. Even though these methods are mainly Bayesian, incorporating external information into the prior for the current analysis, the frequentist operating characteristics of the analysis strategy are of interest. In particular, the focus is on type I error and power at a prespecified point alternative. It is well known that borrowing from external information may alter the type I error rate. We propose a procedure to investigate and report the frequentist operating characteristics in this context. The approach evaluates type I error rate of the test…
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Statistical Methods and Inference
**Supplementary material**
*Simulating and reporting frequentist operating characteristics of clinical trials that borrow external information
Annette Kopp-Schneider, Manuel Wiesenfarth, Leonhard Held and Silvia Calderazzo*
Extreme borrowing can lead to a test that is not UMP and consequently to power loss
To illustrate that dynamic borrowing can lead to a test that is not uniformly most powerful (UMP), and hence that borrowing can result in power loss, we consider an extreme and artificial setting. For a one-arm trial with a normally distributed endpoint with known variance (i.e., current data ), we test and evaluate power at . The sample size for the current data is . We use the Empirical Bayes power prior approach to borrow from external data .
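The following Python sketch illustrates how an Empirical Bayes power prior weight and the resulting posterior probability can be computed in a setting of this kind. It is a sketch under stated assumptions, not the authors' implementation: it assumes a flat initial prior, a known standard deviation `sigma`, and a one-sided posterior probability that the mean exceeds a hypothetical null value `theta0`. The function names and all numerical values (`n`, `n_ext`, `xbar_ext`) are illustrative and are not taken from the supplement.

```python
import math
from statistics import NormalDist


def eb_weight(xbar, xbar_ext, n, n_ext, sigma=1.0):
    """Empirical Bayes estimate of the power prior weight, a value in (0, 1].

    Assumption: flat initial prior, so the weight delta maximises the marginal
    likelihood xbar ~ N(xbar_ext, sigma^2/n + sigma^2/(delta * n_ext)).
    """
    excess = (xbar - xbar_ext) ** 2 - sigma**2 / n
    ext_var = sigma**2 / n_ext
    # Full borrowing while the observed conflict is within sampling noise.
    if excess <= ext_var:
        return 1.0
    return ext_var / excess


def posterior_prob(xbar, xbar_ext, n, n_ext, theta0=0.0, sigma=1.0):
    """Posterior probability that theta > theta0, with the EB weight plugged in."""
    delta = eb_weight(xbar, xbar_ext, n, n_ext, sigma)
    n_eff = n + delta * n_ext  # effective sample size after discounting
    post_mean = (n * xbar + delta * n_ext * xbar_ext) / n_eff
    post_sd = sigma / math.sqrt(n_eff)
    return NormalDist().cdf((post_mean - theta0) / post_sd)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    n, n_ext, xbar_ext = 25, 1000, 0.1
    for xbar in (-0.2, 0.0, 0.1, 0.3, 0.35, 0.5):
        w = eb_weight(xbar, xbar_ext, n, n_ext)
        p = posterior_prob(xbar, xbar_ext, n, n_ext)
        print(f"xbar={xbar:+.2f}  weight={w:.3f}  posterior prob={p:.4f}")
```

Because the weight is re-estimated from the current data, it falls quickly once the current mean moves away from the external mean, so the posterior probability need not increase with the current mean.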
Figures S1 to S11 show the estimated Empirical Bayes weight parameter (top) and the posterior probability (bottom) for varying current data mean , borrowing from external data with mean ranging from to . Note that these plots are different from Figure 2 of the main paper where the external mean is varying on the horizontal axis.
The blue line at in the lower plot indicates the separation between the acceptance and the rejection region of the test: for current observed means with posterior probability , the test accepts and it rejects if the posterior probability exceeds (cf. equation (4) in the main manuscript). Starting at , the posterior probability shows non-monotone behavior. Up to and for , this is without consequence for the rejection region of the test since the non-monotonicity occurs in a range of values with all (for ) or with all (for ). For values of between and , shown in Figures S4 to S8, the rejection region is no longer a single interval but splits into two intervals. In comparison to the UMP test for the one-sided one-arm situation (calibrated to borrowing from ), this leads to a power loss, as observed in Figure 2c in the main manuscript.
Each lower plot also shows the integral over the current data, in red the value of and in green the power with borrowing, powerwEB . These numbers are points on the red and on the green line, respectively, in Figure 2c in the main manuscript.
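The split rejection region and the resulting power comparison can be reproduced numerically with a short Python sketch. The sketch makes several assumptions that are not from the supplement: the design constants (`N`, `N_EXT`, `XBAR_EXT`, `THETA0`, `THETA1`) and the posterior probability `THRESHOLD` are hypothetical, a flat initial prior is used for the Empirical Bayes power prior, and the comparator is the one-sided z-test whose level is set to the type I error of the borrowing test. Rejection probabilities are obtained by a simple Riemann sum of the rejection indicator against the sampling density of the current mean.

```python
import math
from statistics import NormalDist

STD = NormalDist()

# Hypothetical design values chosen for illustration only.
N, N_EXT, XBAR_EXT = 25, 1000, 0.1
SIGMA, THETA0, THETA1, THRESHOLD = 1.0, 0.0, 0.4, 0.975


def rejects(xbar):
    """True if the posterior probability of theta > THETA0 exceeds THRESHOLD."""
    excess = (xbar - XBAR_EXT) ** 2 - SIGMA**2 / N
    ext_var = SIGMA**2 / N_EXT
    delta = 1.0 if excess <= ext_var else ext_var / excess  # EB weight
    n_eff = N + delta * N_EXT
    post_mean = (N * xbar + delta * N_EXT * XBAR_EXT) / n_eff
    z = (post_mean - THETA0) * math.sqrt(n_eff) / SIGMA
    return STD.cdf(z) > THRESHOLD


def rejection_intervals(lo=-1.0, hi=1.5, step=0.001):
    """Scan a grid of current means; return maximal runs where the test rejects."""
    intervals, start = [], None
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        r = rejects(x)
        if r and start is None:
            start = x
        elif not r and start is not None:
            intervals.append((start, x - step))
            start = None
    if start is not None:
        intervals.append((start, hi))  # run continues beyond the grid
    return intervals


def rejection_prob(theta, points=20001, width=8.0):
    """Riemann sum of the rejection indicator times the density of xbar."""
    sampling = NormalDist(theta, SIGMA / math.sqrt(N))
    a = theta - width * sampling.stdev
    h = 2 * width * sampling.stdev / (points - 1)
    return sum(sampling.pdf(a + i * h) * h
               for i in range(points) if rejects(a + i * h))


def ump_power(alpha, theta1):
    """Power at theta1 of the one-sided z-test with level alpha."""
    shift = (theta1 - THETA0) * math.sqrt(N) / SIGMA
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - alpha) - shift)


if __name__ == "__main__":
    alpha_borrow = rejection_prob(THETA0)
    print("rejection intervals:", rejection_intervals())
    print("type I error with borrowing:", alpha_borrow)
    print("power with borrowing:", rejection_prob(THETA1))
    print("UMP power at matched type I error:", ump_power(alpha_borrow, THETA1))
```

Matching the comparator's level to the type I error of the borrowing test is what makes the power comparison fair: a test whose rejection region is not a single upper interval cannot exceed the power of the one-sided z-test of the same size.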
