A Bayesian Treatment Selection Design for Phase II Randomised Cancer Clinical Trials
Moka Komaki, Satoru Shinoda, Haiyan Zheng, Kouji Yamamoto

TL;DR
This paper introduces a Bayesian method for selecting the best cancer treatment in Phase II trials, offering flexibility and transparency compared to traditional frequentist approaches.
Contribution
A novel Bayesian design for treatment selection in Phase II trials with binary outcomes and sample size determination methods in a Bayesian framework.
Findings
The Bayesian design outperforms frequentist methods in handling ambiguous endpoints and sample size constraints.
Simulation studies and real-data applications validate the effectiveness of the proposed treatment selection approach.
An R Shiny application is developed to support clinicians in implementing the Bayesian design.
Abstract
It is crucial to design Phase II cancer clinical trials that balance the efficiency of treatment selection with clinical practicality. Sargent and Goldberg proposed a frequentist design that allows decision‐making even when the primary endpoint is ambiguous. However, frequentist approaches rely on fixed thresholds and long‐run frequency properties, which can limit flexibility in practical applications. In contrast, the Bayesian decision rule, based on posterior probabilities, enables transparent decision‐making by incorporating prior knowledge and updating beliefs with new data, addressing some of the inherent limitations of frequentist designs. In this study, we propose a novel Bayesian design, allowing the selection of the best‐performing treatment. Specifically, concerning phase II clinical trials with a binary outcome, our decision rule employs posterior interval probability by…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3| Response rates | ||||||||
|---|---|---|---|---|---|---|---|---|
| Vague prior | (1, 1) | (1, 1) | 0.20 | 0.05 | 53 | 33 | 33 | 13 |
| 0.25 | 0.10 | 67 | 30 | 38 | 19 | |||
| 0.30 | 0.15 | 72 | 39 | 39 | 19 | |||
| 0.35 | 0.20 | 79 | 39 | 45 | 19 | |||
| 0.40 | 0.25 | 87 | 47 | 52 | 17 | |||
| 0.45 | 0.30 | 93 | 46 | 53 | 26 | |||
| 0.50 | 0.35 | 94 | 54 | 54 | 26 | |||
| Informative prior | (2, 8) | (1, 9) | 0.20 | 0.05 | 38 | 18 | 18 | 13 |
| (Incorporate appropriate prior information for each group of 10 participants.) | (3, 7) | (1, 9) | 0.25 | 0.10 | 30 | 10− | 11 | 10− |
| (3, 7) | (2, 8) | 0.30 | 0.15 | 65 | 32 | 39 | 12 | |
| (4, 6) | (2, 8) | 0.35 | 0.20 | 50 | 19 | 25 | 10− | |
| (4, 6) | (3, 7) | 0.40 | 0.25 | 87 | 39 | 47 | 12 | |
| (5, 5) | (3, 7) | 0.45 | 0.30 | 66 | 26 | 33 | 10− | |
| (5, 5) | (4, 6) | 0.50 | 0.35 | 94 | 46 | 54 | 18 | |
| Response rates | ||||||||
|---|---|---|---|---|---|---|---|---|
| Vague prior | (1, 1) | (1, 1) | 0.20 | 0.05 | 71 | 34 | 40 | 17 |
| 0.25 | 0.10 | 94 | 43 | 52 | 21 | |||
| 0.30 | 0.15 | 115 | 50 | 65 | 25 | |||
| 0.35 | 0.20 | 131 | 59 | 72 | 28 | |||
| 0.40 | 0.25 | 145 | 64 | 79 | 31 | |||
| 0.45 | 0.30 | 155 | 68 | 85 | 33 | |||
| 0.50 | 0.35 | 161 | 71 | 90 | 34 | |||
| Informative prior | (2, 8) | (1, 9) | 0.20 | 0.05 | 60 | 24 | 30 | 10− |
| (Incorporate appropriate prior information for each group of 10 participants.) | (3, 7) | (1, 9) | 0.25 | 0.10 | 63 | 10− | 22 | 10− |
| (3, 7) | (2, 8) | 0.30 | 0.15 | 106 | 43 | 54 | 15 | |
| (4, 6) | (2, 8) | 0.35 | 0.20 | 102 | 26 | 45 | 10− | |
| (4, 6) | (3, 7) | 0.40 | 0.25 | 135 | 37 | 71 | 21 | |
| (5, 5) | (3, 7) | 0.45 | 0.30 | 125 | 37 | 58 | 10− | |
| (5, 5) | (4, 6) | 0.50 | 0.35 | 153 | 62 | 80 | 25 | |
| Scenario 1.1 | 0.30 | 0.15 | 0.05 | 0.5 | (1, 1) | (1, 1) | 54.6 | 68.4 | ||
| Scenario 1.2 | 0.30 | 0.15 | 0.05 | 0.5 | 0.30 | 0.15 | (3, 7) | (2, 8) | 58.2 | 70.7 |
| (6, 14) | (3, 17) | 71.6 | 80.3 | |||||||
| (9, 21) | (5, 25) | 73.8 | 81.7 | |||||||
| (12, 28) | (6, 34) | 82.4 | 87.2 | |||||||
| (15, 35) | (8, 42) | 83.4 | 88.7 | |||||||
| Scenario 1.3 | 0.30 | 0.15 | 0.05 | 0.5 | 0.30 | 0.20 | (3, 7) | (2, 8) | 58.2 | 70.7 |
| (6, 14) | (4, 16) | 60.5 | 73.1 | |||||||
| (9, 21) | (6, 24) | 63.6 | 75.3 | |||||||
| (12, 28) | (8, 32) | 64.5 | 75.8 | |||||||
| (15, 35) | (10, 40) | 64.9 | 77.2 | |||||||
| Scenario 1.4 | 0.30 | 0.15 | 0.05 | 0.5 | 0.30 | 0.10 | (3, 7) | (1, 9) | 69.6 | 78.2 |
| (6, 14) | (2, 18) | 81.0 | 85.9 | |||||||
| (9, 21) | (3, 27) | 88.7 | 91.0 | |||||||
| (12, 28) | (4, 36) | 93.2 | 94.4 | |||||||
| (15, 35) | (5, 45) | 96.9 | 97.2 | |||||||
| Scenario 1.5 | 0.30 | 0.15 | 0.05 | 0.5 | 0.50 | 0.35 | (5, 5) | (4, 6) | 54.7 | 69.1 |
| (10, 10) | (7, 13) | 64.9 | 76.4 | |||||||
| (15, 15) | (11, 19) | 64.9 | 77.2 | |||||||
| (20, 20) | (14, 26) | 76.1 | 85.3 | |||||||
| (25, 25) | (18, 32) | 81.0 | 86.9 |
| Scenario 2.1 | 0.30 | 0.30 | 0.05 | 0.5 | (1, 1) | (1, 1) | 92.4 | 93.9 | ||
| Scenario 2.2 | 0.30 | 0.30 | 0.05 | 0.5 | 0.30 | 0.20 | (3, 7) | (2, 8) | 91.2 | 92.4 |
| (6, 14) | (4, 16) | 90.9 | 92.2 | |||||||
| (9, 21) | (6, 24) | 89.6 | 91.5 | |||||||
| (12, 28) | (8, 32) | 87.3 | 90.0 | |||||||
| (15, 35) | (10, 40) | 86.8 | 89.2 | |||||||
| Scenario 2.3 | 0.30 | 0.30 | 0.05 | 0.5 | 0.30 | 0.10 | (3, 7) | (1, 9) | 86.4 | 89.2 |
| (6, 14) | (2, 18) | 79.1 | 84.8 | |||||||
| (9, 21) | (3, 27) | 67.1 | 77.6 | |||||||
| (12, 28) | (4, 36) | 55.0 | 68.6 | |||||||
| (15, 35) | (5, 45) | 44.7 | 61.0 |
|
| |||
|---|---|---|---|
| 0.55 | 0.40 | 0.81 |
|
| Required sample size per group | ||
|---|---|---|---|
| 0.55 | 0.40 | 40 |
- —JST SPRING
- —Cancer Research UK10.13039/501100000289
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Cancer Genomics and Diagnostics
Introduction
1
Phase II clinical trials serve as a foundation for the development of pivotal, large‐scape phase III clinical trials. The typical aims include screening new drugs for antitumor activity, exploring new combinations of therapies, and testing new treatment schedules [1]. There is a consensus that randomised controlled trials (RCTs) are the “gold standard” to establish early treatment efficacy in phase II clinical trials [2]. In exploratory phase II settings, RCTs are considered particularly appropriate when the use of historical data is limited, such as when ethnic differences exist in the patient population or when specific disease subtypes are to be focused. Another major advantage of randomisation is that it balances both known and unknown prognostic factors, thereby providing a proper framework for causal inference [1, 3]. Regulatory authorities have also emphasised the importance of RCTs in their guidelines. For instance, the Food and Drug Administration (FDA) issued draft guidance in 2023 on clinical trial considerations to support the accelerated approval of oncology therapeutics, highlighting the potential advantages of RCTs over single‐arm trials wherein all patients are treated with the new treatment [4]. Similarly, the European Medicines Agency (EMA) has underscored the significance of phase II RCTs [5].
In the design and analysis of phase II clinical trials, Bayesian statistical methods are increasingly utilized due to their ability to incorporate all relevant information, leading to improved parameter estimation. These methods enhance clinical trial efficiency by reducing trial duration, minimizing participant burden, optimizing sample size requirements, and leveraging prior information for more effective data utilization [6, 7]. This is particularly desirable for clinical trials that have limited information to generate [8]. As a result, Bayesian statistical designs are gradually being adopted, particularly in early phases of drug development, such as phase II clinical trials [6, 9, 10]. Moreover, Bayesian designs enable coherent inference and decision‐making using probabilities, making interpretation easier [11]. As a result, Bayesian designs are currently significant topics of interest in phase II cancer clinical trials, reflecting their potential to enhance the development of effective cancer therapies.
Sargent and Goldberg's design (hereafter referred to as SG design) [12, 13] is a type of randomised treatment selection design, offering flexibility by allowing the consideration of other factors when the primary endpoint alone is insufficient to determine whether a treatment should proceed to phase III clinical trials. Generally, when a randomised design is conducted with proper error control, a sample size of several hundred patients may be required in phase II RCTs [14], which may sometimes not be feasible when developing drugs. In practice, there are also randomised phase II trials conducted with a total sample size of fewer than 100 patients [15]. Given the growing demand for the use of randomisation, Bayesian designs permitting the selection of a treatment that has a higher efficacy and/or good practicality offer a promising solution for addressing clinical needs [11].
When the drug cannot be selected based on the primary endpoint, the decision yielded by the SG design will be made using secondary factors—including toxicity, cost, ease of administration, quality of life, and so forth—which are also recommended for consideration in the guidelines [16]. In some cases, a treatment with a slightly lower response rate may be preferred over one with a higher response rate due to other factors. This design reflects the clinical reality that the success probability of a treatment is only one of many considerations when recommending a treatment for a particular patient. One key feature of the SG design is the inclusion of an ambiguity probability, which allows for a slight relaxation of the decision criterion focused on superiority alone. This design has been employed in various oncology studies, including the trial on pertuzumab and trastuzumab with or without metronomic chemotherapy for older patients with HER2‐positive metastatic breast cancer (EORTC 75111‐10114) [15], the phase II RCTs of gefitinib or placebo in combination with tamoxifen in patients with hormone receptor‐positive metastatic breast cancer [17], and the randomised phase II study for advanced endometrial carcinoma in the Adjuvant Chemotherapy for Endometrial Cancer (ACE) trial [18]. The SG design is not intended to allow for interim monitoring. The reason is that this design is used in cases where a significant difference in the primary endpoint is not observed. In addition, when drug selection cannot be made based on the primary endpoint, the design relies on secondary factors for treatment selection, which requires following up with all patients until the last one is evaluated. The present work hence does not consider interim adaptation in the selection design. Instead, we aim to develop a Bayesian analogue of treatment selection designs using the SG design as a primary example.
While phase II trials are primarily small‐scale exploratory studies, SG designs (RCTs) typically require large sample sizes due to the inclusion of a control group, raising feasibility concerns. Replacing the control group with historical cohort data could introduce risks such as selection bias, observation bias, and confounding bias. Therefore, RCTs that account for sample size constraints are generally preferred. In practice, Bayesian designs have limited uptake due to low awareness among clinicians. A recent survey by the Drug Information Association Bayesian Scientific Working Group (DIA BSWG) uncovered that insufficient knowledge of Bayesian approaches is perceived as the greatest barrier to implementing Bayesian methods [19].
To address these issues, we propose a Bayesian treatment selection design in Section 2 that, compared to the SG design, reduces the required sample size by leveraging prior information. A Bayesian approach to the sample size calculation, drawing inspiration from the SG design, is presented in Section 3. To improve the uptake of our Bayesian design in practice, we develop a web application using R Shiny. We expect this would particularly benefit practioners like clinicians for designing their phase II RCTs. Our proposed methodology is thoroughly evaluated using a simulation study in Section 4 and a case study in Section 5. We then conclude with a discussion in Section 7.
A Novel Bayesian Treatment Selection Design
2
Consider phase II RCTs comparing two treatments, labelled A and B, where the primary endpoint is a binary outcome, that is, response or no response. We are interested in estimating the response rates. We shall consider Bayesian methods for updating our beliefs based on data generated from the phase II RCT.
Let the number of patients in each treatment group be ni for i=A or B. Without loss of generality, let πA denote the response rate on the better treatment (say, A) and πB denote the response rate on the poorer treatment (say, B).
In Bayesian inference, πA and πB have a prior distribution each. We assume specifically
The posterior distribution for the true response rates can then be expressed as
The SG design makes decisions based on the difference observed between πA and πB in a clinical trial. In contrast, sample size determination is based on key metrics such as the probability of correct selection (PCorr) and the probability of ambiguity (PAmb). A Bayesian analogue to this sample size determination can be constructed by applying Beta posterior distributions to define PCorr and PAmb. In what follows, we integrate these criteria into the decision‐making process to establish a Bayesian framework.
Drawing inspiration from Thall and Simon [20], we give the probability of correct selection, denoted by PCorr∗, as the posterior probability of the difference in response rates being greater than d. Here, d is a known quantity representing the clinically meaningful difference. Mathematically,
where f(·;α,β) and F(·;α,β) are the probability density function (pdf) and cumulative distribution function (cdf) of a Beta(α,β) distribution, respectively [21].
The probability of ambiguous selection also following SG design, denoted as PAmb∗, is defined as the probability that the difference in true response rates πA−πB is within |d|. It is expressed as follows:
A method for calculating these probabilities is described in Appendix A. Let λ∗=PCorr∗+ρPAmb∗(where0≤ρ≤1) and ρ is selected appropriately for context. Here, ρ represents the probability that a statistically ambiguous outcome will result in the proper arm being selected. The original SG design has set ρ to 12 for two‐group comparisons to determine the sample size [12]. This implicitly assumes that other factors, such as toxicity or costs, are among the key considerations alongside the efficacy assessment. The selection of a regimen under statistical ambiguity coupled with 12 is equivalent to a fair coin toss. Alternatively, one may set ρ=0. While likely to be overly conservative in trial planning. On the other hand, setting ρ=1, this decision criterion becomes equivalent to λ∗=Pr(πA−πB≥−d), which may form a Bayesian non‐inferiority design for a given margin of d. In brevity, the decision rule considered for this design is defined as follows:
Here, 0<θ<1 is the threshold which need to be determined in advance. Furthermore, θ may be set to a large fraction like 0.80 or 0.90 so a treatment is selected based on compelling evidence of clinical benefit or acceptability in favour of the decision.
The Sample Size Calculation
3
We now demonstrate the Bayesian sample size determination for the setting considered, wherein a treatment A or B is selected based on λ∗ defined in Section 2. When concerned with ρ=0, the sample size required is found to ensure the probability of correct selection of a treatment that has superior efficacy than the alternative.
Aligned with the proposal by Thall and Simon [20, 22], we proposed the following that requires the expected response rates π˜i for each group. We assume ñA=ñB=ñ, where ñ is a positive integer. Let the number of responders be x˜i=ñ×π˜i. Since the number of responders is likewise an integer, x˜i is rounded up to the nearest integer. For example, when ñ=30 and π˜A=0.25, x˜A is calculated to be x˜A=8.
The posterior distributions for correct selection and ambiguity are derived from Equation (1) and Equation (2) of Appendix A and formulated accordingly. Subsequently, λ∗ is computed based on this distribution. The minimum sample size ñmin is defined as the smallest ñ such that λ∗>γ∗ always holds, where γ∗ is a pre‐specified threshold. Using the algorithm described above, the results of sample size for each group based on λ∗ are as follows. Table 1 gives the sample size per treatment group for various configurations of (π˜A,π˜B) used for generating the new trial data. For illustration, we set d=0.05 and γ∗=0.80 or γ∗=0.90, assuming ρ=0 or ρ=12 and the use of vague and informative priors, respectively.
The sample size required for case of ρ=12, allowing ambiguous probability for selecting a treatment based on additional factors, is generally smaller compared to that of ρ=0 provided that the threshold γ∗ remains unchanged. This is not limited to the case of ρ=12, but for any ρ>0. Because λ∗ increases as ρ increases. By contrast, if raising the threshold γ∗ while retaining ρ, the required sample size always increases. This is unsurprising, because a larger threshold γ∗ suggests a higher requirement for decision accuracy. Furthermore, as π˜A approaches 0.50, the required sample size increases when the relative difference in the response rates implied by data remains 0.15 in all scenarios (rows of Table 1). The explanation is that the uncertainty in the likelihood increases accordingly with π˜i getting closer to 0.50. It was suggested that, in most cases, incorporating appropriate prior information reduces the required sample size compared to that of cases using a vague prior. Figure 1 visualises the relationship between sample size and λ∗, under the scenario of π˜A=0.30,π˜B=0.15,d=0.05 and ρ=0 with a vague or informative prior. As the proposed method shows that the value of λ∗ is not stable when the sample size is small. In this illustration, the required sample size is defined as the smallest sample size that ensures the resulting λ∗ exceeding γ∗.
Sample size determination for the Bayesian approach to selecting a treatment based on λ∗, assuming π˜A=0.30, π˜B=0.15, d=0.05, ρ=0 with a vague or informative prior.
In the sample size determined based on λ∗, as shown in Figure 1, its instability associated with small sample sizes suggests potential limitations. To address this, we propose an approach that treats x˜i as a random variable rather than deterministically fixing it based on the expected belief response rate, thereby incorporating uncertainty into the sample size design through simulations.
Here, let λj∗ represent the jth value of λ∗. Each xi∗ is randomly sampled from a binomial distribution,
and λj∗ is calculated m times, corresponding to the number of simulations. We define:
and the minimum sample size required is the point where λ‾∗ exceeds the threshold γ∗. Table 2 shows the sample size based on λ‾∗. Obviously, the sample sizes are generally larger compared to those based on λ∗, with some exceptions. For instance, looking across Tables 1 and 2, under the scenario of πA=0.20,πB=0.05, with informative priors of Beta(2, 8) and Beta(1, 9), the required sample size is 13 for d=0.05,ρ=12,γ∗=0.80 if applying the λ∗ based rule, whereas it decreases to <10 if applying the λ∗‾ based rule. Figure 2 visualises the sample size determination concerning λ∗‾, assuming π˜A=0.30,π˜B=0.15,d=0.05,ρ=0 and m=100,000, based on λ‾∗, showing more stable values and smoother curves, as compared to Figure 1.
Sample size and λ∗‾ considering the validation of Bayesian approach for a two‐arm trial, assuming π˜A=0.30,π˜B=0.15,d=0.05,ρ=0 and m=100,000.
Simulation Study
4
We perform a simulation study to evaluate the influence of prior information on the proposed sample size calculation, based on λ∗ and λ∗‾. The prior distribution was assumed to be a Beta distribution with parameters (αi,βi), which can be thought of as information from (αi+βi) pseudo observations with the expected response rate of αi/(αi+βi). We evaluated cases where the number of pseudo observations was equivalent to 10, 20, 30, 40, and 50 patients per group, as well as the case of a vague prior, Beta(1,1).
Evaluation of the Proposed Method With Different Response Rates
4.1
We assessed the impact of different specifications of prior distribution on the proposed decision rule, assuming the response rates of π˜A=0.30 and π˜B=0.15. The clinically meaningful difference is set to be d=0.05, with the decision parameters as ρ=0.5 and γ∗=0.90. The required sample sizes are 39 and 65 based on λ∗ and λ‾∗, respectively. In this scenario, the evaluation criterion ξñmin is defined as the proportion of times the treatment with a higher observed response rate is selected, with the mathematical definition given below. In our evaluation, we set the threshold θ=0.90. The number of simulations was set to k=100,000 per scenario. It can be expressed as follows:
Here, 1(λj∗>0.90) is an indicator function that takes the value 1 if λj∗>0.90, and 0 otherwise. This corresponds to the proportion of times our decision rule correctly advances treatment A to the confirmatory phase. Scenarios 1.1 to 1.5 in Table 3 assume a true difference between the response rates, suggesting A is superior to B. Therefore, a higher value of ξñmin is desirable. The sample sizes used for the evaluation were based on two different sample size approaches, corresponding to the minimum required sample sizes of 39 and 65 when the parameters were set to π˜A=0.30, π˜B=0.15, d=0.05, ρ=0.5, and γ∗=0.90. Table 4 evaluates the extent to which our proposed decision rule is influenced by different levels of prior information, assuming a true difference in response rates. In Scenario 1.1, a vague prior distribution is used as the benchmark. Scenario 1.2 presents the results when the prior is perfectly consistent with the true response rates (πpriorA=0.30,πpriorB=0.15). Scenario 1.3 examines the results when the prior suggests some drift in the difference of response rates (πpriorA=0.30,πpriorB=0.20). Scenario 1.4 presents the results when the prior implies a sizable shift in the difference of response rates (πpriorA=0.30,πpriorB=0.10). Finally, in Scenario 1.5, while the difference in true response rates is consistent with what the prior implies, the prior suggests considerable drift in the respective response rates (πpriorA=0.50,πpriorB=0.35).
Scenario 1.2 showed that increasing the prior effective sample size could results in a higher ξñmin because of prior‐data consistency. Scenario 1.3 confirmed that increasing the prior effective sample size does not increase ξñmin when the difference in prior response rates is smaller. Scenario 1.4 was confirmed that increasing the prior effective sample size with a larger difference in response rates results in a higher ξñmin. In Scenario 1.5, though ξñmin decreased by approximately 2% compared to Scenario 1.2, a comparable result was still achieved.
Evaluation of the Proposed Method With Indifferent Response Rates
4.2
We used a different criterion than Scenario 1 to assess whether the probability of selecting treatment A would decrease, assuming the response rates are πA=0.30 and πB=0.30, a clinically meaningful difference of d=0.05 and ρ=0.5. The evaluation criterion for this scenario defines νñmin as the proportion of times selecting the more desirable group was below 90%, that is, λj∗≤θ=0.90. It can be expressed as follows:
This corresponds to the proportion of times our decision rule considered advancing a group based on a secondary factor. In Scenario 2.1 to 2.3 in Table 4, where there is no true difference, a lower νñmin is preferable. As in Section 4.1, the sample sizes used for the evaluation were based on two different sample size approaches, corresponding to the minimum required sample sizes of 39 and 65 when the parameters were set to π˜A=0.30, π˜B=0.15, d=0.05, ρ=0.5, and γ∗=0.90. Table 4 evaluates how our proposed decision rule is influenced by prior data when there is no true difference in response rates. In Scenario 2.1, a vague prior distribution is assumed as the benchmark. Scenario 2.2 presents when prior suggests a difference of 0.10 in response rates (πpriorA=0.30, πpriorB=0.20). Scenario 2.3 presents the results when prior suggests a difference of 0.20 in response rates (πpriorA=0.30, πpriorB=0.10).
Scenario 2.2 illustrates when there is no true difference, yet the investigator assumed a difference and sets a misleading prior distribution. However, regardless of whether ν39 or ν65 is considered, when prior data are fewer than observed data, the proportion of simulations in which the probability of selecting the more desirable group was below 90% remained at approximately 90%. This suggests that even when incorrect prior information is used, the probability of making a decision based on efficacy remains very low. Scenario 2.3 illustrates a case in which there is no true difference, yet the researcher incorrectly assumes a difference and overestimates the treatment effect size compared to Scenario 2.2. As the assumed difference between πprior A and πprior B increases and a stronger prior is incorporated, the proportion of correctly selecting the more desirable group decreases below 90%. In other words, the probability of incorrectly selecting the more desirable group increases.
Real Data Analysis
5
We revisit a phase II RCT [15] in elderly patients with HER2‐positive metastatic breast cancer, and analyse the data retrospectively using the proposed approach. This clinical trial, originally planned based on the SG design, investigates a metronomic chemotherapy with pertuzumab and trastuzumab. The primary endpoint for the two treatment groups, pertuzumab plus trastuzumab (PT) and PT plus metronomic chemotherapy (PTM), was the progression‐free survival rate at six months. The expected progression‐free survival rates were πPT=0.40 and πPTM=0.55, respectively, with a clinically meaningful difference d of 0.10. When the required sample size was 40 per group, which was the same as in the original trial, the value of λ=PCorr+12PAmb was 0.81.
Table 5 shows the results of λ∗ for the required sample size of 40 per group in the SG design (Frequentist framework) and the Bayesian framework. Details about the SG design can be found in Appendix A.
When both πPTM and πPT have a Beta(1, 1) prior each, our proposed design yields λ∗=0.82, which is close to the value of λ in the frequency calculation. In practical clinical trials, sample size calculations may be based on assumed true progression‐free survival rates for each group, derived from previous research results. When incorporating such information into the prior distribution for progression‐free survival rates, the calculated λ∗ increases to 0.86.
Table 6 reports the sample size required using the existing method and the proposed method. In the frequentist calculation, the threshold value of λ is used to calculate the required sample size. When a more informative Beta(26, 40) prior is placed on πPT, each group requires 20 sample size. The results from Tables 5 and 6 confirm a reduction in the required number of cases when prior information is considered.
Shiny Application
6
To achieve our goal of making Bayesian designs more accessible to clinicians, we developed a user‐friendly interface using R Shiny, at https://mokakomaki.shinyapps.io/my_shiny_app/. Figure 3a,b shows the actual Shiny application.
Interface of our R Shiny that implements the proposed Bayesian design. (a) Main interface, (b) analysis results.
The strengths of this software application are as follows:
- By entering the required parameters, the necessary sample size calculation results are output within seconds.
- No statistical programming skills are required.
- Text for protocols and Statistical Analysis Plans (SAPs) is automatically generated.
- It is easy to compute statistical analysis results after the trial ends.
- The simple design ensures readability.
Discussion
7
Our proposed Bayesian design is motivated by the SG design, which incorporates pragmatic considerations from clinical practice. That is, response rate to a treatment is often only one of many considerations in determining the recommendation for further investigation and ultimately to patients [12]. However, it is important to note that the decision rule in our proposed design differs slightly from that of the SG design. In our approach, treatment selection is based on efficacy, as determined by whether the probability of correct selection or the threshold of λ∗ is exceeded. This methodology is directly linked to sample size calculation, resulting in a more straightforward and interpretable design.
Our proposed design can be implemented by two ways of sample size calculation. Choice between these methods depends on the purpose of the clinical trial. The method based on λ∗ is considered to improve the overall success rate of the trial. In this work, we focused on the method based on λ∗, which is expected to be commonly used. As shown in Figure 1, the method based on λ∗ shows that λ∗ becomes unstable when the sample size is small. To address this, we proposed sample size calculations based on λ∗‾, which were found to be more stable even with small sample sizes. This method tends to result in larger required sample sizes compared to the λ∗‐based method, which is unsurprising for additional randomness around the parameters accommodated. On the other hand, particularly when ρ=12 and γ∗=0.80, there are cases where the required sample sizes based on λ∗ and λ∗‾ differ by only a little. It is advisable to calculate both sample sizes based on λ∗ and λ∗‾ in the trial planning. If a resulting sample size is feasible, the calculation using λ∗‾ is recommended. In addition, in the present work, vague prior distributions were employed for the calculation of the required sample size. The necessity for a larger sample size may be reduced if reliable prior information from previously conducted studies can be integrated. When incorporating prior information, it is of the utmost importance to carefully consider whether the prior information is compatible with the trial data. We refer the interested readers to Zheng et al. for robust Bayesian sample size determination by using mixture priors and a discounting parameter [23].
The evaluation was conducted using clinically plausible scenarios in the simulation study. Table 3 assessed how the results are affected when prior data is introduced in cases where the response rates differ by some extent. Incorporating prior data with the same response rates as the true rates increased the probability of selecting treatment A compared to when vague priors were used. Moreover, even when prior data with different response rates but the same difference in response rates between the groups were used, the probability of selecting treatment A increased without substantially altering the decision. These findings suggest that incorporating appropriate prior information increases the likelihood of selecting the desired group when there is a true difference in response rates. Evaluations were also conducted for cases where the difference in prior response rates was smaller or larger than the true response rate difference, and it was found that the probability of selecting treatment A increased compared to the case of vague priors. The simulation suggests that when the relationship between response rates for both treatments is consistent between the prior and the data, the likelihood of selecting the better group is higher. Nevertheless, these simulation results are inherently scenario‐dependent. The conclusions drawn from our evaluation may not be universally generalisable, as different assumptions about response rates, prior distributions, or clinical settings could lead to different operating characteristics. Table 4 assessed how the results are influenced when prior data is introduced in cases where there is no true difference in response rates. When the difference between the true response rates and the prior response rates was around 10%, the impact on the results was not significant. However, when the amount of prior data exceeded that of the observed data, the prior data had a stronger impact on the results, so caution is needed in determining how much prior data to incorporate. We note that dynamically handling prior‐data conflict in early phase clinical trials [24] is beyond the scope of this paper. This could be a research avenue to pursue in the future.
The definition of the second factor, which forms the basis of our proposed Bayesian design as well as the SG design, is not strictly fixed. For example, in cases where no difference is observed between the response rates, the decision to select a treatment for the confirmatory phase III clinical trial may be based on clinical judgment, which raises concerns about potentially arbitrary outcomes. In the field of oncology, safety is likely the second most important factor after efficacy, and it is desirable to define and quantify safety as the second factor. On the horizon of precision medicine, we are interested in extending our Bayesian design to enable simultaneous selection of treatment in multiple patient subgroups [25]. There may also be a need for our design to remain exploratory, flexible, and adaptable.
In conclusion, this paper proposed a Bayesian treatment selection design as an analogue of that proposed by Sargent and Goldberg. The Bayesian version enables incorporation of external‐trial information into the prior distributions, which can mean a reduction in the required sample size. The associated sample size calculation can be performed using our easy‐to‐implement web application. Our proposed methodology can be extended to involve sequential decision making as well as to consider heterogeneous patient populations.
Funding
This work was supported by JST SPRING (Grant No. JPMJSP2179) and Cancer Research UK (Grant No. RCCCDF‐May24/100001).
Conflicts of Interest
The authors declare no conflicts of interest.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1S. Green and J. Benedetti , Clinical Trials in Oncology, 3rd ed. (Routledge, 2012).
- 2M. J. Grayling , M. Dimairo , A. P. Mander , and T. F. Jaki , “A Review of Perspectives on the Use of Randomization in Phase II Oncology Trials,” Journal of the National Cancer Institute 111, no. 12 (2019): 1255–1262, 10.1093/jnci/djz 126.31218346 PMC 6910171 · doi ↗ · pubmed ↗
- 3L. V. Rubinstein , E. L. Korn , B. Freidlin , S. Hunsberger , S. P. Ivy , and M. A. Smith , “Design Issues of Randomized Phase II Trials and a Proposal for Phase II Screening Trials,” Journal of Clinical Oncology 23, no. 28 (2005): 7199–7206, 10.1200/JCO.2005.01.149.16192604 · doi ↗ · pubmed ↗
- 4Food and Drug Administration , “Clinical Trial Considerations to Support Accelerated Approval of Oncology Therapeutics Guidance for Industry,” 2023, https://www.fda.gov/media/166431/download.
- 5European Medicines Agency , “Guideline on the Clinical Evaluation of Anticancer Medicinal Products ‐ Revision 6,” 2023, https://www.ema.europa.eu/en/documents/scientific‐guideline/guideline‐clinical‐evaluation‐anticancer‐medicinal‐products‐revision‐6_en.pdf.
- 6M. Fors and P. González , “Current Status of Bayesian Clinical Trials for Oncology, 2020,” Contemporary Clinical Trials Communications 20 (2020): 100658, 10.1016/j.conctc.2020.100658.33083629 PMC 7554365 · doi ↗ · pubmed ↗
- 7H. Mo , Y. Yu , X. Sun , et al., “Metronomic Chemotherapy Plus Anti‐PD‐1 in Metastatic Breast Cancer: A Bayesian Adaptive Randomized Phase 2 Trial,” Nature Medicine 30, no. 9 (2024): 2528–2539, 10.1038/s 41591-024-03088-2.38969879 · doi ↗ · pubmed ↗
- 8D. A. Berry , “Bayesian Clinical Trials,” Nature Reviews Drug Discovery 5, no. 1 (2006): 27–36, 10.1038/nrd 1927.16485344 · doi ↗ · pubmed ↗
