Improving phase II oncology trials using best observed RECIST response as an endpoint by modelling continuous tumour measurements
Chien-Ju Lin, James Wason

TL;DR
This paper introduces an extension of the augmented binary method to improve the analysis of best observed RECIST responses in phase II oncology trials, significantly increasing statistical power by modeling continuous tumor measurements.
Contribution
The paper develops a novel statistical approach that extends existing methods to better utilize best observed responses, enhancing power in phase II cancer trial analyses.
Findings
The average gain in power over the traditional analysis is equivalent to approximately a 35% increase in sample size.
Simulation and real data demonstrate increased efficiency in single-arm and randomized trials.
Modified version reduces computational effort while maintaining efficiency.
Table 1: Single-arm trials, response at a fixed time. Mean estimated response probability, estimated coverage, and reduction in width of the 95% CI relative to Bin.

| Scenario | Time | True | Mean est. prob.: Bin | Mean est. prob.: Augbin/eAugbin | Mean est. prob.: mAug | Coverage: Bin | Coverage: Augbin/eAugbin | Coverage: mAug | CI width reduction (%): Augbin/eAugbin | CI width reduction (%): mAug |
|---|---|---|---|---|---|---|---|---|---|---|
| (-1.5, 0) | 2 | 0.334 | 0.333 | 0.332 | 0.338 | 0.957 | 0.947 | 0.947 | 15.68 | 14.75 |
| (-2.5, 0.2) | 2 | 0.293 | 0.293 | 0.293 | 0.286 | 0.948 | 0.945 | 0.941 | 13.45 | 11.94 |
| (-1.5, 0) | 3 | 0.318 | 0.316 | 0.314 | 0.317 | 0.953 | 0.936 | 0.949 | 12.53 | 13.26 |
| (-2.5, 0.2) | 3 | 0.450 | 0.444 | 0.443 | 0.443 | 0.954 | 0.943 | 0.948 | 14.5 | 15.28 |
| (-1.5, 0) | 4 | 0.270 | 0.268 | 0.263 | 0.268 | 0.949 | 0.926 | 0.95 | 12.67 | 12.54 |
| (-2.5, 0.2) | 4 | 0.429 | 0.422 | 0.421 | 0.421 | 0.957 | 0.938 | 0.943 | 13.14 | 14.41 |
Table 2: Single-arm trials, best observed response. Mean estimated response probability, estimated coverage, and reduction in width of the 95% CI relative to Bin.

| Scenario | n | Time | True | Mean est. prob.: Bin | Mean est. prob.: eAugbin | Mean est. prob.: mAug | Coverage: Bin | Coverage: eAugbin | Coverage: mAug | CI width reduction (%): eAugbin | CI width reduction (%): mAug |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (-1.5, 0) | 75 | 4 | 0.4 | 0.4 | 0.404 | 0.403 | 0.959 | 0.954 | 0.955 | 16.5 | 15.9 |
| (-1.5, 0) | 75 | 5 | 0.391 | 0.393 | 0.398 | 0.396 | 0.943 | 0.951 | 0.952 | 16.6 | 15.9 |
| (-1.5, 0) | 75 | 6 | 0.386 | 0.39 | 0.395 | 0.394 | 0.959 | 0.954 | 0.955 | 16.7 | 16 |
| (-1.5, 0) | 150 | 7 | 0.382 | 0.382 | - | 0.387 | 0.944 | - | 0.957 | - | 16.6 |
| (-2.5, 0.2) | 75 | 4 | 0.46 | 0.457 | 0.462 | 0.461 | 0.941 | 0.957 | 0.957 | 16.8 | 17.3 |
| (-2.5, 0.2) | 75 | 5 | 0.452 | 0.448 | 0.454 | 0.452 | 0.954 | 0.96 | 0.96 | 18.2 | 17.2 |
| (-2.5, 0.2) | 75 | 6 | 0.446 | 0.442 | 0.449 | 0.447 | 0.942 | 0.962 | 0.961 | 18.3 | 17.2 |
| (-2.5, 0.2) | 150 | 7 | 0.441 | 0.441 | - | 0.446 | 0.95 | - | 0.96 | - | 18.3 |
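Table 3: HORIZON II. Width of the 95% CI for each arm's probability of response at a fixed time, by method and number of follow-up times (T).

| Method | Placebo, T=2 | Placebo, T=3 | Placebo, T=4 | Placebo, T=5 | Cediranib 20 mg, T=2 | Cediranib 20 mg, T=3 | Cediranib 20 mg, T=4 | Cediranib 20 mg, T=5 | Cediranib 30 mg, T=2 | Cediranib 30 mg, T=3 | Cediranib 30 mg, T=4 | Cediranib 30 mg, T=5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bin | 0.113 | 0.114 | 0.111 | 0.105 | 0.111 | 0.112 | 0.111 | 0.111 | 0.134 | 0.133 | 0.13 | 0.124 |
| eAugbin | 0.073 | 0.074 | 0.073 | 0.07 | 0.072 | 0.075 | 0.074 | 0.064 | 0.088 | 0.088 | 0.088 | 0.085 |
| mAug | 0.086 | 0.087 | 0.087 | 0.08 | 0.086 | 0.088 | 0.086 | 0.088 | 0.105 | 0.105 | 0.104 | 0.096 |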
Chien-Ju Lin 1,*, James M.S. Wason 1
1 MRC Biostatistics Unit, Cambridge, UK
Abstract
In many phase II trials in solid tumours, patients are assessed using endpoints based on the Response Evaluation Criteria in Solid Tumours (RECIST) scale. Often, analyses are based on the response rate. This is the proportion of patients who have an observed tumour shrinkage above a pre-defined level and no new tumour lesions. The augmented binary method has been proposed to improve the precision of the estimator of the response rate. The method involves modelling the tumour shrinkage to avoid dichotomising it. However, in many trials the best observed response is used as the primary outcome. In such trials, patients are followed until progression, and their best observed RECIST outcome is used as the primary endpoint. In this paper, we propose a method that extends the augmented binary method so that it can be used when the outcome is best observed response. We show through simulated data and data from a real phase II cancer trial that this method improves power in both single-arm and randomised trials. The average gain in power compared to the traditional analysis is equivalent to approximately a 35% increase in sample size. A modified version of the method is proposed to reduce the computational effort required. We show this modified method maintains much of the efficiency advantage.
1 Introduction
A new cancer treatment is tested for potential benefit in phase II trials that use a relatively small number of patients followed over a short period of time. The results of the phase II trial determine whether to test the treatment in a larger, more time-consuming, and more costly phase III trial. Because of the high cost of, and high failure rate in, phase III oncology trials [8], it is important to improve the analysis of phase II trials so that this decision is more accurate.
Phase II oncology trials use a variety of endpoints to evaluate the efficacy of a treatment [6, 9]. The most commonly used endpoints are based on the Response Evaluation Criteria in Solid Tumours (RECIST) scale [2]. RECIST defines tumour size as the sum of longest diameters of target lesions and categorises patients into complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). CR and PR represent no new tumour lesions together with a 100% shrinkage and a greater than 30% shrinkage, respectively; PD represents a 20% increase in tumour size from the minimum size observed up to that point, or new lesions appearing. Often patients are followed until they are categorised as PD or until a preplanned time, and patients with CR or PR are labelled responders. The response rate is defined as either: 1) the proportion of patients who are responders at a certain time after baseline (fixed response); or 2) the proportion of patients whose best observed response before progression is CR or PR (best observed response, BOR).
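As a rough illustration of the categories just described, the sketch below classifies a single visit from the sum of longest diameters. The function name `classify_visit` and its arguments are hypothetical, and it encodes only the simplified rules stated in this paragraph, not the full RECIST guideline. The 0.7 cut-off for PR is an assumption that matches the log(0.7) threshold used later in the paper.

```python
def classify_visit(baseline, current, nadir, new_lesion):
    """Classify one visit using the simplified rules described in the text.

    baseline, current, nadir: sums of longest diameters (same units);
    nadir is the smallest sum observed before this visit.
    new_lesion: True if a new lesion has appeared.
    """
    # PD: new lesions, or at least a 20% increase from the minimum so far.
    if new_lesion or (current > nadir and current >= 1.2 * nadir):
        return "PD"
    if current == 0:
        return "CR"  # 100% shrinkage
    if current <= 0.7 * baseline:
        return "PR"  # assumed cut-off: at least 30% shrinkage from baseline
    return "SD"
```

For example, a patient with a baseline sum of 100 mm whose tumour shrinks to 65 mm is classified as PR, whereas a regrowth from a nadir of 45 mm to 60 mm is classified as PD even though 60 mm is still below baseline.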
Categorizing patients into responders and non-responders is widespread and clinically appealing. However, it can have substantial statistical disadvantages. Its major limitation is that it dichotomises the continuous tumour variable, thus discarding information. This loses substantial efficiency [1]. Some researchers have addressed the problem and proposed methods to make use of the continuous change in tumour response to improve statistical efficiency. This is done in different ways. Karrison et al. [7] propose directly using the change in tumour size as an endpoint. Wason and Seaman [12] use models for the tumour size and new lesion data which can be used to infer the fixed response rate with higher precision. Jaki et al. [5] propose a method that links tumour size change with mortality using historical datasets. Authors have demonstrated that using continuous scales can increase the power (or reduce the required sample for a target power) compared to analysing the binary composite outcome.
The method of Wason and Seaman [12] retains the clinically meaningful endpoint but takes into account the continuous information on tumour size. This method is limited by allowing only two follow-up visits and by considering the response rate only at a fixed time (i.e. it cannot be used to make inferences on BOR). In trials in which patients are assessed twice (interim and final), their method is sufficient. However, in trials in which patients are followed up until a pre-planned time, a method that incorporates information from all measurements is preferred. In this paper we present an extended method that can be used for any number of follow-up times for fixed response or BOR. We also propose a modified version that uses a highly efficient technique for multivariate integration [3], which substantially reduces the computation time. We assess the properties of the proposed methods by using simulated data and data from a real phase II cancer trial (HORIZON II).
This paper is divided into four sections. Section 2 gives a brief overview of the augmented binary method [12]. It then describes the proposed extensions of the method. Section 3 evaluates the performance of the proposed methods using simulations and real data. Section 4 summarizes the results and presents limitations and future work.
2 Methods
2.1 Background
We use the phrase 'tumour size' as shorthand for the sum of the longest diameters of target tumour lesions. We assume patients' tumour sizes are recorded until progression occurs or until a preplanned number of visits. We note there are two ways in which a progression can occur: an increase in tumour size by more than 20% (a tumour-growth progression) or new lesions appearing (a new-lesion progression). Two response endpoints can be used in the analysis, one being fixed time and the other being BOR. Analysis at a fixed time uses the proportion of responders at that time (those who have a tumour size shrinkage at that time above a pre-defined threshold and no progression up to that point). BOR defines patients as responders or not according to their best observed response before progression. The latest RECIST guidelines [2] give BOR two definitions according to whether confirmation is required or not. Confirmation means that an apparent response must be backed up by continued response at the next timepoint to be counted as genuine. This is especially recommended for single-arm trials. When confirmation is not required (for example, in randomised trials comparing two arms), BOR is defined as the best response across all time points up to progression. When confirmation is required, BOR is defined as a response if the patient is a responder at two consecutive time points before progression.
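The two BOR definitions can be sketched in a few lines. This is a minimal illustration that assumes each visit has already been categorised as CR, PR, SD or PD; the function name `is_bor_responder` is hypothetical.

```python
def is_bor_responder(visits, confirmation=False):
    """Return True if the patient counts as a responder under BOR.

    visits: per-visit categories in time order, e.g. ["SD", "PR", "PD"].
    Only visits before the first "PD" are considered.
    """
    responses = []
    for category in visits:
        if category == "PD":
            break
        responses.append(category in ("CR", "PR"))
    if not confirmation:
        # Any CR/PR before progression is enough.
        return any(responses)
    # Confirmation: CR/PR at two consecutive time points before progression.
    return any(a and b for a, b in zip(responses, responses[1:]))
```

With the visit sequence `["SD", "PR", "PD"]`, the patient is a responder without confirmation but not with confirmation, because the single PR is not repeated at the next visit.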
2.2 Notation
Tumour sizes for each patient are measured at several discrete times $t = 0, 1, \ldots, T$ ($T$ denoting the maximum time). The tumour size at time $t$ for patient $i$ is denoted by $z_{it}$, where $z_{i0}$ represents the baseline measurement. We denote $D^{G}_{i}$ and $D^{N}_{i}$ as the times at which a tumour-growth progression and a new-lesion progression occur, respectively. Once a patient progresses they are no longer followed up. The observed data are therefore the tumour sizes up to time $D_i$, where $D_i = \min(D^{G}_{i}, D^{N}_{i}, T)$. We define $y_{it} = \log(z_{it}/z_{i0})$ as the log tumour size ratio for patient $i$ at time $t$, and $c$ as the pre-specified dichotomisation threshold for response (on the log tumour ratio scale). Further, $N_{it}$ defines new-lesion progression indicators: $N_{it} = 1$ if patient $i$ has a progression due to new lesions occurring between time $t-1$ and $t$, and $N_{it} = 0$ otherwise. We define composite response indicators corresponding to the definitions of fixed time and BOR as follows.
For fixed time at time , the composite response indicator for patient is defined as
[TABLE]
For BOR when confirmation is not required, the event is equivalent to having at least one record classified CR/PR before progression or time , the response indicator is defined as
[TABLE]
We consider the case where confirmation is required later.
2.3 Estimating response probability using the augmented binary method with two follow-up times
The augmented binary method, henceforth referred to as AugBin, was proposed by Wason and Seaman [12]. We briefly describe this method here, but more details are found in [12].
The AugBin method makes assumptions that the log tumour size ratios follow a multivariate normal distribution, and the probability of new-lesion progression depends only on the observed tumour size at the previous visit. The log tumour size ratios are modelled by
[TABLE]
and the new-lesion progression is modelled by using logistic regression models
[TABLE]
[TABLE]
The probability of response for patient is written by:
[TABLE]
where $\theta$ is the vector of parameters from the above models and the dichotomisation threshold is usually $\log(0.7)$, representing at least a 30% shrinkage in the tumour size from baseline. The mean response probability is estimated by averaging the patient-level response probabilities evaluated at $\hat{\theta}$, the maximum likelihood estimator of $\theta$. A program accompanying that paper uses the R package R2Cuba to compute the above integral. An approximate $100(1-\alpha)\%$ confidence interval for the probability of response is constructed on the logit scale, that is, $\text{expit}\left\{ l(\hat{\theta}) \pm \Phi^{-1}\left(1-\frac{\alpha}{2}\right) \sqrt{\text{var}(l(\hat{\theta}))} \right\}$, where $l(\hat{\theta})$ is the logit of the estimated mean response probability and $\text{var}(l(\hat{\theta}))$ is obtained by using the delta method.
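The logit-scale interval can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes an estimate of the response probability and of its variance are already available, whereas in the method the variance comes from applying the delta method to the full parameter vector. The function name `logit_scale_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def logit_scale_ci(p_hat, var_p_hat, alpha=0.05):
    """Wald interval for a probability, built on the logit scale.

    p_hat: estimated probability (strictly between 0 and 1).
    var_p_hat: estimated variance of p_hat (assumed to be given).
    Delta method: var(logit(p_hat)) ~ var_p_hat / (p_hat * (1 - p_hat))**2.
    """
    logit = math.log(p_hat / (1.0 - p_hat))
    se_logit = math.sqrt(var_p_hat) / (p_hat * (1.0 - p_hat))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Transform the symmetric logit-scale limits back to probabilities.
    return expit(logit - z * se_logit), expit(logit + z * se_logit)
```

Working on the logit scale keeps both limits inside (0, 1), which a Wald interval on the probability scale does not guarantee.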
2.4 Extended augmented binary method at a fixed time (t2)
We use the same assumptions and extend the AugBin method to $T$ follow-up times. The log tumour size ratios are modelled by
[TABLE]
An unstructured covariance matrix is used (although an alternative form may be needed if $T$ is large enough). The new-lesion progression is modelled by
[TABLE]
We assume that the new-lesion progression depends only on the previously observed tumour size. The tumour sizes that are missing because of new-lesion progression can therefore be treated as missing at random (MAR), as justified in [12]. We assume data before progression is always observed and is always missing, similarly for and . The data therefore have a monotone missingness pattern. The probability of response for patient $i$ at time $T$ can be written as:
[TABLE]
An advantage of Equation (3) is that, because it uses the fitted models to estimate each patient's probability of response and the missing data are MAR, it can also be applied to patients who drop out before the preplanned time. The probability is interpreted as the probability of the patient being a responder at time T as if they were observed until T. A potential issue with Equation (3) is that the multivariate integration is computationally intensive. The mean response probability is estimated by averaging the response probability over patients, evaluated at the maximum likelihood estimates. An approximate $100(1-\alpha)\%$ confidence interval is constructed as described in Section 2.3.
2.5 Modified augmented binary method at a fixed time
The objective of this section is to estimate the mean response probability from continuous tumour-size information in a computationally efficient way. We assume that the events {no new-lesion progression occurs from time 1 to time T} and {no tumour-growth progression} are conditionally independent given tumour size. We note this is a strong assumption, and assess the sensitivity to this assumption later on. The probability of response for patient $i$ at a fixed time can be written as
[TABLE]
Let be the probability of new-lesion progression at time . Note that is a conditional probability given no new-lesion progression occurring at previous timepoints. The log tumour size ratio is allowed to depend on baseline tumour size whereas new-lesion progression depends on the previous observed tumour size at the previous visit. We can model Y by
[TABLE]
where is a joint distribution, and is the logit link function. We assume that . The probability of response for patient at a fixed time can be written by
[TABLE]
where is the dichotomisation threshold and is a vector of parameters of the models. We assume that is the pdf of a multivariate normal distribution. The multivariate integration can then be calculated by a highly efficient technique proposed by Genz and Bretz [3]. The value of is estimated by using if is observed. For patients who have progressed, their records at time are not observed. Their probability of new-lesion progression at time is estimated by
[TABLE]
where is the number of patients with observed and is the number of patients who have log tumour size ratio outside of the region of integration of Equation (5). We trim those patients to avoid underestimating . This is similar to an idea of trimmed mean, which is used in many areas and has advantages under both normal and non-normal distributions [11, 13].
The vector consists of {(T+1)+ T(T+1)/2 + 2T } parameters (the and parameters from the multivariate normal and parameters from the logistic regression models). The mean response probability is estimated by , where is the maximum likelihood estimator of . A % confidence interval for can be constructed :
[TABLE]
where is the standard normal distribution function. However, we found that the method has better properties if we find a confidence interval for logit and transform back. Let , we obtain by using the delta method, which is written by
[TABLE]
where is the partial derivatives of . An approximately (1-) % confidence interval for the probability of response is
[TABLE]
To summarise, the modified method uses a simplification for the relationship between new-lesion progressions and tumour-growth progressions in order to use a more efficient procedure for multivariate integration.
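To give a sense of how the model grows with the number of follow-up times, the parameter count quoted above for the modified method, (T+1) + T(T+1)/2 + 2T, can be evaluated directly. The grouping in the comments is an interpretation: T(T+1)/2 is the number of free entries in an unstructured T x T covariance matrix, and 2T is read here as two logistic-regression parameters per follow-up time. The function name is hypothetical.

```python
def n_parameters(T):
    """Number of model parameters for T follow-up times, per the stated formula."""
    mean_terms = T + 1                   # mean parameters of the normal model
    covariance_terms = T * (T + 1) // 2  # unstructured covariance matrix
    logistic_terms = 2 * T               # assumed: two per logistic model
    return mean_terms + covariance_terms + logistic_terms


# Parameter counts for the numbers of follow-up times used in the paper.
counts = {T: n_parameters(T) for T in range(2, 8)}
```

Under this formula the count rises from 10 parameters at T=2 to 50 at T=7, driven mainly by the quadratic covariance term.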
2.6 Proposed method for best observed response (BOR)
We focus on the case where confirmation is not required but show briefly how the methodology can straightforwardly allow for confirmation later. By the definition of BOR, a patient is a responder if they have at least one log tumour size ratio smaller than log(0.7) before progression or maximum follow-up time. We define , , and as the possible regions of integration corresponding to being classified as stable disease, responder, and irrelevant variables. Let be the time at which the patient is first classified as CR/PR. Hence, each component of will fall into one of the three regions as
[TABLE]
The probability of response using BOR for patient will be the sum over all possibilities of when the CR/PR is first observed. Following the concept of the extended augmented binary approach (eAugbin), the probability of response can be written by :
[TABLE]
Similarly, following the concept of the modified augmented binary approach (mAug), the probability of response can be written by:
[TABLE]
The mean response probability is then estimated by , where is the maximum likelihood estimator of . As before, we work on the logit scale, use the delta method to obtain the variance, and then transform back to construct the confidence interval for the mean response probability.
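The idea of summing over the time at which CR/PR is first observed can be checked on a toy example. The sketch below is not the integral used by eAugbin or mAug: it ignores progression and, purely for illustration, assumes independent responses at each visit. It only demonstrates that the events "first response at visit k" partition the event "at least one response". All names are hypothetical.

```python
from itertools import product


def first_response_time(pattern):
    """1-based index of the first visit with a response, or None if there is none."""
    for t, responded in enumerate(pattern, start=1):
        if responded:
            return t
    return None


def bor_probability(pattern_probs):
    """Sum P(first response at visit k) over k = 1..T."""
    T = len(next(iter(pattern_probs)))
    total = 0.0
    for k in range(1, T + 1):
        total += sum(prob for pattern, prob in pattern_probs.items()
                     if first_response_time(pattern) == k)
    return total


# Toy example (assumption): three visits, independent 20% response chance at each.
T, p = 3, 0.2
toy = {}
for pattern in product([False, True], repeat=T):
    prob = 1.0
    for responded in pattern:
        prob *= p if responded else 1.0 - p
    toy[pattern] = prob
bor = bor_probability(toy)
```

In the toy example the sum over first-response times equals one minus the probability of never responding, which is the BOR response probability when progression is ignored.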
When confirmation is required, having two continued responses of CR/PR before progression, one can replace (7) with with the sum in (9) going from 1 to .
2.7 Testing a difference in probability of response between two treatments
The above methods can be applied to single-arm trials. For a randomised trial where comparing the difference in response probability is of interest, a minor addition is required.
We assume patients are recruited with n patients randomised to each arm. Assumptions for log tumour size ratios and new-lesion progression remain the same as in Section 2.4. We introduce an arm indicator R to the models, with 0 for control and 1 for experimental arms. The log tumour size ratios are modelled by
[TABLE]
the new-lesion progression for is modelled by using logistic models
[TABLE]
The probabilities of new-lesion progression for control and experimental arms are and , respectively. Let be the vector of parameters from the above models. The mean response probability at a fixed time is estimated by
[TABLE]
where is the maximum likelihood estimator of . We note that patients from both arms are included in the calculation of the probability of response in an arm, as is recommended and justified in [12]. The mean difference in response probability at a fixed time is defined as the difference between mean response probabilities for the two arms. It can be written by
[TABLE]
We obtain the variance of by using the delta method and use the Wald test to test whether is zero. Similarly, we define the mean difference in response probability for BOR as
[TABLE]
Both the extended and modified methods can be used as in previous sections.
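The final testing step can be sketched as follows, assuming the estimated difference in response probability and its delta-method variance have already been computed. The function name `wald_test` and its inputs are hypothetical; the sketch covers only the test itself, not the model fitting.

```python
import math
from statistics import NormalDist


def wald_test(diff_hat, var_diff_hat):
    """Two-sided Wald test of H0: difference in response probability = 0.

    diff_hat: estimated difference between arms.
    var_diff_hat: estimated variance of diff_hat (e.g. from the delta method).
    Returns the z statistic and the two-sided p-value.
    """
    z = diff_hat / math.sqrt(var_diff_hat)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For example, an estimated difference of 0.196 with a standard error of 0.1 gives z = 1.96 and a two-sided p-value of about 0.05.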
3 Results
In this section, we evaluate the performance of the proposed methods in terms of precision and power using simulations and a real data example. We use “Bin” to represent the method that just analyses the response outcomes as binary. For single-arm trials, the binary method uses the R-package Hmisc to construct a Wilson interval for binary success or . For two-arm studies, the binary method is a logistic regression model that has parameters for treatment group and baseline tumour size, from which the treatment effect can be tested. The terms “Augbin”, “eAugbin”, and “mAug” refer to methods that use continuous information. They are, respectively, Wason and Seaman’s method [12] at two-follow up times, the extended method for more than two-follow up times, and the modified method for rapid computation. We use fixed time with varying numbers of follow-up times and best observed response without confirmation as the endpoints.
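For reference, the Wilson interval used by the binary method in single-arm trials can be computed from the standard Wilson score formula. The sketch below is a stand-alone illustration and does not reproduce the Hmisc implementation; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Illustrative input (not from the paper): 25 responders among 75 patients.
lower, upper = wilson_interval(25, 75)
```

With 25 responders out of 75, the interval is roughly 0.24 to 0.45, a width of about 0.21.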
3.1 Simulation study setup
Following the aforementioned notation, the observed data available for each patient is . The observed data are simulated as follows. First of all, baseline tumour size for patient is generated from a uniform distribution and log tumour size ratios of T follow-up time are generated from a multivariate normal distribution. Tumour size can then be calculated from . Next, new-lesion progression indicators are generated from logistic models with intercept and tumour size effect . A non-zero means that probability of new-lesion progression depends on the tumour size at the previous timepoint. We define time to new-lesion progression as the first time when the new-lesion progression occurs from the logistic models. Finally, tumour size observations of patient after progression are replaced as missing.
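The data-generating steps above can be sketched for a single patient as follows. This is a simplified illustration under stated assumptions: the baseline is drawn from Uniform(0, 1), only new-lesion progression is simulated (tumour-growth progression is omitted), and observations from the progression visit onwards are set to missing. The function names and the example parameter values are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def simulate_patient(rng, means, cov, intercept, size_effect):
    """Simulate one patient; tumour sizes after progression are None (missing)."""
    T = len(means)
    z0 = rng.uniform(0.0, 1.0)  # assumed baseline distribution
    L = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in range(T)]
    # Multivariate normal log tumour size ratios via the Cholesky factor.
    y = [means[t] + sum(L[t][k] * e[k] for k in range(t + 1)) for t in range(T)]
    sizes = [z0] + [z0 * math.exp(v) for v in y]
    observed, new_lesion = [z0], []
    for t in range(1, T + 1):
        # New-lesion risk depends on the tumour size at the previous visit.
        p = 1.0 / (1.0 + math.exp(-(intercept + size_effect * sizes[t - 1])))
        progressed = rng.random() < p
        new_lesion.append(progressed)
        if progressed:
            observed += [None] * (T - t + 1)
            break
        observed.append(sizes[t])
    return observed, new_lesion


example_cov = [[0.25, 0.1, 0.1], [0.1, 0.25, 0.1], [0.1, 0.1, 0.25]]
example = simulate_patient(random.Random(0), [-0.2, -0.3, -0.4],
                           example_cov, -1.5, 0.0)
```

With an intercept of -1.5 and no tumour size effect, the per-visit probability of new lesions in this sketch is expit(-1.5), about 0.18.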
3.1.1 Single-arm trials assessing response at fixed time
Before generating 5000 replicates, we test the computation time for running one replicate using Augbin/eAugbin. We generated one replicate of 75 patients. Baseline tumour size is generated from a uniform distribution and log tumour size ratios are generated from a multivariate normal distribution for 2 to 6 follow-up times. The new-lesion model parameters (intercept, tumour size effect) are set to (-1.5, 0) and (-2.5, 0.2). An intercept of -1.5 (with no tumour size effect) corresponds to an 18% chance of developing new lesions between each visit. The computation times for running one replicate using Augbin/eAugbin for 2 to 6 follow-up times are 0.04, 0.65, 2.28, 3.41 and 4.47 minutes, while mAug at 6 follow-up times takes 0.09 minutes. We do not consider more follow-up times in the simulation because of the length of time needed to simulate 5000 replicates for eAugbin. The simulation setting for the log tumour size ratios with 2 follow-up times follows a formulation similar to [12], that is
[TABLE]
The settings for T=3 and 4 are:
[TABLE]
Table 1 shows the mean estimated response probability and coverage for Bin, Augbin/eAugbin and mAug for 2, 3 and 4 follow-up times over 5000 replicates. Columns 10-11 show the reduction in the width of the 95% confidence interval (CI). They are, respectively, the averages of [1-(CI width of Augbin)/(CI width of Bin)] and [1-(CI width of mAug)/(CI width of Bin)]. In all cases, eAugbin and mAug have narrower CIs than Bin. For example, a 14% reduction in CI width by mAug means that Bin would need approximately 30% more patients to obtain a similar width. mAug has a similar coverage to Augbin at T=2. For larger T, mAug appears to have a better coverage probability (i.e. closer to the nominal value) than eAugbin. The reduction in confidence interval width, compared to the binary method, appears to be similar for the two methods. Thus for single-arm trials mAug appears to offer a substantial improvement in computational efficiency without notably poorer statistical characteristics compared to eAugbin.
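The link between a narrower interval and an equivalent sample size can be sketched as follows, assuming (as an approximation not stated explicitly in the text) that CI width is proportional to 1/sqrt(n). The function name is hypothetical, and the rounded figures quoted in the text may rest on a slightly different calculation.

```python
def equivalent_sample_size_ratio(width_reduction):
    """Sample-size multiplier the wider method needs to match the narrower CI.

    width_reduction: proportional reduction in CI width, e.g. 0.14 for 14%.
    Assumes CI width scales as 1/sqrt(n).
    """
    return 1.0 / (1.0 - width_reduction) ** 2


# Multiplier for a 14% reduction in CI width.
ratio_14 = equivalent_sample_size_ratio(0.14)
```

Under this approximation, `equivalent_sample_size_ratio(0.14)` is about 1.35, so the gain from a given reduction in width grows faster than linearly.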
3.1.2 Randomised trials using response at fixed time
We consider a two-arm trial with a control and experimental arm for 2 follow-up times. Each arm has 75 patients that have been allocated at random. Baseline is generated from a distribution. The mean log tumour size ratios between each visit are generated from a normal distribution with mean and variance . We set , where for control and for experimental arms, is the difference in the mean log tumour size ratio and reflects the effectiveness of the control treatment. This is a similar formulation as [12].
Figure 1 compares the powers for Bin, eAugbin and mAug methods for randomised trials. The figure on the right shows the power over treatment effect when =.35. As seen, there is a clear power gain when using either mAug or Augbin. mAug performs very closely to Augbin. The empirical Type I error when the difference is 0 for Augbin and mAug are 0.054 and 0.055, respectively.
3.1.3 Non-comparative trials for BOR
Using the binary composite outcome, patients are classified as responders if they have a CR/PR before time F. The computation times for running one replicate using Augbin/eAugbin and BOR for 3 to 6 follow-up times are 0.05, 0.09, 0.3 and 0.56 minutes, while mAug at 6 follow-up times takes 0.22 minutes. Again we use 5000 replicates of 75 patients. Baseline tumour size is generated from a uniform distribution (0, 1). The log tumour size ratios are generated from a multivariate normal distribution for 4, 5, 6 and 7 follow-up times with . Regardless of the number of visits after baseline, we set the mean log tumour size ratios at the end of the treatment to . For example, the case where T=4 refers to having 4 visits after baseline and being set to []. For computational reasons, eAugbin was included only for up to T=6. Table 2 shows the operating characteristics of mAug and Bin for a maximum number of visits varying from 4 to 7. Overall, mAug reduces the average width of the CI by at least 16% compared with Bin. This is equivalent to needing a sample size of around 101 to obtain a similar average width using Bin. The reduction in width is slightly higher when there is a tumour size effect on new-lesion progression.
3.1.4 Comparative trials for BOR
To illustrate results of the mAug method for a two-arm trial, we consider the case where each arm has 75 patients and patients are followed for 4 time points. The mean log tumour size ratios for each time point is , , , and , where for control and for experimental arms respectively. Figure 2 compares the powers for Bin and mAug methods in comparative trials for four time points when best observed response is used. Although there is a slight inflation in Type I error rate for mAug, in general, there is a consistent power advantage when using mAug compared to using Bin. The empirical Type I error when the difference is 0 for Binary and mAug are 0.041 and 0.058, respectively.
3.2 Case study: HORIZON II
HORIZON II (clinicaltrials.gov identifier: NCT00384176) is a three-arm colon cancer trial sponsored by AstraZeneca. Patients were initially randomly assigned 1:1:1 to placebo, cediranib 20 mg once daily, or cediranib 30 mg once daily. Subsequent patients were randomly assigned 1:2 to placebo or cediranib 20 mg [4]. The numbers of patients with a baseline record in the three arms are 346, 484 and 209, respectively. The tumour sizes of patients were measured every 6 weeks up to 24 weeks and then every 12 weeks. Figure S1 in the supporting information shows a waterfall plot of the individual reduction in tumour size from baseline at week 24. In some cases participants are classified as responders before progressing, which results in different response estimates under fixed time and BOR.
We used a permutation test to calculate the empirical type I error rate. Data from baseline, 6, 12, 18, and 24 weeks were used. We simulated 5000 replicates, with the treatment assignment label shuffled randomly in each replicate. For each replicate, we tested the difference in probability of best observed response between two treatment arms using mAug with 4 follow-up times. The empirical type I error for no difference between placebo and cediranib 20 mg is 0.0558 and that between placebo and cediranib 30 mg is 0.0518. These are within Monte Carlo error of a true type I error of 0.05 (MC error +/- 0.006).
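The quoted Monte Carlo error can be reproduced with the binomial standard error of an estimated rejection rate. The sketch below assumes that the +/- 0.006 is a 95% half-width (1.96 standard errors), which is consistent with the numbers but is not stated explicitly; the function name is hypothetical.

```python
import math


def monte_carlo_se(p, n_replicates):
    """Standard error of an estimated proportion over independent replicates."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


se = monte_carlo_se(0.05, 5000)
half_width = 1.96 * se  # assumed 95% Monte Carlo margin

# The two empirical type I error rates reported in the text.
within_margin = [abs(rate - 0.05) <= half_width for rate in (0.0558, 0.0518)]
```

The standard error is about 0.003, giving a margin of about 0.006, and both reported rates fall inside 0.05 +/- 0.006.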
Figure S2 in the supporting information shows the mean estimated response probability using the three methods at a fixed time, with between 2 and 5 follow-up times, for placebo, cediranib 20 mg and cediranib 30 mg, respectively. The mean estimated response probability decreases as the number of timepoints increases. Generally, the estimated mean probabilities of response for the three methods are similar.
Table 3 reports the width of the 95% CI for each arm’s probability of response using fixed time. The width corresponds to the length of the vertical lines shown in Figure S2. The 95% CI widths of eAugbin and mAug are considerably narrower than that of Bin. We compared Placebo and cediranib 20 mg as well as Placebo and cediranib 30 mg using mAug BOR and Bin BOR for 4-6 time points. Results show that the mAug method gives a considerably smaller 95% CI than the Bin method. The maximum width of the 95% CI for mAug is 0.131 for comparing Placebo with 30 mg, while the width is 0.174 for Bin (See Table S2 in supporting information).
4 Discussion
In this paper we have considered how the augmented binary method of [12] can be extended to be applicable for a wider range of phase II oncology trials. We have made three contributions. The first is to extend the existing method to more than two follow-up times. The second is a modified method that considerably reduces the computational time by making a simplifying assumption about the relationship between new lesions and tumour size change. The third is a mechanism for using both of these methods when the endpoint is based around the best observed RECIST observation before progression, which is a common phase II oncology endpoint.
We have shown that all proposed methods carry the same good properties as the augmented binary method. They provide extra precision, i.e. they require a smaller sample size for the same precision (compared to the traditional analysis of analysing response as a binary outcome) in single arm trials and are more powerful in comparative trials.
The difference between the modified (mAug) and extended (eAugbin) methods is that the former uses the estimated probability of new-lesion progression whereas the latter more correctly incorporates variation by averaging over all possibilities. Estimation of probabilities using the modified method might be biased if only a few patients remain in a trial at some timepoint. mAug has similar properties to eAugbin with respect to precision and power when using BOR.
The extended and modified methods define progression as 20% increase from baseline, whereas RECIST defines progression as 20% increase from the minimum point observed. On the HORIZON II dataset, we examined the number of patients who had their best observed response being PR or CR by both of these definitions. The number is the same for both approaches for all number of follow-up times. This indicates that considering progression as being 20% from baseline does not substantially affect the estimation. However, we should point out that the eAugbin would be able to use the RECIST definition of progression by including a suitable indicator variable in the integrand as well as mAug by changing regions of integration of variables.
All proposed augmented binary methods involve modelling the log tumour size ratio and new-lesion progression indicators. An alternative approach is joint latent modelling of the longitudinal tumour size data and new-lesion progression. One can use a random-effects model for the repeated tumour size measurements and latent class membership for new-lesion progression. By membership, we mean that a participant has a probability of belonging to each latent class, where each class corresponds to the time at which new-lesion progression occurs. Moreover, tumour-growth progression or the appearance of new lesions in a time period results in the patient’s tumour size being missing for all subsequent time periods. Given this monotone missingness pattern in log tumour size, the joint probability of log tumour size can be written as the product of a set of conditional probabilities of the current log tumour size ratio given previous data [10]. Future work is warranted to investigate whether this more complicated methodology is worth applying.
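The factorisation described above can be written out explicitly. The notation here is an assumption introduced for illustration: $y_{it}$ is the log tumour size ratio of patient $i$ at follow-up time $t$, and $T_i$ is the last time at which that patient's tumour size is observed.

```latex
f\left(y_{i1}, \ldots, y_{iT_i}\right)
  = f\left(y_{i1}\right) \prod_{t=2}^{T_i}
    f\left(y_{it} \mid y_{i1}, \ldots, y_{i,t-1}\right)
```

Because missingness is monotone, each patient contributes only the factors up to $T_i$, so no conditional distribution ever has to condition on an unobserved earlier measurement.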
We have only considered response endpoints in this work. An increasingly common phase II endpoint is progression-free survival (PFS). Extending the augmented binary method so that it can improve analyses of PFS is an area of current work.
5 Software
An R package, mAugbin, implementing the methods proposed in this work is available at https://sites.google.com/site/jmswason/supplementary-material. The package includes the extended augmented binary method, which is more computationally intensive, as well as the modified augmented binary method, which requires an additional assumption, for estimating the probability of response at a fixed time and for best observed response.
Acknowledgement
This work was supported by the Medical Research Council (grant number MC_UP_1302/4) and Cancer Research UK (grant number C48553/A18113). We thank AstraZeneca for providing the HORIZON II data.
References
- 1. N. Dhani, D. Tu, D. J. Sargent, L. Seymour, and M. J. Moore. Alternate endpoints for screening phase II studies. Clinical Cancer Research, 15:1873–1882, 2009.
- 2. E. Eisenhauer, P. Therasse, J. Bogaerts, L. Schwartz, D. Sargent, R. Ford, J. Dancey, S. Arbuck, S. Gwyther, M. Mooney, L. Rubinstein, L. Shankar, L. Dodd, R. Kaplan, D. Lacombe, and J. Verweij. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). European Journal of Cancer, 45:228–247, 2009.
- 3. A. Genz and F. Bretz. Computation of Multivariate Normal and t Probabilities. Springer-Verlag, Heidelberg, 2009.
- 4. P. Hoff, A. Hochhaus, B. Pestalozzi, N. Tebbutt, J. Li, T. Kim, K. Koynov, G. Kurteva, T. Pintér, Y. Cheng, B. van Eyll, L. Pike, A. Fielding, J. Robertson, and M. Saunders. Cediranib plus FOLFOX/CAPOX versus placebo plus FOLFOX/CAPOX in patients with previously untreated metastatic colorectal cancer: a randomized, double-blind, phase III study (HORIZON II). Journal of Clinical Oncology, 29:3596–603, 2012.
- 5. T. Jaki, V. Andre, T. L. Su, and J. Whitehead. Designing exploratory cancer trials using change in tumour size as primary endpoint. Statistics in Medicine, 32:2544–2554, 2013.
- 6. J. R. Johnson, G. Williams, and R. Pazdur. End points and United States Food and Drug Administration approval of oncology drugs. Journal of Clinical Oncology, 21:1404–1411, 2003.
- 7. T. G. Karrison, M. L. Maitland, W. M. Stadler, and M. J. Ratain. Design of phase II cancer trials using a continuous endpoint of change in tumor size: application to a study of sorafenib and erlotinib in non small-cell lung cancer. Journal of the National Cancer Institute, 99:1455–1461, 2007.
- 8. S. M. Paul, D. S. Mytelka, C. T. Dunwiddie, C. C. Persinger, B. H. Munos, S. R. Lindborg, and A. L. Schacht. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nature Reviews Drug Discovery, 9:203–214, 2010.
