The Effect of Staging Intervals on Progression-Free Survival in Registration Studies of Oncologic Drugs: A Meta-Analysis
Jonas A. Zuellig, Roman Adam, Filomena Udry, Ariadna Tibau, Bostjan Šeruga, Alberto Ocaña, Eitan Amir, Arnoud J. Templeton

TL;DR
This study found that how often patients are scanned during cancer drug trials can affect how effective the drug appears, with longer intervals suggesting better results.
Contribution
The study reveals that staging intervals in oncology trials influence progression-free survival hazard ratios, with implications for drug approval assessments.
Findings
Shorter restaging intervals (<8 weeks) were associated with higher hazard ratios (HRs), suggesting lower treatment effectiveness.
Longer restaging intervals (≥8 weeks) showed lower HRs, indicating a stronger apparent treatment effect.
Results varied by cancer type, with melanoma and kidney cancer showing opposite trends in HRs based on staging intervals.
Abstract
This analysis investigated whether the frequency of radiographic assessments affects the perceived effectiveness of cancer drugs in clinical trials. Progression-free survival (PFS), defined as the time a patient lives without disease progression, was used for the analysis. The study analyzed pivotal studies supporting drug approvals in Switzerland from 2010 to 2022. The findings showed that shorter intervals between scans (less than 8 weeks) were linked to higher hazard ratios (HRs), meaning a lower apparent treatment effect, while longer intervals (8 weeks or more) showed a stronger effect. This puts the concern that frequent scans might exaggerate drug benefits into a new perspective. However, results varied by cancer type, drug type, and the primary outcome of the studies. This study had limitations, such as relying on published rather than individual patient…
1. Introduction
In recent years, a substantial number of new drugs for the treatment of metastatic cancer have been developed and approved based on surrogate endpoints [1]. Increasingly, progression-free survival (PFS) has been adopted as a primary outcome measure, which is usually assessed by regular radiographic imaging based on standardized criteria like RECIST [2,3,4,5]. In clinical studies, PFS is usually defined as the time from randomization to the radiographic documentation of progression or death. The date of the radiologic evaluation at which progression is first evident is thereby used as a proxy for the true progression time, since the true time of progression typically lies between two consecutive assessments [6]. This leads to an overestimation of the true PFS, and an apparently longer median PFS may just be a consequence of the length of the surveillance interval [7,8,9]. In 2021, Dabush and colleagues reported that in clinical studies of metastatic breast cancer, shorter staging intervals (<9 weeks) were associated with lower hazard ratios (HRs) than longer restaging intervals, thus suggesting a greater treatment effect in terms of PFS [10].
The aim of our work was to explore the potential impact of restaging intervals in studies leading to the registration of oncologic drugs in Switzerland, in order to corroborate or put into perspective the hypothesis that shorter staging intervals are associated with an apparently higher magnitude of the PFS effect across various disease sites.
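The overestimation of PFS by the assessment schedule described above can be illustrated with a minimal sketch. It assumes that progression is only ever recorded at a scheduled scan, that scans occur exactly on schedule, and that there are no deaths or censoring; the progression times are hypothetical values chosen for illustration, and the function name is not taken from the paper. The sketch only concerns the median PFS within one arm, not the HR between arms.

```python
import math
import statistics


def observed_progression_time(true_time_weeks, staging_interval_weeks):
    """Round a true progression time up to the next scheduled scan."""
    scans_needed = math.ceil(true_time_weeks / staging_interval_weeks)
    return scans_needed * staging_interval_weeks


# Hypothetical true progression times in weeks (illustrative only).
true_times = [3, 7, 13, 15, 20]

# Progression dates as they would be documented with 6- and 12-week scans.
observed_6 = [observed_progression_time(t, 6) for t in true_times]
observed_12 = [observed_progression_time(t, 12) for t in true_times]

median_true = statistics.median(true_times)
median_6 = statistics.median(observed_6)
median_12 = statistics.median(observed_12)

print(median_true, median_6, median_12)
```

In this toy example the documented median PFS grows with the staging interval even though the underlying progression times are identical.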
2. Materials and Methods
2.1. Data Sources and Searches
This analysis was conducted in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [11]. New drugs and new indications for previously approved drugs that received market authorization in Switzerland between 2010 and 2022 were identified from the official Journal of Swissmedic, the national authorization and supervisory authority for drugs and medical products, and its website swissmedicinfo.ch [12]. Subsequently, studies supporting the authorization of the respective drug and indication were searched based on the information available with the official label [13].
2.2. Study Selection
The following selection criteria were used: (i) pivotal study supporting drug registration with PFS as primary or secondary endpoint, (ii) availability of HR for PFS with corresponding 95% confidence intervals (CIs) and/or p-value, and (iii) staging interval of radiographic assessment reported. Studies in the curative setting (i.e., neo-/adjuvant therapies) or the pediatric setting, and studies of non-oncologic drugs (e.g., for supportive treatment) or of hematologic indications (e.g., myeloma, lymphoma, and leukemia), were not included.
2.3. Data Extraction
Data were collected using predesigned abstraction forms. The following data were extracted from original publications: name of first author, year of publication, study phase and design, disease site, drug class, indication (e.g., line of therapy), staging interval, primary endpoint(s), HR for PFS, and associated 95% CI and/or p-value. HRs were preferably extracted from multivariable models where available. For studies with time-varying staging intervals (e.g., longer intervals with longer follow-up), the initial staging intervals were chosen. In the case of studies which included patients in various lines of treatment (e.g., first-line and second-line) these were classified as “other line”.
When the indication for authorization corresponded to a specific subgroup evaluated within a study, data for that subgroup were used if reported separately [14]. If the indication was based on multiple studies with differing selection criteria—such as mutation status, prior therapies, or treatment lines—all relevant studies were included in the evaluation. When the indication was broadened to include additional subgroups—such as a new mutation class, treatment line, or an extended authorization across different treatment stages—the corresponding study data supporting the expansion were reviewed. If multiple studies with identical selection criteria supported the authorization, the study with the largest sample size was chosen. In trials reporting multiple subgroups, the subgroup most closely aligned with the approved indication was selected.
2.4. Data Synthesis
HRs were pooled in a meta-analysis. The prespecified primary analysis was dichotomized according to the median staging interval (<median vs. ≥median). Subsequently, subgroup analyses according to drug classes and disease sites were carried out if a subgroup consisted of at least 5 studies. Exploratory subgroup analyses were performed according to the trial phase and according to the primary outcome in the respective studies. In addition, sensitivity analyses were performed using staging interval cut-offs other than the median and including only phase 3 studies. To further address heterogeneity and to test the robustness of the findings, the analyses were repeated after the exclusion of outliers (i.e., studies with staging intervals < 6 weeks or >12 weeks) [15].
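The grouping rules in this section can be sketched as follows. This is an illustration under assumptions, not the authors' workflow: the record layout (`interval_weeks`), the function names, and the example intervals are all hypothetical.

```python
import statistics


def split_by_median(studies):
    """Split studies into (<median, >=median) groups by staging interval."""
    cutoff = statistics.median([s["interval_weeks"] for s in studies])
    shorter = [s for s in studies if s["interval_weeks"] < cutoff]
    longer = [s for s in studies if s["interval_weeks"] >= cutoff]
    return cutoff, shorter, longer


def drop_outliers(studies, low=6, high=12):
    """Keep only studies with staging intervals of 6 to 12 weeks."""
    return [s for s in studies if low <= s["interval_weeks"] <= high]


def eligible_for_subgroup(studies, minimum=5):
    """A subgroup is analyzed only if it has at least 5 studies."""
    return len(studies) >= minimum


# Hypothetical staging intervals in weeks (illustrative only).
studies = [{"interval_weeks": w} for w in [4, 6, 6, 8, 8, 9, 12, 18]]

cutoff, shorter, longer = split_by_median(studies)
without_outliers = drop_outliers(studies)
print(cutoff, len(shorter), len(longer), len(without_outliers))
```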
2.5. Statistical Analyses
Data were combined into a meta-analysis using RevMan 5.4 analysis software [16]. Estimates of HR were weighted and pooled using the generic inverse-variance method and a random-effects model [17]. Differences between the subgroups were assessed using methods described by Deeks et al. [18]. Heterogeneity was assessed using Cochran Q and I^2^ statistics [19]. To explore the potential impact of staging intervals as a continuous variable, a meta-regression was performed with the staging interval as the independent variable and the natural logarithm of the HR (ln(HR)) as the dependent variable, using SPSS version 25.0 (IBM Corp., Armonk, NY, USA) [20]. Weighting was performed with the inverse of the variance of the HR. Publication bias was not formally assessed since all studies included in the analysis supported drug approvals, and thus sufficient quality was assumed. All statistical tests were two-sided, and statistical significance was defined as p < 0.05. No correction was made for multiple significance testing.
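For readers who want to see the pooling step spelled out, the following is a minimal sketch of generic inverse-variance pooling with a random-effects model. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not name explicitly, and it derives the standard error of ln(HR) from the published 95% CI under a normal approximation. The trial values and function names are hypothetical; this is not a reimplementation of RevMan.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_hr_and_se(hr, ci_low, ci_high):
    """Convert an HR and its 95% CI to ln(HR) and its standard error."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(hr), se


def pool_random_effects(estimates):
    """Pool (ln_hr, se) pairs; DerSimonian-Laird is an assumption here."""
    y = [e[0] for e in estimates]
    v = [e[1] ** 2 for e in estimates]
    w = [1 / vi for vi in v]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - Z_95 * se_pooled),
        "ci_high": math.exp(pooled + Z_95 * se_pooled),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical trials: (HR, lower 95% CI, upper 95% CI).
trials = [(0.50, 0.40, 0.62), (0.70, 0.58, 0.85), (0.60, 0.45, 0.80)]
result = pool_random_effects([log_hr_and_se(*t) for t in trials])
print(result)
```

Pooling is done on the log scale because ln(HR) is approximately normally distributed, and the result is exponentiated only at the end.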
3. Results
3.1. Included Studies
Between 2010 and 2022, 73 drugs for 167 indications received market authorization from Swissmedic (Figure 1).
In total, 112 studies met the selection criteria and were included in the analysis. The study characteristics are given in Table 1. Most studies were randomized phase 3 studies (93%); the most common disease sites were lung cancer, breast cancer, and gastrointestinal malignancies. Immunotherapies and targeted therapies (small molecules) were the largest groups of drug classes. PFS was the most commonly used primary or co-primary endpoint.
3.2. Staging Intervals
The median staging interval was 8 weeks (range 4–18). Fifty-one studies (46%) had staging intervals < 8 weeks and sixty-one (54%) had staging intervals ≥ 8 weeks (Table 1). The pooled HR for staging intervals shorter than the median was 0.58 (95% CI 0.53–0.64), while the pooled HR for staging intervals equal to or longer than the median was 0.48 (95% CI 0.44–0.52), with a p-value of 0.005 for the subgroup difference. There was significant statistical heterogeneity (I^2^ = 90%, p < 0.001), which could not be explained by outlier studies (Figure 2).
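As a rough plausibility check, a test for the difference between two subgroups can be approximated from the pooled HRs and CIs reported above. The sketch below is an assumption-laden simplification: it back-calculates standard errors from the rounded CIs and compares the two pooled ln(HR) values with a chi-squared test on one degree of freedom. Because the inputs are rounded, the resulting p-value should not be expected to match the reported 0.005 exactly. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def se_from_ci(ci_low, ci_high):
    """Standard error of ln(HR) back-calculated from a 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)


def subgroup_difference(hr_a, ci_a, hr_b, ci_b):
    """Chi-squared test (1 df) for a difference between two pooled HRs."""
    diff = math.log(hr_a) - math.log(hr_b)
    variance = se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2
    chi2 = diff ** 2 / variance
    # For 1 df, the upper tail of chi-squared equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Pooled estimates as reported in the text (rounded to two decimals).
chi2, p_value = subgroup_difference(0.58, (0.53, 0.64), 0.48, (0.44, 0.52))
print(round(chi2, 2), round(p_value, 4))
```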
3.3. Subgroup Analyses
The differences between the subgroups according to drug classes and disease sites are shown in Table 2. No significant difference between the pooled HRs for PFS among the different drug classes was observed for restaging intervals < 8 weeks compared to restaging intervals ≥ 8 weeks. In studies of melanoma, shorter staging intervals were associated with a lower pooled HR (the HR for staging intervals < 8 weeks was 0.46 vs. 0.63 for staging intervals ≥ 8 weeks, p = 0.02), whereas in studies of renal cell cancer the opposite was observed, i.e., longer staging intervals were associated with a lower pooled HR (the HR for staging intervals < 8 weeks was 0.67 vs. 0.44 for staging intervals ≥ 8 weeks, p = 0.01). In all other tested subgroups according to disease sites, there was no significant difference. In the subgroups according to the trial phase, the overall finding remained unchanged in phase 3 studies and did not reach statistical significance in the fairly small group of phase 2 studies. Interestingly, when grouping the studies according to their primary outcome, the finding of lower HRs with longer staging intervals was only observed in studies with OS as the primary outcome (0.72 vs. 0.58, p = 0.03).
3.4. Sensitivity Analyses
Sensitivity analyses were performed to examine how the choice of staging interval cut-off affected the outcomes. With a cut-off of 9 weeks (i.e., <9 weeks vs. ≥9 weeks), as in the work by Dabush et al., we found a similar effect as with the median of 8 weeks, namely a numerically higher pooled HR for staging intervals < 9 weeks compared to staging intervals ≥ 9 weeks (HR 0.54 vs. 0.45, p for subgroup difference 0.06). Similar results were found for cut-offs at 6 and 12 weeks (Table 3). When outlier studies (i.e., staging interval < 6 weeks or >12 weeks) were excluded, the main result remained unchanged (HR for staging interval < 8 weeks vs. ≥8 weeks 0.60 vs. 0.49, p for subgroup difference = 0.002). Further sensitivity analyses after the exclusion of phase 2 studies yielded similar results (Supplementary Table S1). Yet, significant heterogeneity remained and could not be explained by the removal of single studies.
3.5. Meta-Regression
When the potential effect of the staging interval on the HR for PFS was evaluated as a continuous variable, a meaningful correlation of lower HRs with longer staging intervals was observed based on the Burnand criteria [22] (beta −0.422; p < 0.001). In a sensitivity analysis including phase 3 trials only, similar results were found (beta −0.427, p < 0.001).
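The meta-regression can be pictured as a weighted least-squares fit of ln(HR) on the staging interval. The sketch below is a simplified illustration, not the SPSS procedure used in the study: it assumes weights equal to the inverse variance of ln(HR), uses hypothetical data, and returns an unstandardized slope. The text does not state whether the reported beta is standardized, so the two are not directly comparable.

```python
def weighted_linear_fit(x, y, variances):
    """Weighted least-squares line y = intercept + slope * x."""
    w = [1 / v for v in variances]  # inverse-variance weights
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical studies: staging interval (weeks), ln(HR), variance of ln(HR).
intervals = [6, 8, 12]
log_hrs = [-0.4, -0.5, -0.7]
variances = [0.010, 0.020, 0.015]

slope, intercept = weighted_linear_fit(intervals, log_hrs, variances)
print(slope, intercept)
```

A negative slope corresponds to lower HRs with longer staging intervals, which is the direction reported above.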
4. Discussion
Earlier reports suggest that, when assessing PFS in studies of metastatic breast cancer, shorter restaging intervals are associated with lower HRs and might thus bias the conclusion of the apparent benefit of experimental drugs. This prompted us to explore this issue in registration studies of oncologic drugs across various disease sites during a 13-year period. In this analysis, overall shorter staging intervals were associated with higher HRs (i.e., lower effect size). This finding is reassuring since it puts into perspective any claim that, by selecting shorter staging intervals, the PFS results of a clinical study might be biased in favor of the experimental arm. Yet, how might the different findings be explained? First, Dabush et al. pooled 98 studies in metastatic breast cancer, while in our study there was considerable clinical heterogeneity, explaining the overall high statistical heterogeneity [18]. Also, the tumor type and underlying heterogeneous biology may explain the divergence of findings in the indication subgroup analyses. As tumors with a more aggressive biology exhibit a shorter time to progression [23], they may be assessed more frequently with shorter staging intervals [10,24,25,26]. Since changes in PFS during brief intervals may tend to be more modest, this might influence the hazard ratios observed in studies using shorter staging intervals [10]. Second, the difference in the pooled HRs reported by Dabush et al. was quite small and likely not clinically meaningful (HR 0.79 vs. 0.86, p for subgroup difference 0.15); however, significant findings were observed in non-first-line trials, trials with drugs replacing standard treatments, and studies performed exclusively in human epidermal growth factor receptor 2 (HER2)-positive disease. Our study only included 18 studies supporting drug authorization in breast cancer, which did not allow us to comprehensively assess the respective subgroups with adequate power. Notably, in line with earlier findings, we found numerically lower HRs with shorter staging intervals (0.50 vs. 0.58, p for subgroup difference = 0.28) in breast cancer trials, but these differences are likely not meaningful.
Our work has several limitations which should be considered, and open questions remain. First, the analysis was based on published HRs rather than the individual patient data needed for more in-depth analyses [27,28]. Second, we considered pivotal studies highlighted in the initial drug label with the authorization of a drug in a specific indication. Earlier and other studies also supporting drug authorization may thus have been missed, decreasing the power of our analysis to detect the potential effects of staging intervals on the magnitude of the PFS benefit, especially in smaller subgroups of drug classes and disease sites [29]. Third, there was considerable heterogeneity which could not be adequately explained despite multiple subgroup and sensitivity analyses. This raises the question of whether some findings might be due to chance, e.g., in the subgroup of studies for melanoma where shorter staging intervals were associated with apparent greater effects on PFS without an obvious biological explanation. Fourth, the main finding reported here appears to be driven by studies with OS as the primary outcome, where staging intervals only influence secondary outcomes. The reason for this remains unclear and should be further explored, although the practical relevance seems limited since OS is clearly the more important outcome for patients in comparison to the PFS measure, and the influence of staging intervals on OS appears unlikely. Fifth, we did not have adequate power to perform multivariable meta-regression analyses to control for factors potentially influencing the observed correlation of longer staging intervals with lower HRs. Sixth, the basis of our analysis was drug authorizations in Switzerland, which has its own regulatory body. Thus, the results might be slightly different when considering authorizations by the United States’ Food and Drug Administration (FDA) or the European Medicines Agency (EMA). Although Swissmedic tends to grant the market access of drugs in the (neo-)adjuvant setting more restrictively than its counterparts, this is less so in the palliative setting assessed here [30,31]. Thus, the results provided here are likely generalizable to other jurisdictions [32]. Seventh, we did not include studies supporting approvals of drugs used in pediatric oncology and hematologic malignancies, leaving the potential impact of staging intervals in such studies unexplored. Last but not least, the analyses were all univariable, which makes it difficult to know how many of the findings are independent [33,34].
5. Conclusions
In conclusion, in the studies leading to the authorization of oncologic drugs in the palliative setting, longer rather than shorter restaging intervals to measure PFS were associated with an apparently higher magnitude of effect. This observation was the opposite of what was anticipated. Thus, the potential impact of staging intervals on PFS outcomes in randomized studies, considering disease biology, warrants further research.
References
- 1. Scott E.C., Baines A.C., Gong Y., Moore R., Pamuk G.E., Saber H., Subedee A., Thompson M.D., Xiao W., Pazdur R. Trends in the Approval of Cancer Therapies by the FDA in the Twenty-First Century. Nat. Rev. Drug Discov. 2023;22:625–640. doi:10.1038/s41573-023-00723-4. PMID: 37344568.
- 2. Del Paggio J.C., Berry J.S., Hopman W.M., Eisenhauer E.A., Prasad V., Gyawali B., Booth C.M. Evolution of the Randomized Clinical Trial in the Era of Precision Oncology. JAMA Oncol. 2021;7:728–734. doi:10.1001/jamaoncol.2021.0379. PMID: 33764385. PMCID: PMC7995135.
- 3. Pazdur R. Endpoints for Assessing Drug Activity in Clinical Trials. Oncol. 2008;13:19–21. doi:10.1634/theoncologist.13-S2-19. PMID: 18434634.
- 4. Kim C., Prasad V. Strength of Validation for Surrogate End Points Used in the US Food and Drug Administration’s Approval of Oncology Drugs. Mayo Clin. Proc. 2016;91:713–725. doi:10.1016/j.mayocp.2016.02.012. PMID: 27236424. PMCID: PMC5104665.
- 5. Eisenhauer E.A., Therasse P., Bogaerts J., Schwartz L.H., Sargent D., Ford R., Dancey J., Arbuck S., Gwyther S., Mooney M. New Response Evaluation Criteria in Solid Tumours: Revised RECIST Guideline (Version 1.1). Eur. J. Cancer 2009;45:228–247. doi:10.1016/j.ejca.2008.10.026. PMID: 19097774.
- 6. Delgado A., Guddati A.K. Clinical Endpoints in Oncology - A Primer. Am. J. Cancer Res. 2021;11:1121–1131. PMID: 33948349. PMCID: PMC8085844.
- 7. Panageas K.S., Ben-Porat L., Dickler M.N., Chapman P.B., Schrag D. When You Look Matters: The Effect of Assessment Schedule on Progression-Free Survival. JNCI J. Natl. Cancer Inst. 2007;99:428–432. doi:10.1093/jnci/djk091. PMID: 17374832.
- 8. Chen E.Y., Joshi S.K., Tran A., Prasad V. Estimation of Study Time Reduction Using Surrogate End Points Rather Than Overall Survival in Oncology Clinical Trials. JAMA Intern. Med. 2019;179:642–647. doi:10.1001/jamainternmed.2018.8351. PMID: 30933235. PMCID: PMC6503556.
