# Handling an uncertain control group event risk in non-inferiority trials: non-inferiority frontiers and the power-stabilising transformation

**Authors:** Matteo Quartagno, A. Sarah Walker, Abdel G. Babiker, Rebecca M. Turner, Mahesh K.B. Parmar, Andrew Copas, Ian R. White

arXiv: 1905.00241 · 2019-05-02

## TL;DR

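This paper proposes designing non-inferiority trials around a non-inferiority frontier, a curve that gives the most appropriate margin for each possible control event risk. The frontier is built from the power-stabilising (arcsine) transformation, which makes the design more resilient when the observed control event risk differs from the one assumed.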

## Contribution

It defines a fixed arcsine difference frontier for trials with a binary primary outcome, and proposes and compares three ways of designing a trial using this frontier.

## Key findings

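- On the risk ratio scale, modifying the margin after observing the control event risk gives power above the nominal level while keeping type I error under control.
- On the risk difference scale, the same approach requires a smaller sample size than working on the arcsine scale, but tends to slightly inflate the type I error rate; testing at a lower significance level is the proposed remedy.
- Testing and reporting on the arcsine scale gives results that are challenging to interpret clinically, and generally requires a larger sample size than the risk difference scale.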
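
The sample-size contrast in these findings can be illustrated with textbook normal approximations. The sketch below is not the authors' calculation. The function names, the design values (5% control event risk, 5 percentage-point margin, one-sided 2.5% significance level, 90% power) and the assumption that the true event risk is the same in both arms are all illustrative choices.

```python
import math
from statistics import NormalDist


def _z_sum_squared(alpha, power):
    """(z_{1-alpha} + z_{power})^2 for a one-sided test (hypothetical helper)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) ** 2


def n_per_arm_risk_difference(control_risk, margin_rd, alpha=0.025, power=0.9):
    """Approximate per-arm sample size when testing on the risk difference scale.

    Assumes equal true risks in both arms, so the variance of the estimated
    risk difference is 2 * p * (1 - p) / n.
    """
    variance_factor = 2 * control_risk * (1 - control_risk)
    return math.ceil(_z_sum_squared(alpha, power) * variance_factor / margin_rd ** 2)


def n_per_arm_arcsine(control_risk, margin_rd, alpha=0.025, power=0.9):
    """Approximate per-arm sample size when testing on the arcsine scale.

    Uses Var(asin(sqrt(p_hat))) ~= 1 / (4n), so the difference between two
    arms has variance 1 / (2n) whatever the control event risk is.
    """
    margin_as = (math.asin(math.sqrt(control_risk + margin_rd))
                 - math.asin(math.sqrt(control_risk)))
    return math.ceil(_z_sum_squared(alpha, power) / (2 * margin_as ** 2))


# Illustrative design values only.
n_rd = n_per_arm_risk_difference(0.05, 0.05)
n_as = n_per_arm_arcsine(0.05, 0.05)
print(n_rd, n_as)
```

For these illustrative inputs the arcsine-scale calculation asks for more participants per arm than the risk-difference calculation, in line with the direction reported above. The variance on the arcsine scale does not depend on the control event risk, which is the property behind the name "power-stabilising".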

## Abstract

Background. Non-inferiority (NI) trials are increasingly used to evaluate new treatments expected to have secondary advantages over standard of care, but similar efficacy on the primary outcome. When designing a NI trial with a binary primary outcome, the choice of effect measure for the NI margin has an important effect on sample size calculations; furthermore, if the control event risk observed is markedly different from that assumed, the trial can quickly lose power or the results become difficult to interpret. Methods. We propose a new way of designing NI trials to overcome the issues raised by unexpected control event risks by specifying a NI frontier, i.e. a curve defining the most appropriate non-inferiority margin for each possible value of control event risk. We propose a fixed arcsine difference frontier, the power-stabilising transformation for binary outcomes. We propose and compare three ways of designing a trial using this frontier. Results. Testing and reporting on the arcsine scale leads to results which are challenging to interpret clinically. Working on the arcsine scale generally requires a larger sample size compared to the risk difference scale. Therefore, working on the risk difference scale, modifying the margin after observing the control event risk, might be preferable, as it requires a smaller sample size. However, this approach tends to slightly inflate type I error rate; a solution is to use a lower significance level for testing. When working on the risk ratio scale, the same approach leads to power levels above the nominal one, maintaining type I error under control. Conclusions. Our proposed methods of designing NI trials using power-stabilising frontiers make trial design more resilient to unexpected values of the control event risk, at the only cost of requiring larger sample sizes when the goal is to report results on the risk difference scale.
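
The fixed arcsine difference frontier described in the Methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the design values (5% expected control event risk, 5 percentage-point margin) are hypothetical, and the event is assumed to be unfavourable, so a higher risk in the experimental arm is worse.

```python
import math


def arcsine_difference(control_risk, experimental_risk):
    """Difference between the two risks on the arcsine square-root scale."""
    return (math.asin(math.sqrt(experimental_risk))
            - math.asin(math.sqrt(control_risk)))


def frontier_margin_rd(observed_control_risk, design_control_risk, design_margin_rd):
    """Risk-difference margin implied by a fixed arcsine difference frontier.

    The arcsine margin is fixed at the design stage from the expected control
    risk and the risk-difference margin chosen there (illustrative inputs).
    For any observed control risk, the frontier returns the risk-difference
    margin that corresponds to the same arcsine difference.
    """
    arcsine_margin = arcsine_difference(
        design_control_risk, design_control_risk + design_margin_rd
    )
    # Move along the arcsine scale, capping at pi/2 so the risk stays <= 1.
    angle = min(math.asin(math.sqrt(observed_control_risk)) + arcsine_margin,
                math.pi / 2)
    max_tolerable_risk = math.sin(angle) ** 2
    return max_tolerable_risk - observed_control_risk


# Hypothetical design: 5% expected control risk, 5 percentage-point margin.
for observed in (0.02, 0.05, 0.10, 0.20):
    margin = frontier_margin_rd(observed, 0.05, 0.05)
    print(f"observed control risk {observed:.2f} -> margin {margin:.4f}")
```

At the design value the frontier reproduces the original margin. With these inputs the implied risk-difference margin narrows when the observed control event risk is lower than expected and widens when it is higher.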

---
Source: https://tomesphere.com/paper/1905.00241