Energy distance and kernel mean embedding for two sample survival test
Marcos Matabuena

TL;DR
This paper introduces new statistical tests for comparing two survival distributions under right censoring, utilizing energy distance and kernel mean embedding, with permutation calibration and proven consistency.
Contribution
It proposes a novel family of two-sample tests specifically designed for censored survival data, combining energy distance and kernel methods with permutation calibration.
Findings
Tests perform well in finite sample simulations
They are consistent against all alternatives
Effective in real survival analysis scenarios
Abstract
In this article, a new family of tests is proposed for testing the equality of two distributions from two samples under a right-censoring scheme. The tests are based on energy distance and kernel mean embeddings, are calibrated by permutations, and are consistent against all alternatives. A simulation study shows that the new tests perform well in realistic situations with finite samples.
| | Group 1 | Group 2 | Total |
|---|---|---|---|
| Number of live subjects | $n_{1j} - d_{1j}$ | $n_{2j} - d_{2j}$ | $n_j - d_j$ |
| Number of subjects that die | $d_{1j}$ | $d_{2j}$ | $d_j$ |
| Kernel function | |
|---|---|
| Gaussian | |
| Laplacian | |
| Rational quadratic | , |
| Matérn | |
| Distribution | $n_1$ | $n_2$ | Censoring rate | Energy distance | Energy distance | Energy distance | Energy distance | Energy distance | Kernel | Kernel | Kernel | Kernel | Logrank | Gehan | Tarone-Ware | Peto-Peto | Fleming-Harrington |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exp(1) | 20 | 20 | 0.1 | 0.496 0.286 | 0.493 0.283 | 0.495 0.285 | 0.497 0.287 | 0.499 0.289 | 0.495 0.287 | 0.493 0.284 | 0.493 0.288 | 0.497 0.286 | 0.508 0.291 | 0.505 0.288 | 0.503 0.285 | 0.502 0.285 | 0.495 0.296 |
| Exp(1) | 50 | 50 | 0.1 | 0.482 0.293 | 0.478 0.290 | 0.481 0.293 | 0.483 0.292 | 0.485 0.289 | 0.481 0.298 | 0.478 0.294 | 0.479 0.296 | 0.482 0.297 | 0.492 0.293 | 0.489 0.295 | 0.478 0.280 | 0.486 0.292 | 0.490 0.295 |
| Exp(1.5) | 20 | 20 | 0.1 | 0.482 0.287 | 0.490 0.285 | 0.484 0.286 | 0.480 0.287 | 0.477 0.288 | 0.489 0.290 | 0.49 0.285 | 0.493 0.288 | 0.483 0.289 | 0.471 0.296 | 0.486 0.289 | 0.475 0.291 | 0.481 0.288 | 0.458 0.288 |
| Exp(1.5) | 50 | 50 | 0.1 | 0.482 0.295 | 0.475 0.288 | 0.481 0.293 | 0.483 0.296 | 0.484 0.297 | 0.485 0.293 | 0.481 0.289 | 0.487 0.289 | 0.483 0.295 | 0.492 0.299 | 0.501 0.295 | 0.492 0.295 | 0.498 0.295 | 0.479 0.293 |
| Exp(1) | 20 | 20 | 0.3 | 0.508 0.288 | 0.509 0.288 | 0.508 0.288 | 0.507 0.288 | 0.508 0.290 | 0.503 0.285 | 0.506 0.287 | 0.502 0.286 | 0.504 0.287 | 0.495 0.284 | 0.502 0.297 | 0.500 0.294 | 0.499 0.295 | 0.507 0.286 |
| Exp(1) | 50 | 50 | 0.3 | 0.494 0.297 | 0.496 0.295 | 0.494 0.297 | 0.494 0.295 | 0.494 0.291 | 0.493 0.297 | 0.495 0.297 | 0.494 0.298 | 0.493 0.297 | 0.503 0.296 | 0.486 0.297 | 0.491 0.296 | 0.486 0.297 | 0.502 0.287 |
| Exp(1.5) | 20 | 20 | 0.3 | 0.500 0.290 | 0.510 0.293 | 0.503 0.291 | 0.498 0.289 | 0.495 0.288 | 0.492 0.284 | 0.506 0.292 | 0.498 0.288 | 0.492 0.284 | 0.497 0.295 | 0.499 0.289 | 0.493 0.285 | 0.495 0.286 | 0.501 0.292 |
| Exp(1.5) | 50 | 50 | 0.3 | 0.489 0.301 | 0.487 0.297 | 0.488 0.300 | 0.489 0.301 | 0.490 0.299 | 0.489 0.301 | 0.486 0.299 | 0.486 0.301 | 0.490 0.302 | 0.496 0.298 | 0.492 0.294 | 0.495 0.299 | 0.492 0.294 | 0.500 0.299 |
| Gamma(1,1) | 20 | 20 | 0.1 | 0.501 0.294 | 0.508 0.297 | 0.503 0.295 | 0.499 0.293 | 0.493 0.288 | 0.512 0.297 | 0.508 0.296 | 0.511 0.296 | 0.505 0.295 | 0.491 0.284 | 0.510 0.294 | 0.498 0.288 | 0.506 0.292 | 0.493 0.282 |
| Gamma(1,1) | 50 | 50 | 0.1 | 0.503 0.291 | 0.504 0.288 | 0.504 0.289 | 0.503 0.293 | 0.502 0.297 | 0.512 0.292 | 0.508 0.288 | 0.511 0.290 | 0.511 0.293 | 0.505 0.292 | 0.508 0.287 | 0.505 0.290 | 0.508 0.288 | 0.502 0.290 |
| Gamma(1.5,1.5) | 20 | 20 | 0.1 | 0.519 0.295 | 0.516 0.289 | 0.519 0.294 | 0.520 0.296 | 0.522 0.295 | 0.515 0.301 | 0.516 0.295 | 0.515 0.299 | 0.516 0.299 | 0.52 0.290 | 0.519 0.289 | 0.522 0.291 | 0.516 0.287 | 0.509 0.287 |
| Gamma(1.5,1.5) | 50 | 50 | 0.1 | 0.499 0.290 | 0.493 0.291 | 0.497 0.290 | 0.501 0.289 | 0.506 0.287 | 0.494 0.289 | 0.495 0.292 | 0.493 0.291 | 0.498 0.289 | 0.515 0.295 | 0.505 0.289 | 0.509 0.289 | 0.506 0.288 | 0.505 0.291 |
| Gamma(1,1) | 20 | 20 | 0.3 | 0.477 0.288 | 0.485 0.288 | 0.479 0.288 | 0.475 0.289 | 0.474 0.289 | 0.479 0.289 | 0.484 0.287 | 0.484 0.288 | 0.475 0.289 | 0.477 0.297 | 0.467 0.288 | 0.464 0.288 | 0.463 0.287 | 0.489 0.292 |
| Gamma(1,1) | 50 | 50 | 0.3 | 0.489 0.293 | 0.497 0.296 | 0.491 0.294 | 0.486 0.290 | 0.482 0.287 | 0.495 0.289 | 0.497 0.293 | 0.495 0.288 | 0.492 0.291 | 0.485 0.292 | 0.513 0.300 | 0.498 0.293 | 0.511 0.300 | 0.474 0.287 |
| Gamma(1.5,1.5) | 20 | 20 | 0.3 | 0.491 0.293 | 0.494 0.293 | 0.492 0.294 | 0.491 0.293 | 0.489 0.293 | 0.493 0.294 | 0.494 0.294 | 0.494 0.295 | 0.492 0.293 | 0.484 0.297 | 0.499 0.294 | 0.490 0.295 | 0.494 0.292 | 0.484 0.294 |
| Gamma(1.5,1.5) | 50 | 50 | 0.3 | 0.495 0.295 | 0.489 0.294 | 0.493 0.294 | 0.496 0.295 | 0.499 0.293 | 0.492 0.293 | 0.49 0.295 | 0.491 0.294 | 0.492 0.294 | 0.509 0.289 | 0.49 0.291 | 0.493 0.288 | 0.489 0.292 | 0.514 0.288 |
| Lognormal(0,0.5) | 20 | 20 | 0.1 | 0.49 0.287 | 0.495 0.288 | 0.492 0.287 | 0.489 0.288 | 0.488 0.290 | 0.49 0.283 | 0.493 0.287 | 0.489 0.284 | 0.490 0.285 | 0.472 0.279 | 0.477 0.287 | 0.470 0.285 | 0.473 0.286 | 0.483 0.287 |
| Lognormal(0,0.5) | 50 | 50 | 0.10 | 0.503 0.283 | 0.506 0.284 | 0.504 0.283 | 0.503 0.283 | 0.502 0.282 | 0.500 0.283 | 0.506 0.285 | 0.505 0.285 | 0.500 0.283 | 0.508 0.283 | 0.504 0.279 | 0.508 0.286 | 0.504 0.281 | 0.515 0.296 |
| Lognormal(0,0.25) | 20 | 20 | 0.1 | 0.481 0.294 | 0.484 0.300 | 0.482 0.295 | 0.48 0.292 | 0.480 0.292 | 0.481 0.294 | 0.482 0.296 | 0.480 0.293 | 0.481 0.295 | 0.484 0.291 | 0.476 0.295 | 0.473 0.291 | 0.471 0.293 | 0.487 0.290 |
| Lognormal(0,0.25) | 50 | 50 | 0.1 | 0.517 0.289 | 0.512 0.287 | 0.516 0.288 | 0.518 0.290 | 0.519 0.293 | 0.517 0.291 | 0.516 0.288 | 0.515 0.288 | 0.518 0.292 | 0.517 0.292 | 0.523 0.291 | 0.522 0.293 | 0.522 0.291 | 0.506 0.284 |
| Lognormal(0,0.5) | 20 | 20 | 0.3 | 0.495 0.288 | 0.498 0.287 | 0.496 0.288 | 0.493 0.288 | 0.489 0.287 | 0.495 0.287 | 0.497 0.287 | 0.496 0.284 | 0.493 0.289 | 0.495 0.285 | 0.489 0.288 | 0.488 0.287 | 0.485 0.286 | 0.516 0.289 |
| Lognormal(0,0.5) | 50 | 50 | 0.3 | 0.482 0.293 | 0.490 0.297 | 0.485 0.295 | 0.480 0.292 | 0.476 0.291 | 0.482 0.294 | 0.488 0.297 | 0.484 0.294 | 0.480 0.294 | 0.476 0.296 | 0.473 0.293 | 0.468 0.287 | 0.47 0.292 | 0.487 0.296 |
| Lognormal(0,0.25) | 20 | 20 | 0.3 | 0.522 0.293 | 0.513 0.289 | 0.519 0.292 | 0.523 0.295 | 0.525 0.297 | 0.526 0.298 | 0.519 0.292 | 0.526 0.298 | 0.526 0.298 | 0.516 0.293 | 0.526 0.306 | 0.524 0.303 | 0.524 0.306 | 0.518 0.286 |
| Lognormal(0,0.25) | 50 | 50 | 0.3 | 0.504 0.291 | 0.508 0.287 | 0.505 0.290 | 0.504 0.292 | 0.504 0.295 | 0.501 0.296 | 0.504 0.29 | 0.499 0.294 | 0.502 0.296 | 0.491 0.289 | 0.500 0.296 | 0.496 0.295 | 0.498 0.295 | 0.494 0.289 |
| Distribution | $n_1$ | $n_2$ | Censoring rate | Energy distance | Energy distance | Energy distance | Energy distance | Energy distance | Kernel | Kernel | Kernel | Kernel | Logrank | Gehan | Tarone-Ware | Peto-Peto | Fleming-Harrington |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exp(1) | 20 | 20 | 0.1 | 0.048 | 0.048 | 0.048 | 0.048 | 0.050 | 0.052 | 0.046 | 0.050 | 0.054 | 0.050 | 0.046 | 0.046 | 0.044 | 0.058 |
| Exp(1) | 50 | 50 | 0.1 | 0.056 | 0.060 | 0.054 | 0.058 | 0.056 | 0.052 | 0.056 | 0.056 | 0.056 | 0.060 | 0.066 | 0.064 | 0.064 | 0.056 |
| Exp(1.5) | 20 | 20 | 0.1 | 0.066 | 0.056 | 0.070 | 0.066 | 0.066 | 0.072 | 0.058 | 0.064 | 0.066 | 0.066 | 0.058 | 0.062 | 0.058 | 0.062 |
| Exp(1.5) | 50 | 50 | 0.1 | 0.042 | 0.048 | 0.042 | 0.042 | 0.046 | 0.048 | 0.044 | 0.044 | 0.042 | 0.062 | 0.056 | 0.050 | 0.056 | 0.054 |
| Exp(1) | 20 | 20 | 0.3 | 0.058 | 0.058 | 0.060 | 0.060 | 0.060 | 0.005 | 0.042 | 0.042 | 0.056 | 0.058 | 0.054 | 0.056 | 0.056 | 0.056 |
| Exp(1) | 50 | 50 | 0.3 | 0.056 | 0.056 | 0.056 | 0.056 | 0.048 | 0.054 | 0.050 | 0.056 | 0.050 | 0.050 | 0.052 | 0.054 | 0.058 | 0.046 |
| Exp(1.5) | 20 | 20 | 0.3 | 0.058 | 0.048 | 0.056 | 0.054 | 0.052 | 0.052 | 0.052 | 0.048 | 0.056 | 0.054 | 0.054 | 0.050 | 0.052 | 0.052 |
| Exp(1.5) | 50 | 50 | 0.3 | 0.064 | 0.048 | 0.062 | 0.064 | 0.074 | 0.064 | 0.044 | 0.054 | 0.056 | 0.066 | 0.068 | 0.066 | 0.068 | 0.056 |
| Gamma(1,1) | 20 | 20 | 0.1 | 0.058 | 0.056 | 0.060 | 0.056 | 0.054 | 0.052 | 0.060 | 0.060 | 0.052 | 0.054 | 0.054 | 0.056 | 0.058 | 0.054 |
| Gamma(1,1) | 50 | 50 | 0.1 | 0.044 | 0.042 | 0.044 | 0.042 | 0.044 | 0.042 | 0.040 | 0.038 | 0.042 | 0.038 | 0.038 | 0.030 | 0.032 | 0.050 |
| Gamma(1.5,1.5) | 20 | 20 | 0.1 | 0.062 | 0.058 | 0.062 | 0.066 | 0.066 | 0.060 | 0.060 | 0.056 | 0.064 | 0.046 | 0.048 | 0.048 | 0.048 | 0.062 |
| Gamma(1.5,1.5) | 50 | 50 | 0.1 | 0.050 | 0.054 | 0.052 | 0.048 | 0.048 | 0.054 | 0.052 | 0.050 | 0.052 | 0.046 | 0.044 | 0.046 | 0.044 | 0.050 |
| Gamma(1,1) | 20 | 20 | 0.3 | 0.058 | 0.058 | 0.062 | 0.058 | 0.058 | 0.064 | 0.054 | 0.060 | 0.062 | 0.056 | 0.060 | 0.058 | 0.052 | 0.066 |
| Gamma(1,1) | 50 | 50 | 0.3 | 0.058 | 0.060 | 0.060 | 0.060 | 0.056 | 0.056 | 0.062 | 0.060 | 0.054 | 0.054 | 0.052 | 0.058 | 0.050 | 0.046 |
| Gamma(1.5,1.5) | 20 | 20 | 0.3 | 0.068 | 0.056 | 0.066 | 0.068 | 0.066 | 0.070 | 0.056 | 0.060 | 0.070 | 0.058 | 0.060 | 0.064 | 0.062 | 0.066 |
| Gamma(1.5,1.5) | 50 | 50 | 0.3 | 0.056 | 0.060 | 0.058 | 0.054 | 0.052 | 0.056 | 0.068 | 0.064 | 0.066 | 0.050 | 0.062 | 0.060 | 0.062 | 0.050 |
| Lognormal(0,0.5) | 20 | 20 | 0.1 | 0.050 | 0.046 | 0.050 | 0.046 | 0.042 | 0.052 | 0.054 | 0.058 | 0.044 | 0.052 | 0.044 | 0.044 | 0.042 | 0.048 |
| Lognormal(0,0.5) | 50 | 50 | 0.1 | 0.040 | 0.040 | 0.036 | 0.040 | 0.042 | 0.038 | 0.040 | 0.040 | 0.040 | 0.040 | 0.034 | 0.040 | 0.036 | 0.040 |
| Lognormal(0,0.25) | 20 | 20 | 0.1 | 0.084 | 0.080 | 0.082 | 0.080 | 0.078 | 0.076 | 0.080 | 0.078 | 0.074 | 0.062 | 0.078 | 0.076 | 0.080 | 0.054 |
| Lognormal(0,0.25) | 50 | 50 | 0.1 | 0.038 | 0.040 | 0.042 | 0.040 | 0.038 | 0.040 | 0.044 | 0.044 | 0.034 | 0.036 | 0.044 | 0.044 | 0.040 | 0.038 |
| Lognormal(0,0.5) | 20 | 20 | 0.3 | 0.046 | 0.042 | 0.050 | 0.050 | 0.048 | 0.050 | 0.046 | 0.050 | 0.050 | 0.042 | 0.052 | 0.040 | 0.048 | 0.050 |
| Lognormal(0,0.5) | 50 | 50 | 0.3 | 0.072 | 0.076 | 0.074 | 0.076 | 0.076 | 0.074 | 0.074 | 0.072 | 0.072 | 0.078 | 0.082 | 0.078 | 0.082 | 0.066 |
| Lognormal(0,0.25) | 20 | 20 | 0.3 | 0.056 | 0.052 | 0.054 | 0.056 | 0.060 | 0.060 | 0.054 | 0.056 | 0.062 | 0.050 | 0.056 | 0.058 | 0.054 | 0.042 |
| Lognormal(0,0.25) | 50 | 50 | 0.3 | 0.044 | 0.050 | 0.046 | 0.046 | 0.040 | 0.040 | 0.052 | 0.044 | 0.040 | 0.046 | 0.060 | 0.046 | 0.058 | 0.048 |
Centro de Investigación en Tecnoloxías da Información (CiTIUS),
Universidade de Santiago de Compostela, Santiago de Compostela, Spain
2 Introduction
One of the main objectives of survival analysis is to compare the lifetime distributions of two samples coming from two different groups. The most popular example of this situation is a clinical trial that evaluates the efficacy of two treatments [Singh and Mukhopadhyay, 2011]. With right-censored data, the test most used by the scientific community to assess the equality of two survival curves is the logrank test [Schoenfeld, 1981, Yang and Prentice, 2010, Su and Zhu, 2018], proposed by Mantel and Haenszel [Mantel and Haenszel, 1959]. This test is known to be the most powerful test when the hazard functions of the two groups are proportional [Schoenfeld, 1981, Su and Zhu, 2018, Xu et al., 2017]. However, when this assumption is violated, the test suffers a significant loss of power [Fleming et al., 1980, Lachin and Foulkes, 1986, Lakatos, 1988, Schoenfeld, 1981].
Which hypothesis test to use is currently a hot topic in the medical field [Su and Zhu, 2018], owing to the lack of power of the log-rank test observed in many real case studies [Su and Zhu, 2018]. This is the case for new oncological treatments such as immunotherapies, which have a delayed effect [Melero et al., 2014, Xu et al., 2017, Xu et al., 2018, Su and Zhu, 2018]. The same happens with multimodal treatments [Moehler et al., 2007], where the density function is often expected to have several modes, and in cases where cure occurs [López-Cheda et al., 2017]. In all of these situations, the proportional hazards assumption is clearly violated.
From the point of view of mathematical statistics, it is well known that any test based on finite samples has poor power except in a finite number of directions. This means that in real scenarios there is no guarantee that one test will always be better than another. Indeed, Janssen [Janssen, 2000] proved that one cannot expect to build a test with high power except on a finite-dimensional space of alternatives. However, this does not mean that one cannot build tests with acceptable power against a large number of alternatives and in situations of interest, which has been the goal of the statistical community in recent decades.
The literature distinguishes two types of tests: directional and omnibus. The former seek maximum power in specific directions, while the latter are consistent against all alternatives. The most popular family of directional tests under right censoring is the weighted logrank family [Fleming et al., 1987], in which the logrank statistic is assigned a weight function that determines its optimality in certain directions [Gehan, 1965, Tarone and Ware, 1977, Peto and Peto, 1972, Fleming and Harrington, 1981]. Within this family, the results of individual tests are sometimes combined into a global test [Bathke et al., 2009], or the weight function is estimated from the data [Yang and Prentice, 2010], although the latter requires a substantial amount of data. The Kolmogorov-Smirnov test [Fleming et al., 1980] and the Cramér-von Mises test for right-censored data [Schumacher, 1984] are two examples of omnibus tests.
The energy distance [Székely, 2003, Székely and Rizzo, 2013] is a statistical distance that measures how different two probability distributions are. It is based on Euclidean distances between pairs of observations and on the notion of potential energy, and it has been used, among other problems, for testing equality of distributions with several samples [Székely and Rizzo, 2004], goodness of fit [Székely and Rizzo, 2005], and cluster analysis [Szekely and Rizzo, 2005]. Its main feature is that it requires minimal assumptions: only moment conditions on the random variables involved. Its multivariate extension is immediate, and the resulting test of equality of distributions for several samples is consistent against all alternatives and shows high power for known distributions, even in high-dimensional settings [Székely and Rizzo, 2004]. The generalization of the test to metrics other than the Euclidean one, such as semimetrics of negative type [Lyons et al., 2013, Rachev et al., 2013], is equivalent to the kernel methods [Sejdinovic et al., 2013, Shen and Vogelstein, 2018] proposed in [Gretton et al., 2012], which are based on kernel mean embeddings [Muandet et al., 2017].
The main objective of this paper is to extend these tests to right-censored data in the univariate case. The paper is structured as follows. First, we review the main literature on methods for testing equality of distributions from two samples with right censoring. Then we explain the relationship between energy distance and kernel mean embeddings. Next, the statistics are derived and their theoretical properties, such as consistency against all alternatives, are established. Finally, a simulation study compares the proposed methods with the classical tests from the literature. To do this, we compare power and type I error using known distributions, including the cases discussed above (delayed effects, recovery, and multimodality) where the log-rank test performs poorly.
3 Previous research
Henceforth, we consider the usual framework for two-sample survival comparisons. For each group $i = 1, 2$, let $X_{i1}, \dots, X_{in_i}$ be the lifetimes, with distribution function $F_i$, and $C_{i1}, \dots, C_{in_i}$ the censoring times, with distribution function $G_i$, all defined on a subset of $[0, \infty)$. As usual, these random variables are assumed to be mutually independent. In practice, only $T_{ij} = \min(X_{ij}, C_{ij})$ and $\delta_{ij} = \mathbb{1}\{X_{ij} \le C_{ij}\}$ are observed. For simplicity, we also assume that the variables are continuous.
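To make the observation scheme concrete, here is a minimal Python sketch. It generates one group of right-censored data by observing $T = \min(X, C)$ and $\delta = \mathbb{1}\{X \le C\}$. The exponential lifetimes and censoring times, their rates, and the helper name `censored_sample` are hypothetical choices made only for illustration. They are not the paper's simulation design.

```python
import random

def censored_sample(n, draw_lifetime, draw_censoring):
    """Simulate n right-censored observations (T, delta).

    draw_lifetime / draw_censoring: zero-argument callables returning one
    lifetime X and one independent censoring time C.
    T = min(X, C); delta = 1 if the event is observed (X <= C), else 0.
    """
    times, deltas = [], []
    for _ in range(n):
        x, c = draw_lifetime(), draw_censoring()
        times.append(min(x, c))
        deltas.append(1 if x <= c else 0)
    return times, deltas

rng = random.Random(2024)
# Hypothetical design: Exp(rate 1) vs Exp(rate 1.5) lifetimes, Exp(rate 0.3) censoring.
t1, d1 = censored_sample(50, lambda: rng.expovariate(1.0), lambda: rng.expovariate(0.3))
t2, d2 = censored_sample(50, lambda: rng.expovariate(1.5), lambda: rng.expovariate(0.3))
print(f"observed event fraction: group 1 {sum(d1) / 50:.2f}, group 2 {sum(d2) / 50:.2f}")
```

Only the pairs `(t, d)` would be available to any of the tests discussed below. The latent lifetimes are discarded.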
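The two-sample problem that we study is the following:
$$H_0: F_1 = F_2 \quad \text{versus} \quad H_1: F_1 \neq F_2,$$
where $F_1$ and $F_2$ denote the lifetime distribution functions of the two groups.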
We denote by $\tau_1$ and $\tau_2$ the maximum observed times in groups 1 and 2, respectively, and by $\tau = \min(\tau_1, \tau_2)$ the minimum of both.
Next, we describe the main previous literature on directional and omnibus tests.
3.1 Directional tests: The log-rank test family
In this subsection we describe the logrank test and its variants.
Let $t_1 < t_2 < \dots < t_D$ denote the distinct failure times in the pooled sample. We define:
$n_{ij}$ = number of subjects in group $i$ at risk at $t_j$.
$n_j = n_{1j} + n_{2j}$ = number of subjects at risk at $t_j$ (in both groups).
$d_{ij}$ = number of subjects in group $i$ who fail at $t_j$.
$d_j = d_{1j} + d_{2j}$ = number of subjects who fail at $t_j$.
The statistic has the following structure:
$$Z = \frac{\sum_{j=1}^{D} w(t_j)\left(d_{1j} - \dfrac{n_{1j} d_j}{n_j}\right)}{\sqrt{\sum_{j=1}^{D} w(t_j)^2 \, \dfrac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}}},$$
where $w$ is a weight function that determines the properties of the test. It depends on the number of subjects at risk at time $t_j$ ($n_j$), on the estimated survival function at time $t_j$ ($\hat S(t_j)$), or on its value at the previous instant ($\hat S(t_{j-1})$).
Under the null hypothesis, conditionally on $n_{1j}$, $n_j$ and $d_j$, we have $d_{1j} \sim \mathcal{H}(n_j, n_{1j}, d_j)$, where $\mathcal{H}$ denotes the hypergeometric distribution. Therefore
$$E(d_{1j}) = \frac{n_{1j} d_j}{n_j}, \qquad \operatorname{Var}(d_{1j}) = \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}.$$
The main characteristics of the log-rank test and its variants are described below:
- Log-rank [Mantel and Haenszel, 1959].
The logrank test is optimal when the hazard functions of the two groups are proportional. It results from taking $w(t_j) = 1$. Under the null hypothesis,
$$Z \xrightarrow{d} N(0, 1).$$
- Gehan generalized Wilcoxon test [Gehan, 1965]
It is a distribution-free test that extends the Wilcoxon test to right-censored data. It gives much more weight to early survival times, using the weight function $w(t_j) = n_j$.
- Tarone-Ware [Tarone and Ware, 1977]
It is a modification of the Gehan test with weight function $w(t_j) = \sqrt{n_j}$, which assigns lower weights than the Gehan test.
- Peto-Peto [Peto and Peto, 1972]
The Peto test is used when the hazard functions are not proportional. The Kaplan-Meier estimator is used in the weight function, $w(t_j) = \hat S(t_j)$. Early times receive more weight than later observations.
- Fleming-Harrington family [Fleming and Harrington, 1981]
In the Fleming-Harrington family, the weight function $w(t_j) = \hat S(t_{j-1})^{\rho} \, [1 - \hat S(t_{j-1})]^{\gamma}$ depends on two parameters $\rho, \gamma \ge 0$, which give the test great flexibility. Using the Kaplan-Meier estimator as plug-in increases the power of the test [Buyske et al., 2000].
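The weighted log-rank family can be sketched in Python as follows. This is an illustrative implementation, not the paper's code. It assumes the pooled Kaplan-Meier estimator for $\hat S$ and the hypergeometric mean and variance of $d_{1j}$. The Peto-Peto weight follows the description in this section ($\hat S(t_j)$) rather than the modified estimator that some software uses. The function name and weight labels are hypothetical.

```python
import math

def weighted_logrank(t1, d1, t2, d2, weight="logrank", rho=0.0, gamma=0.0):
    """Two-sided weighted log-rank test (illustrative sketch).

    t1, d1 and t2, d2: observed times and event indicators (1 = failure,
    0 = censored) of groups 1 and 2. Returns (Z, two-sided p-value).
    """
    obs = [(t, d, 1) for t, d in zip(t1, d1)] + [(t, d, 2) for t, d in zip(t2, d2)]
    failure_times = sorted({t for t, d, _ in obs if d == 1})
    s_prev = 1.0  # pooled Kaplan-Meier estimate just before t_j
    num = var = 0.0
    for tj in failure_times:
        n1 = sum(1 for t, _, g in obs if g == 1 and t >= tj)  # at risk in group 1
        n = sum(1 for t, _, _ in obs if t >= tj)               # at risk overall
        n2 = n - n1
        d1j = sum(1 for t, d, g in obs if g == 1 and d == 1 and t == tj)
        dj = sum(1 for t, d, _ in obs if d == 1 and t == tj)
        s_curr = s_prev * (1.0 - dj / n)  # pooled Kaplan-Meier estimate at t_j
        if weight == "logrank":
            w = 1.0
        elif weight == "gehan":
            w = float(n)
        elif weight == "tarone-ware":
            w = math.sqrt(n)
        elif weight == "peto":
            w = s_curr
        elif weight == "fleming-harrington":
            w = s_prev ** rho * (1.0 - s_prev) ** gamma
        else:
            raise ValueError(f"unknown weight: {weight}")
        num += w * (d1j - n1 * dj / n)  # observed minus hypergeometric mean
        if n > 1:
            var += w * w * n1 * n2 * dj * (n - dj) / (n * n * (n - 1))
        s_prev = s_curr
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# Group 1 fails early and group 2 fails late, so Z is positive and p is small.
print(weighted_logrank(list(range(1, 11)), [1] * 10, list(range(11, 21)), [1] * 10, weight="gehan"))
```

All variants share the same numerator and variance. Only the weight `w` changes, which is what makes each member of the family optimal in a different direction.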
3.2 The omnibus tests
The Kolmogorov-Smirnov [Fleming et al., 1980, Schumacher, 1984] and Cramér-von Mises [Schumacher, 1984] tests for right-censored data are the most popular omnibus tests. There are several versions of these two tests, but some have limitations. For example, for the direct extension of the Cramér-von Mises test to the censored case, the limit distribution [Koziol, 1978] cannot in general be computed. In this subsection we describe two versions of each test, proposed in [Schumacher, 1984] and based on comparing the empirical cumulative hazard functions.
Suppose that the two samples satisfy the independence conditions assumed in Section 3.
We denote the ordered observed times by $T_{(1)} \le T_{(2)} \le \dots \le T_{(n)}$, and by $\delta_{[1]}, \delta_{[2]}, \dots, \delta_{[n]}$ the corresponding censoring indicators in the induced ordering.
Let $S_1$ and $S_2$ denote the survival functions of groups 1 and 2 at time $t$, and $\Lambda_1$ and $\Lambda_2$ their cumulative hazard functions. Consider the function
[TABLE]
The comparison problem can be expressed as:
[TABLE]
The function can be estimated by:
[TABLE]
where $\hat\Lambda_1$ and $\hat\Lambda_2$ denote the Nelson-Aalen estimators [Nelson, 1972] of the cumulative hazard functions of each group.
We define:
.
From the previous expressions, we can write the following two Kolmogorov-Smirnov statistics:
[TABLE]
and, for the Cramér-von Mises test:
[TABLE]
All the statistics are consistent against all alternatives and converge almost surely to their population counterparts. The limit distribution of the first Kolmogorov-Smirnov statistic is given by a Gaussian process expressed in terms of a standard Brownian motion $W$. The second converges to a functional of a Brownian bridge $B$. The two Cramér-von Mises statistics converge to the analogous functionals of these processes.
For more details, see [Schumacher, 1984].
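To make the Nelson-Aalen ingredient concrete, here is a small Python sketch of the estimator $\hat\Lambda(t) = \sum_{t_j \le t} d_j / n_j$. It also computes the unweighted supremum distance between the two estimated cumulative hazards, up to the smaller of the two largest observed times. This is only an illustration: Schumacher's statistics add standardizations and weights that are not reproduced here, and the function names are hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at the distinct failure times of one group."""
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum, values = 0.0, []
    for tj in failure_times:
        at_risk = sum(1 for t in times if t >= tj)
        deaths = sum(1 for t, d in zip(times, events) if t == tj and d == 1)
        cum += deaths / at_risk  # hazard increment d_j / n_j
        values.append(cum)
    return failure_times, values

def step_value(grid, values, t):
    """Right-continuous step function: 0 before grid[0], values[k] on [grid[k], grid[k+1])."""
    v = 0.0
    for g, val in zip(grid, values):
        if g > t:
            break
        v = val
    return v

def sup_hazard_gap(t1, d1, t2, d2):
    """sup_{t <= tau} |Lambda1(t) - Lambda2(t)|, tau = min of the two largest observed times.
    Illustrative only: no standardization or weighting."""
    g1, l1 = nelson_aalen(t1, d1)
    g2, l2 = nelson_aalen(t2, d2)
    tau = min(max(t1), max(t2))
    jumps = [t for t in sorted(set(g1) | set(g2)) if t <= tau]
    # both estimates are step functions, so the supremum is attained at a jump point
    return max((abs(step_value(g1, l1, t) - step_value(g2, l2, t)) for t in jumps), default=0.0)

print(sup_hazard_gap([1, 2, 3], [1, 1, 1], [1, 2, 3], [0, 0, 1]))  # about 0.833 (= 5/6)
```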
4 The energy distance and the kernels mean embedding
In this section we introduce the energy distance and reproducing kernel Hilbert spaces (RKHS), together with their relation to kernel mean embeddings. We first work at the population level and then at the sample level.
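Let $X, X'$ be random variables in $\mathbb{R}^d$ with distribution function $F$, and $Y, Y'$ random variables in $\mathbb{R}^d$ with distribution function $G$. All four are mutually independent and have finite first moments, $E\|X\| < \infty$ and $E\|Y\| < \infty$. The energy distance [Székely, 2003, Székely and Rizzo, 2013] between $F$ and $G$ is defined by
$$\mathcal{E}(F, G) = 2E\|X - Y\| - E\|X - X'\| - E\|Y - Y'\|,$$
where $\|\cdot\|$ denotes the Euclidean norm.
It can be shown that $\mathcal{E}(F, G)$ is invariant under rotations and non-negative, $\mathcal{E}(F, G) \ge 0$, with equality to zero if and only if $F = G$.
The previous definition can be extended to a family of indices $\alpha \in (0, 2]$ [Székely and Rizzo, 2013], assuming in each case that the moments of order $\alpha$ exist. In this case, the energy distance is
$$\mathcal{E}_\alpha(F, G) = 2E\|X - Y\|^\alpha - E\|X - X'\|^\alpha - E\|Y - Y'\|^\alpha.$$
It satisfies $\mathcal{E}_\alpha(F, G) \ge 0$ for all $\alpha \in (0, 2)$, with equality to zero if and only if $F = G$. In the particular case $\alpha = 2$, we have $\mathcal{E}_2(F, G) = 2\|EX - EY\|^2$, so non-negativity holds trivially. In this situation, however, $\mathcal{E}_2(F, G) = 0$ implies equality of means, not equality in distribution, between $X$ and $Y$.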
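For univariate samples, the plug-in (V-statistic) estimate of $\mathcal{E}_\alpha$ replaces each expectation by an average over all pairs. The Python sketch below uses a hypothetical function name. It also checks numerically the univariate identity $\mathcal{E}_2 = 2(EX - EY)^2$, which holds exactly when $F$ and $G$ are the empirical distributions.

```python
def energy_distance_v(x, y, alpha=1.0):
    """Plug-in (V-statistic) estimate of E_alpha for univariate samples x, y."""
    def mean_pow_dist(a, b):
        # average of |a_i - b_j|**alpha over all pairs (i, j), diagonal included
        return sum(abs(ai - bj) ** alpha for ai in a for bj in b) / (len(a) * len(b))
    return 2 * mean_pow_dist(x, y) - mean_pow_dist(x, x) - mean_pow_dist(y, y)

x = [0.4, 1.3, 2.2, 0.9, 1.7]
y = [1.1, 2.8, 3.0, 2.5]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
print(energy_distance_v(x, y))                                       # alpha = 1, positive here
print(energy_distance_v(x, y, alpha=2), 2 * (mean_x - mean_y) ** 2)  # equal up to rounding
```

With `alpha=2` the statistic only compares means, which illustrates why $\alpha = 2$ is excluded from the characterization of equality in distribution.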
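The notion of energy distance can be generalized to even more general spaces. Consider a pair $(\mathcal{Z}, \rho)$, where $\mathcal{Z}$ is an arbitrary space and $\rho$ is a semimetric of negative type [Rachev et al., 2013, Lyons et al., 2013]. This means that $\rho$ is required to satisfy
$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \, \rho(z_i, z_j) \le 0$$
for all $n \ge 2$, $z_1, \dots, z_n \in \mathcal{Z}$ and $a_1, \dots, a_n \in \mathbb{R}$ such that $\sum_{i=1}^{n} a_i = 0$. In this case, the pair $(\mathcal{Z}, \rho)$ is said to be a space of negative type [Lyons et al., 2013, Rachev et al., 2013]. In the definition of the energy distance, we replace $\|x - y\|$ by $\rho(x, y)$ and $\mathbb{R}^d$ by $\mathcal{Z}$. For independent $X, X' \sim P$ and $Y, Y' \sim Q$, this gives the generalized energy distance for the negative type space $(\mathcal{Z}, \rho)$:
$$\mathcal{E}_\rho(P, Q) = 2E\rho(X, Y) - E\rho(X, X') - E\rho(Y, Y').$$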
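A quick numerical illustration of the negative-type condition follows. It takes $\rho(s, t) = |s - t|$ on the real line, which is of negative type, and checks that the quadratic form is non-positive for random coefficients that sum to zero. A random check like this is only a sanity test, not a proof. The helper name is hypothetical.

```python
import random

def negative_type_form(points, coeffs, rho):
    """Quadratic form sum_{i,j} a_i a_j rho(z_i, z_j) from the negative-type definition."""
    return sum(ai * aj * rho(zi, zj)
               for ai, zi in zip(coeffs, points)
               for aj, zj in zip(coeffs, points))

rng = random.Random(0)
worst = float("-inf")
for _ in range(200):
    z = [rng.gauss(0.0, 1.0) for _ in range(6)]
    a = [rng.gauss(0.0, 1.0) for _ in range(6)]
    a_bar = sum(a) / len(a)
    a = [ai - a_bar for ai in a]  # enforce sum(a) == 0 (up to rounding)
    worst = max(worst, negative_type_form(z, a, lambda s, t: abs(s - t)))
print(worst <= 1e-9)  # True: |s - t| is of negative type on R
```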
In any space of negative type $(\mathcal{Z}, \rho)$ there exist a Hilbert space $\mathcal{H}$ and a map $\varphi: \mathcal{Z} \to \mathcal{H}$ such that $\rho(z, z') = \|\varphi(z) - \varphi(z')\|_{\mathcal{H}}^2$ [Rachev et al., 2013, Sejdinovic et al., 2013]. This relationship allows quantities involving distributions on $\mathcal{Z}$ to be computed in the associated Hilbert space $\mathcal{H}$. When $\rho$ does not satisfy the triangle inequality, the function $\rho^{1/2}$ does satisfy the distance axioms.
There is an equivalence [Székely and Rizzo, 2013, Shen and Vogelstein, 2018] between the energy distance, commonly used in statistics [Székely and Rizzo, 2013], and the distance between kernel mean embeddings [Gretton et al., 2012], the approach mostly used in machine learning [Gretton et al., 2012]. Before explaining it, we introduce some basic concepts of RKHS. For more information about RKHS, see [Manton et al., 2015].
Let $\mathcal{H}$ be a Hilbert space of real-valued functions defined on $\mathcal{Z}$. A function $k: \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$ is a reproducing kernel of $\mathcal{H}$ if it satisfies the following two properties:
1. $k(\cdot, z) \in \mathcal{H}$ for all $z \in \mathcal{Z}$.
2. $\langle f, k(\cdot, z) \rangle_{\mathcal{H}} = f(z)$ for all $z \in \mathcal{Z}$ and $f \in \mathcal{H}$.
These two properties imply that $k$ is a symmetric, positive definite function. The Moore-Aronszajn theorem [Aronszajn, 1950, Manton et al., 2015] establishes the converse: if $k$ is symmetric and positive definite, there is a unique reproducing kernel Hilbert space $\mathcal{H}_k$ with reproducing kernel $k$. The map $z \mapsto k(\cdot, z)$ is the so-called canonical feature map. Given a kernel, this theorem provides a way to embed a probability measure in an RKHS. It suffices to consider the map $P \mapsto \Pi_k(P) \in \mathcal{H}_k$ such that $\int f \, dP = \langle f, \Pi_k(P) \rangle_{\mathcal{H}_k}$ for all $f \in \mathcal{H}_k$. Equivalently, define $\Pi_k(P) = \int k(\cdot, z) \, dP(z)$.
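For an empirical measure $P_n$, the embedding is a finite kernel expansion, $\Pi_k(P_n) = \frac{1}{n}\sum_j k(\cdot, x_j)$. This means the defining property $\int f \, dP_n = \langle f, \Pi_k(P_n) \rangle$ can be checked directly. The sketch below assumes a Gaussian kernel with bandwidth 1 and a function $f$ that is itself a finite kernel expansion. For such functions, the reproducing property reduces the RKHS inner product to a double sum. All names are hypothetical.

```python
import math

def k(s, t, sigma=1.0):
    """Gaussian kernel on R (a characteristic kernel)."""
    return math.exp(-(s - t) ** 2 / (2 * sigma ** 2))

def rkhs_inner(a, u, b, v):
    """<sum_i a_i k(., u_i), sum_j b_j k(., v_j)>_H = sum_{i,j} a_i b_j k(u_i, v_j)."""
    return sum(ai * bj * k(ui, vj) for ai, ui in zip(a, u) for bj, vj in zip(b, v))

# Empirical embedding Pi_k(P_n) = (1/n) sum_j k(., x_j): coefficients 1/n at points x_j
x = [0.3, 1.1, 2.4, 0.8]
emb_coef = [1.0 / len(x)] * len(x)

# An RKHS element f = sum_i c_i k(., z_i)
c, z = [0.5, -1.0, 2.0], [0.0, 1.0, 3.0]

def f(t):
    return sum(ci * k(t, zi) for ci, zi in zip(c, z))

lhs = rkhs_inner(c, z, emb_coef, x)    # <f, Pi_k(P_n)>
rhs = sum(f(xj) for xj in x) / len(x)  # integral of f with respect to P_n
print(abs(lhs - rhs) < 1e-12)          # True
```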
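The distance between two probability measures $P$ and $Q$ can then be defined through the norm of $\mathcal{H}_k$ applied to their kernel mean embeddings $\Pi_k(P)$ and $\Pi_k(Q)$. It is called the maximum mean discrepancy (MMD) [Gretton et al., 2012] and is given by
$$\gamma_k(P, Q) = \|\Pi_k(P) - \Pi_k(Q)\|_{\mathcal{H}_k}.$$
This expression [Gretton et al., 2012] can also be written as
$$\gamma_k^2(P, Q) = E\,k(X, X') + E\,k(Y, Y') - 2E\,k(X, Y),$$
where $X, X' \overset{iid}{\sim} P$ and $Y, Y' \overset{iid}{\sim} Q$.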
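The next important result shows that semimetrics of negative type and positive definite kernels are strongly connected [Van Den Berg et al., 1984]. Let $\rho$ be a semimetric on $\mathcal{Z}$ and $z_0 \in \mathcal{Z}$ an arbitrary fixed point, and define
$$k(z, z') = \tfrac{1}{2}\left[\rho(z, z_0) + \rho(z', z_0) - \rho(z, z')\right].$$
Then $k$ is a positive definite kernel if and only if $\rho$ is a semimetric of negative type. In this way we obtain a family of kernels (distance-induced kernels), one for each choice of $z_0$. Conversely, if $\rho$ is a semimetric of negative type and $k$ is a kernel in this family, then
$$\rho(z, z') = k(z, z) + k(z', z') - 2k(z, z').$$
Finally, this equality can be combined with the definitions of the generalized energy distance $\mathcal{E}_\rho(P, Q) = 2E\rho(X, Y) - E\rho(X, X') - E\rho(Y, Y')$ and of $\gamma_k^2(P, Q) = E\,k(X, X') + E\,k(Y, Y') - 2E\,k(X, Y)$. This establishes the relation between the distance of kernel mean embeddings and the energy distance in a space of negative type [Sejdinovic et al., 2013]:
$$\mathcal{E}_\rho(P, Q) = 2\gamma_k^2(P, Q).$$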
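The identity $\mathcal{E}_\rho = 2\gamma_k^2$ also holds exactly for the plug-in (V-statistic) versions, because the empirical distributions are themselves probability measures. The Python sketch below checks it with $\rho(s, t) = |s - t|$ and the distance-induced kernel for several arbitrary base points $z_0$. The function names and data are hypothetical.

```python
def mean_over_pairs(a, b, h):
    """Average of h(a_i, b_j) over all pairs (plug-in / V-statistic average)."""
    return sum(h(ai, bj) for ai in a for bj in b) / (len(a) * len(b))

def rho(s, t):
    return abs(s - t)  # Euclidean distance on R, a semimetric of negative type

def distance_induced_kernel(z0):
    """k(s, t) = [rho(s, z0) + rho(t, z0) - rho(s, t)] / 2 for a base point z0."""
    return lambda s, t: 0.5 * (rho(s, z0) + rho(t, z0) - rho(s, t))

def energy_v(x, y):
    return 2 * mean_over_pairs(x, y, rho) - mean_over_pairs(x, x, rho) - mean_over_pairs(y, y, rho)

def mmd2_v(x, y, kern):
    return mean_over_pairs(x, x, kern) + mean_over_pairs(y, y, kern) - 2 * mean_over_pairs(x, y, kern)

x = [0.2, 1.5, 0.7, 3.1]
y = [1.9, 2.2, 0.4]
for z0 in (0.0, 5.0):
    print(energy_v(x, y), 2 * mmd2_v(x, y, distance_induced_kernel(z0)))  # equal up to rounding
```

The terms involving $z_0$ cancel between the three averages, which is why the choice of base point does not matter.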
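In the sample setting, two samples $X_1, \dots, X_n \overset{iid}{\sim} P$ and $Y_1, \dots, Y_m \overset{iid}{\sim} Q$ are available, and the unknown quantities $\mathcal{E}_\rho(P, Q)$ and $\gamma_k^2(P, Q)$ must be estimated. To do this, either the empirical distributions are used as plug-in, which gives V-statistics, or U-statistics are used as estimators. That is:
$$\mathcal{E}^{V}_{n,m} = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\rho(X_i, Y_j) - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\rho(X_i, X_j) - \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\rho(Y_i, Y_j)$$
(V-statistic, energy distance),
$$\hat\gamma^{2}_{k,V} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}k(X_i, X_j) + \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}k(Y_i, Y_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k(X_i, Y_j)$$
(V-statistic, kernel method),
$$\mathcal{E}^{U}_{n,m} = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\rho(X_i, Y_j) - \frac{1}{n(n-1)}\sum_{i \ne j}\rho(X_i, X_j) - \frac{1}{m(m-1)}\sum_{i \ne j}\rho(Y_i, Y_j)$$
(U-statistic, energy distance),
$$\hat\gamma^{2}_{k,U} = \frac{1}{n(n-1)}\sum_{i \ne j}k(X_i, X_j) + \frac{1}{m(m-1)}\sum_{i \ne j}k(Y_i, Y_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k(X_i, Y_j)$$
(U-statistic, kernel method),
where the kernel $k$ has to be characteristic [Sriperumbudur et al., 2011, Gretton et al., 2012, Muandet et al., 2017].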
The kernel-function table lists the best-known characteristic kernels.
In the statistical community, the energy distance is usually estimated with a V-statistic. This is a biased estimator [Kowalski and Tu, 2008], but it is always greater than or equal to zero [Székely and Rizzo, 2013]. In the machine learning community, the kernel method is usually computed with a U-statistic. This is an unbiased estimator [Kowalski and Tu, 2008] with a lower computational cost, but it can take negative values [Gretton et al., 2012].
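The difference between the two estimators can be seen in a short Python sketch. It assumes $\rho(s, t) = |s - t|$ and a Gaussian kernel with bandwidth 1, and the function names are hypothetical. The V-statistics are exactly zero when both samples coincide. The U-statistics, which drop the diagonal terms $i = j$ within each sample, become negative in that case.

```python
import math

def pair_mean(a, b, h, exclude_diagonal=False):
    """Average of h over pairs; exclude_diagonal drops i == j (within-sample U-statistic)."""
    total, count = 0.0, 0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if exclude_diagonal and i == j:
                continue
            total += h(ai, bj)
            count += 1
    return total / count

def energy_stat(x, y, unbiased=False):
    """V-statistic (default) or U-statistic estimate of the energy distance, rho(s, t) = |s - t|."""
    rho = lambda s, t: abs(s - t)
    return (2 * pair_mean(x, y, rho)
            - pair_mean(x, x, rho, unbiased) - pair_mean(y, y, rho, unbiased))

def kernel_stat(x, y, sigma=1.0, unbiased=False):
    """V-statistic (default) or U-statistic estimate of MMD^2 with a Gaussian kernel."""
    k = lambda s, t: math.exp(-(s - t) ** 2 / (2 * sigma ** 2))
    return (pair_mean(x, x, k, unbiased) + pair_mean(y, y, k, unbiased)
            - 2 * pair_mean(x, y, k))

x = [0.1, 0.9, 1.6, 2.4]
print(energy_stat(x, x), energy_stat(x, x, unbiased=True))  # 0.0 and a negative value
print(kernel_stat(x, x), kernel_stat(x, x, unbiased=True))  # 0.0 and a negative value
```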
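Assume finite moments of sufficient order for the random variables $X$ and $Y$. Then the sample statistics (V- and U-statistic versions alike) converge almost surely to their population counterparts as $n, m \to \infty$. The sample energy statistic $\mathcal{E}_{n,m}$ satisfies
$$\mathcal{E}_{n,m} \xrightarrow{\text{a.s.}} \mathcal{E}_\rho(P, Q),$$
and the sample kernel statistic $\hat\gamma_k^2$ satisfies
$$\hat\gamma_k^2 \xrightarrow{\text{a.s.}} \gamma_k^2(P, Q).$$
The limit distributions of these statistics follow from the central limit theorems for U- and V-statistics in the degenerate case [Korolyuk and Borovskich, 1994] and can be found in the original works [Gretton et al., 2012, Székely and Rizzo, 2004]. In practice, however, the tests are calibrated with bootstrap or permutation methods [Gretton et al., 2012, Székely and Rizzo, 2004].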
5 The proposed tests
In this section, the tests based on energy distance and kernel mean embeddings are extended to right-censored data. Unlike the previous section, here the statistics are derived first and their theoretical properties afterwards.
5.1 The statistics
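As before, we consider the two groups under the independence and regularity conditions assumed in Section 3, with observed times $T_{ij}$ and censoring indicators $\delta_{ij}$, $i = 1, 2$, $j = 1, \dots, n_i$.
For each group $i$ we consider its ordered sample $T_{i(1)} \le T_{i(2)} \le \dots \le T_{i(n_i)}$ and the corresponding censoring indicators $\delta_{i[1]}, \delta_{i[2]}, \dots, \delta_{i[n_i]}$.
Under right censoring (and independence), the nonparametric maximum likelihood estimator of the distribution function is the Kaplan-Meier estimator [Kaplan and Meier, 1958], which takes the place of the empirical distribution. This estimator is consistent [Wang et al., 1987] and, for each $t$, is asymptotically normal [Cai, 1998]. One of its main characteristics is its negative bias [Stute, 1994], which can become considerable when censoring is heavy. In fact, [Stute, 1994] provides an exact expression for the bias of the Kaplan-Meier integral $\int \varphi \, d\hat F_n$, where $\hat F_n$ denotes the Kaplan-Meier estimator.
If we replace, as plug-in, the empirical distributions by the Kaplan-Meier estimators in the V-statistics for the energy distance and the kernel method, we obtain the following statistics for right-censored data:
$$\mathcal{E}^{c}_{n_1,n_2} = 2\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} W_{1i} W_{2j} \, \rho(T_{1(i)}, T_{2(j)}) - \sum_{i=1}^{n_1}\sum_{j=1}^{n_1} W_{1i} W_{1j} \, \rho(T_{1(i)}, T_{1(j)}) - \sum_{i=1}^{n_2}\sum_{j=1}^{n_2} W_{2i} W_{2j} \, \rho(T_{2(i)}, T_{2(j)})$$
(V-statistic, energy distance under right censoring),
$$\hat\gamma^{2,c}_{k} = \sum_{i=1}^{n_1}\sum_{j=1}^{n_1} W_{1i} W_{1j} \, k(T_{1(i)}, T_{1(j)}) + \sum_{i=1}^{n_2}\sum_{j=1}^{n_2} W_{2i} W_{2j} \, k(T_{2(i)}, T_{2(j)}) - 2\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} W_{1i} W_{2j} \, k(T_{1(i)}, T_{2(j)})$$
(V-statistic, kernel method under right censoring),
where
$$W_{1i} = \frac{\delta_{1[i]}}{n_1 - i + 1}\prod_{j=1}^{i-1}\left[\frac{n_1 - j}{n_1 - j + 1}\right]^{\delta_{1[j]}}$$
and
$$W_{2i} = \frac{\delta_{2[i]}}{n_2 - i + 1}\prod_{j=1}^{i-1}\left[\frac{n_2 - j}{n_2 - j + 1}\right]^{\delta_{2[j]}}$$
are the Kaplan-Meier integral weights [Stute, 1995].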
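Here is a minimal Python sketch of the Kaplan-Meier integral weights and the resulting plug-in energy statistic. It assumes $\rho(s, t) = |s - t|$ and breaks ties by placing uncensored observations before censored ones. The function names are hypothetical. Note that the weights of a group sum to less than one when its largest observation is censored, which is the issue discussed in the text that follows.

```python
def km_weights(times, events):
    """Kaplan-Meier (Stute) integral weights for one group.

    Sorts by time (uncensored before censored at ties) and returns the ordered
    times and weights W_i = delta_[i]/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))**delta_[j].
    """
    data = sorted(zip(times, events), key=lambda p: (p[0], -p[1]))
    n = len(data)
    weights, prod = [], 1.0
    for i, (_, d) in enumerate(data, start=1):
        weights.append(d / (n - i + 1) * prod)
        prod *= ((n - i) / (n - i + 1)) ** d  # factor j = i, used by later weights
    return [t for t, _ in data], weights

def censored_energy_v(t1, d1, t2, d2):
    """Kaplan-Meier plug-in V-statistic for the energy distance, rho(s, t) = |s - t|."""
    x, w = km_weights(t1, d1)
    y, v = km_weights(t2, d2)
    def weighted_sum(a, wa, b, wb):
        return sum(wi * wj * abs(ai - bj) for ai, wi in zip(a, wa) for bj, wj in zip(b, wb))
    return 2 * weighted_sum(x, w, y, v) - weighted_sum(x, w, x, w) - weighted_sum(y, v, y, v)

print(km_weights([1.0, 2.0, 3.0], [1, 1, 0]))  # last weight is 0: total mass 2/3, not 1
print(censored_energy_v([1.2, 0.7, 2.5, 1.9], [1, 0, 1, 1], [0.4, 1.1, 3.0], [1, 1, 0]))
```

Without censoring, every weight equals $1/n_i$ and the statistic reduces to the uncensored V-statistic.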
However, the limit of each statistic has the following structure:
[TABLE]
[TABLE]
Here the upper limits of integration are usually smaller than the maximum of the support of the lifetimes of each group, because of censoring. The functions integrated against coincide with the original distribution functions of each group on this domain. In general, however, they are not distribution functions on the domain of integration, because their total mass may be smaller than one.
As a consequence, there is no guarantee that these limit functionals are distances between probability measures. In fact they are not: for suitable choices of the distributions of the two groups, the limit value is negative. It is easy to verify that the limit is zero when the two distributions are equal. However, one can also build an example of two different probability measures at zero distance, so these statistics are not consistent against all alternatives.
To solve this problem, the restricted functions must be turned into distribution functions on the integration domain, which is achieved by normalizing them. In addition, as we will see later, consistency of the test against all alternatives requires an additional condition when the supports of the distribution functions of the two groups are not contained in their respective integration intervals.
This leads us to consider the statistics for right-censored data suggested in [Bose and Sen, 1999] and to apply the aforementioned standardization for multisample statistics under right censoring [Stute and Wang, 1993]. The corresponding statistics are the following:
[TABLE]
(energy distance statistic under right censoring),
[TABLE]
(kernel method statistic under right censoring).
Finally, we use the following statistics:
[TABLE]
These make it easier to derive consistency against all alternatives.
5.2 Permutation tests
As in the uncensored case, the null distribution of the statistics is computed with permutation methods. If both groups share the same censoring mechanism, standard permutation methods are valid [Neuhaus et al., 1993, Wang et al., 2010]. However, when the censoring distributions differ, standard permutation methods do not work well in small samples and/or when the amount of censoring is large [Heimann and Neuhaus, 1998]. In this case, the resampling strategy proposed in [Wang et al., 2010] must be used.
We denote by a vector the size that contains the group to which it belongs to each data, and by and to the vectors of the same length that contain the observed times and the censorship indicator of each time. Given a statistic , the first step of traditional permutations method consists in calculate the value of each statistics for each permutation . Resulting each permutation of consider combination over the index in the following way: the values of different possible combinations are distributed to the first group and assigned the remaining index to the other group. Finally, we compare if is less or equal that . The p-value is calculated as follow:
[TABLE]
In practice, the last expression is approximated using only a relatively small random sample of permutations.
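The Monte Carlo version of this procedure can be sketched in Python as follows. The sketch rests on two assumptions. First, it uses a Kaplan–Meier-weighted energy statistic with weights rescaled to sum to one, since the exact statistic above was lost in extraction. Second, it assumes the same censoring mechanism in both groups, so that plain label permutation is valid. It does not implement the resampling scheme of [Wang et al., 2010]. The p-value is the proportion of permutations whose statistic is at least the observed one, as described above. The function names and the default `n_perm` are hypothetical.

```python
import random
from typing import List, Sequence


def km_weights(times: Sequence[float], delta: Sequence[int]) -> List[float]:
    """Kaplan-Meier jumps in input order, rescaled to sum to one."""
    order = sorted(range(len(times)), key=lambda i: (times[i], -delta[i]))
    w, surv, at_risk = [0.0] * len(times), 1.0, len(times)
    for i in order:
        if delta[i] == 1:
            w[i] = surv / at_risk
            surv -= w[i]
        at_risk -= 1
    total = sum(w)
    if total <= 0.0:
        raise ValueError("a group has no observed events")
    return [wi / total for wi in w]


def energy_stat(times, delta, group, alpha=1.0):
    """Weighted energy statistic between the observations labelled 0 and 1."""
    parts = []
    for g in (0, 1):
        idx = [i for i, lab in enumerate(group) if lab == g]
        t = [times[i] for i in idx]
        parts.append((t, km_weights(t, [delta[i] for i in idx])))
    (x, u), (y, v) = parts

    def cross(a, wa, b, wb):
        return sum(wa[i] * wb[j] * abs(a[i] - b[j]) ** alpha
                   for i in range(len(a)) for j in range(len(b)))

    return 2.0 * cross(x, u, y, v) - cross(x, u, x, u) - cross(y, v, y, v)


def permutation_pvalue(times, delta, group, n_perm=999, seed=0, alpha=1.0):
    """Share of random label permutations whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = energy_stat(times, delta, group, alpha)
    labels = list(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # same group sizes; (time, delta) pairs stay together
        # small tolerance so that floating-point ties count as ties
        if energy_stat(times, delta, labels, alpha) >= observed - 1e-12:
            hits += 1
    return hits / n_perm
```

A common variant counts the observed labelling as one of the permutations, giving (1 + hits) / (1 + n_perm). This variant never returns exactly zero.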
5.3 Theoretical properties
5.3.1 Asymptotic distribution
The results on the asymptotic convergence in distribution of the statistics under the null hypothesis are proved only for the kernel mean embedding statistics. As we have seen before (equation ), given the equivalence between tests based on kernel mean embeddings and those based on the energy distance [Sejdinovic et al., 2013], this is not restrictive.
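The equivalence of [Sejdinovic et al., 2013] can be checked numerically. With the distance-induced kernel k(s, t) = (|s - z0| + |t - z0| - |s - t|)/2, the energy statistic equals twice the squared distance between the kernel mean embeddings, for any base point z0. The sketch below verifies this for weighted samples whose weights sum to one, which is how we assume the standardized Kaplan–Meier weights enter. With unnormalized weights, the terms involving z0 need not cancel, which is one way to see why the standardization matters. All function names are hypothetical.

```python
import random


def wcross(a, wa, b, wb, f):
    """Weighted double sum: sum_i sum_j wa_i * wb_j * f(a_i, b_j)."""
    return sum(wa[i] * wb[j] * f(a[i], b[j])
               for i in range(len(a)) for j in range(len(b)))


def energy(x, u, y, v):
    """Weighted energy statistic with the distance |s - t|."""
    def dist(s, t):
        return abs(s - t)
    return (2 * wcross(x, u, y, v, dist)
            - wcross(x, u, x, u, dist) - wcross(y, v, y, v, dist))


def mmd2(x, u, y, v, k):
    """Squared RKHS distance between the weighted mean embeddings of x and y."""
    return wcross(x, u, x, u, k) + wcross(y, v, y, v, k) - 2 * wcross(x, u, y, v, k)


def distance_kernel(z0):
    """Distance-induced kernel k(s, t) = (|s - z0| + |t - z0| - |s - t|) / 2."""
    return lambda s, t: 0.5 * (abs(s - z0) + abs(t - z0) - abs(s - t))


def probability_weights(n, rng):
    """Hypothetical positive weights rescaled to sum to one."""
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]
```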
We first center each term in the previous sum. Under the null hypothesis and , both samples have the same mean embedding
. Thus, if we replace each instance of  with a kernel from which the mean has been subtracted,
[TABLE]
This gives the following equivalent form of the empirical statistic:
[TABLE]
Note that is a degenerate kernel:
[TABLE]
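The degeneracy of the centered kernel can be checked numerically. In the sketch below, the centering is done with respect to a discrete distribution with probability weights (for instance, rescaled Kaplan–Meier weights). As an assumption, this distribution stands in for the common law under the null hypothesis. A Gaussian kernel with σ = 1 is used only for illustration. The weighted average of the centered kernel k̃(s, ·) vanishes for every s, which is the degeneracy property used above. The helper names are hypothetical.

```python
import math


def gaussian_kernel(s, t, sigma=1.0):
    """Gaussian kernel exp(-(s - t)^2 / (2 sigma^2))."""
    return math.exp(-((s - t) ** 2) / (2 * sigma ** 2))


def centered_kernel(k, z, w):
    """Center kernel k with respect to the discrete measure sum_j w_j delta_{z_j}.

    k~(s, t) = k(s, t) - m(s) - m(t) + M, where m(s) = sum_j w_j k(s, z_j)
    and M = sum_{i,j} w_i w_j k(z_i, z_j). Assumes the weights sum to one.
    """
    def m(s):
        return sum(wj * k(s, zj) for zj, wj in zip(z, w))

    big_m = sum(wi * m(zi) for zi, wi in zip(z, w))
    return lambda s, t: k(s, t) - m(s) - m(t) + big_m
```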
Then, to the terms
[TABLE]
we can apply the limit theorems for statistics under right-censored data [Bose and Sen, 2002, Fernández and Rivera, 2018]. In particular, we use the results of [Fernández and Rivera, 2018], because they require the weakest conditions and because that work proves that the asymptotic convergence theorems hold under the conditions assumed here.
By Corollary  in [Fernández and Rivera, 2018], under the null hypothesis and , we have:
[TABLE]
and
[TABLE]
where , with  standard normal random variables, and ,  are two constants specified in [Fernández and Rivera, 2018] whose exact values are not relevant for our purposes.
The structure of these limits coincides with that of the uncensored degenerate case [Korolyuk and Borovskich, 1994], where  is a constant.
However, for the term
[TABLE]
which is a two-sample U-statistic under right-censored data, no theoretical results are available yet.
Deriving limit theorems for multisample statistics is beyond the scope of this work and will be presented in another paper. In any case, the limit distribution coincides with the case with censorship, namely
[TABLE]
[TABLE]
where  and  are two independent sequences of standard normal random variables.
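The uncensored degenerate case mentioned above rests on a classical result. For a degenerate kernel with eigenvalues λ_j, n U_n converges in distribution to Σ_j λ_j (Z_j^2 - 1), where the Z_j are independent standard normal variables. The sketch below draws from that limit for a hand-picked, truncated list of eigenvalues. It does not reproduce the censored limits above, whose constants are given in [Fernández and Rivera, 2018]. In practice the λ_j are unknown, which is one reason the tests are calibrated by permutation. The function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def weighted_chi2_limit(eigenvalues, n_draws=20000, seed=0):
    """Draws from sum_j lambda_j * (Z_j**2 - 1), with Z_j i.i.d. N(0, 1).

    Classical (uncensored) limit of n * U_n for a degenerate U-statistic;
    the eigenvalue list here is truncated and chosen by hand.
    """
    rng = random.Random(seed)
    return [sum(lam * (rng.gauss(0.0, 1.0) ** 2 - 1.0) for lam in eigenvalues)
            for _ in range(n_draws)]


def summarize(draws):
    """Sample mean and variance; theory gives 0 and 2 * sum(lambda_j ** 2)."""
    return mean(draws), pvariance(draws)
```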
5.3.2 Consistency against all alternatives
Theorem 1.
Let  be an arbitrary metric space with the same topology defined on , with  contained in , and let  be a continuous, symmetric, real function on . Suppose , , ,  are independent random variables, with  and  identically distributed, and  and  identically distributed. Suppose , ,  and  have finite expected values on . Then
[TABLE]
if and only if  is negative definite, where  and  denote the distributions of  and , respectively. If  is strictly negative definite, then equality holds if and only if  and  are identically distributed on .
Proof.
By Theorem  in [Székely and Rizzo, 2005], we have:
[TABLE]
if and only if  is negative definite. If  is strictly negative definite, then equality holds if and only if  and  are identically distributed on .
Define the following random variables on ,  and , with distribution functions  and , respectively, as follows:
 and , where  and , and consider their copies  and . Since , ,  and  have finite expected values on , so do , ,  and  on . Moreover,  is a continuous, symmetric, real function on .
This leads to
[TABLE]
if and only if is negative definite, and
[TABLE]
if  and  are identically distributed on  (with  strictly negative definite) or, equivalently, if  and  are identically distributed on .
∎
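Theorem 1 can be illustrated with φ(x, y) = |x - y|^α. This function is negative definite for 0 < α ≤ 2, but strictly negative definite only for 0 < α < 2. For α = 2, the energy statistic collapses to 2(E X - E Y)^2, so two distributions with equal means but different spreads give exactly zero. The minimal Python sketch below uses uncensored V-statistics to show this; the function name `energy_vstat` is hypothetical.

```python
def energy_vstat(x, y, alpha):
    """Empirical 2E|X-Y|^a - E|X-X'|^a - E|Y-Y'|^a (uncensored V-statistic)."""
    def avg(a, b):
        return sum(abs(s - t) ** alpha for s in a for t in b) / (len(a) * len(b))
    return 2 * avg(x, y) - avg(x, x) - avg(y, y)


# Same mean (0), different spread: detected with alpha = 1, invisible with alpha = 2.
x, y = [-1.0, 1.0], [-3.0, 3.0]
print(energy_vstat(x, y, 1.0), energy_vstat(x, y, 2.0))
```

Here the first value is 2 and the second is 0. With α = 2, the statistic only measures the squared difference of means.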
Theorem 2.
Let  and  with  and , under the conditions assumed in Section  on the variables . Then:
[TABLE]
[TABLE]
where
$$P_{0}^{\prime}(x)=\begin{cases}P_{0}(x) & \text{if } x<\tau_{0},\\ P_{0}(\tau_{0}^{-})+\mathbf{1}\{\tau_{0}\in A^{1}\}\,P_{0}(\tau_{0}) & \text{if } x\geq\tau_{0},\end{cases}$$
and
$$P_{1}^{\prime}(x)=\begin{cases}P_{1}(x) & \text{if } x<\tau_{1},\\ P_{1}(\tau_{1}^{-})+\mathbf{1}\{\tau_{1}\in A^{1}\}\,P_{1}(\tau_{1}) & \text{if } x\geq\tau_{1}.\end{cases}$$
Here, , , and .
Proof.
The proof consists of repeatedly applying the strong law of large numbers for Kaplan–Meier statistics [Stute and Wang, 1993] to each of the two samples, together with the convergence results for statistics of degree two under random censoring [Bose and Sen, 1999].
By [Stute and Wang, 1993] we know that
[TABLE]
where is a given kernel of degree two such that
[TABLE]
Note that, by hypothesis,  is a continuous distribution function, which implies that  and  are empty sets, and therefore  and .
Applying the previous result with , along with the properties of convergence in probability, we have:
[TABLE]
Using the theorem of [Bose and Sen, 1999], we also have
[TABLE]
and
[TABLE]
.
Finally, taking  as  or  and applying the properties of convergence in probability of the sum of two random variables, we obtain the desired result.
∎
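Theorem 2 says that the censored statistics converge to functionals of P'_0 and P'_1 rather than of P_0 and P_1. The simulation below illustrates the truncation behind this for a single sample, under a hypothetical design: Exp(1) lifetimes and Uniform(0, 1) censoring. It assumes, as in [Stute and Wang, 1993], that τ denotes the upper endpoint of the distribution of the observed times. Here no observed time exceeds 1, so the total Kaplan–Meier mass tends to 1 - exp(-1) ≈ 0.632 rather than to 1. The function names are hypothetical.

```python
import math
import random

# Limit of the total Kaplan-Meier mass in this hypothetical design:
# F(1-) = 1 - exp(-1) for Exp(1) lifetimes when no observed time exceeds 1.
LIMIT = 1.0 - math.exp(-1.0)


def km_total_mass(times, delta):
    """Total mass of the Kaplan-Meier estimator, i.e. 1 - S_KM(largest time)."""
    order = sorted(range(len(times)), key=lambda i: (times[i], -delta[i]))
    surv, at_risk = 1.0, len(times)
    for i in order:
        if delta[i] == 1:
            surv -= surv / at_risk
        at_risk -= 1
    return 1.0 - surv


def simulate(n, seed=0):
    """Exp(1) lifetimes censored by independent Uniform(0, 1) times."""
    rng = random.Random(seed)
    times, delta = [], []
    for _ in range(n):
        t, c = rng.expovariate(1.0), rng.random()
        times.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return times, delta
```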
Theorem 3.
Let  and  with , under the independence conditions assumed in Section  on the variables . Suppose also that  or that the support of the distribution functions  and  is contained in the intervals  and , respectively. Then the statistic  determines a test of the hypothesis of equal distributions that is consistent against all fixed alternatives with continuous random variables.
Proof.
We assume without loss of generality that  and  have the same support (otherwise, it suffices to extend the probability measure with the smaller support to the larger one). If , we can apply Theorems  and , which guarantee:
[TABLE]
[TABLE]
with equality to zero if and only if
Suppose ; then the inequality in  is strict, so with probability one . By the theory of degenerate -statistics, under the null hypothesis there exist constants  and  satisfying
[TABLE]
Under the alternative hypothesis
[TABLE]
since  and  with probability one as .
In the case where the support of the distribution functions  and  is contained in the intervals  and , respectively, the normalization constants are  and , and the previous argument remains valid.
∎
6 Simulation study
The simulation study is divided into two phases. In the first, the performance of the proposed tests under the null hypothesis is compared with that of the logrank family of tests for different censoring rates and sample sizes. In particular, the tests used are: energy distance (with ), Gaussian kernel (with ), Laplacian kernel (with ), rational quadratic (with  and ), log-rank, Gehan's generalized Wilcoxon, Tarone–Ware, Peto–Peto and Fleming–Harrington (with ). For this purpose, parametric distributions such as the normal, exponential and lognormal are used. In the second phase, the same tests are compared when the null hypothesis is false, in different scenarios: proportional hazards, cure, multimodality and delayed effect. We use a different censoring mechanism for each case and vary the sample size ().
All tests are run in the statistical software R. For the logrank family, the coin package [Hothorn et al., 2008] is used, while the new tests have been implemented in C++ and integrated into R with the Rcpp [Eddelbuettel et al., 2011] and RcppArmadillo libraries. In both cases the tests are calibrated by permutation, performing  repetitions for our tests.
6.1 Null hypothesis
We simulate  times two samples for which the null hypothesis holds. The censoring rates are  and  percent, and the sample sizes are  and  individuals. Since under the null hypothesis the
p-value follows a uniform distribution on (0, 1), the mean of the p-values should be close to 1/2 and their standard deviation close to 1/√12 ≈ 0.289. Likewise, approximately  percent of the p-values should be less than or equal to . Table 3 shows the mean and standard deviation of the p-values for each test and case study considered, while Table 4 shows the proportion of p-values less than or equal to  in the same cases.
[FIGURE:]
[FIGURE:]
Under the null hypothesis, the results of the proposed tests are consistent with the theoretical values and similar to those of the logrank family. Some discrepancies with the theoretical values are to be expected, since the comparison is based on  repetitions. In addition, the Kaplan–Meier estimator used in our statistics and in some tests of the logrank family has a certain bias (which depends on the censoring rate), producing small deviations from the behavior expected in theory under the null hypothesis.
6.2 Alternative hypothesis
As before, we simulate  repetitions of two samples, but this time the null hypothesis does not hold. We study the following cases: proportional hazards between the two populations (where the logrank test is the most powerful test); cure in one of the populations; a density with several modes in one population, as a consequence of a multimodal treatment; and delayed effects in one population. The sample size takes the values ,  and  individuals per group, and the censoring mechanism changes between experiments. A significance level of  is used.
Each figure shows four panels for each subcase. The first shows the power of the energy distance tests; the second, that of the kernel methods; the third, that of the logrank test together with the other tests of its family. The last compares the logrank test with the average power of the energy distance tests, of the kernel methods and of the logrank family.
6.2.1 Proportional hazards in two populations
We simulate  times, varying the sample size ( individuals in each group,  and ), in the following case study:  versus .
We represent the results as a function of the parameter , with one figure per sample size: Figure 1 for , Figure 2 for  and Figure 3 for . As the three figures show, the logrank test is usually the most powerful, as expected in a setting where it is theoretically optimal. However, the average power of the energy distance tests is not far behind. The figures also show that the choice of the parameters of both the energy distance and the kernel methods affects the power in this case study, which gives the family of tests great flexibility.
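The exact distributions used in this case study are not recoverable here, so the following generator is only a hypothetical example of a proportional hazards design. Lifetimes are exponential with hazards 1 and θ, so the hazard ratio is constant and equal to θ. Both groups share a common exponential censoring distribution, which keeps the censoring mechanism identical, as the standard permutation calibration requires. Since P(C < T) = c/(λ + c) for T ~ Exp(λ) and C ~ Exp(c), choosing c = p/(1 - p) gives a censoring proportion p in the baseline group. The other group then has censoring proportion c/(θ + c). The function name is hypothetical.

```python
import math
import random


def simulate_ph(n, hazard_ratio, censoring_rate, seed=0):
    """Two samples of size n with hazards 1 and hazard_ratio, common Exp(c) censoring.

    c is calibrated so that the baseline group (hazard 1) has expected
    censoring proportion censoring_rate, using P(C < T) = c / (1 + c).
    Returns [(times_0, delta_0), (times_1, delta_1)].
    """
    rng = random.Random(seed)
    c = censoring_rate / (1.0 - censoring_rate)
    groups = []
    for hazard in (1.0, hazard_ratio):
        times, delta = [], []
        for _ in range(n):
            t = rng.expovariate(hazard)
            cens = rng.expovariate(c) if c > 0 else math.inf
            times.append(min(t, cens))
            delta.append(1 if t <= cens else 0)
        groups.append((times, delta))
    return groups
```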
6.2.2 Cure
We simulate data with the following hazard functions for each population on :
[TABLE]
and
[TABLE]
The censoring times are . Figure 4 shows the survival functions estimated by the Kaplan–Meier estimator using  subjects in each group. Figure 5 collects the results of the power study: in this case, the most powerful tests are those based on the energy distance and the kernel methods. Interestingly, there is hardly any variability in power among the tests within these two families, whereas the power varies considerably among the tests of the logrank family.
6.2.3 Multimodality
We again simulate data with predefined hazard functions for each population on :
[TABLE]
and
[TABLE]
The censoring times are . Figure 6 shows the Kaplan–Meier estimator for each group. In this case study, the logrank family has more power (Figure 7) than the selected tests based on the energy distance and the kernel methods. However, power varies greatly across the tests of this family, and some of them, such as Fleming–Harrington, have less power than the proposed tests.
6.2.4 Delayed effect
We consider the following hazard functions for each population:
[TABLE]
and
[TABLE]
The censoring variables are . The Kaplan–Meier estimator is shown in Figure 8.
In this last simulation, the kernel methods are by far the most powerful (Figure 9). The power achieved by the logrank family and by the energy distance tests is similar. However, the power of the logrank test itself is very low, barely above its rejection rate under the null hypothesis. The same happens for the energy distance with  and . This shows that an appropriate choice of parameters is necessary for the correct use of these tests.
7 Final remarks
In this article, new statistics for testing the equality of survival distributions with censored data are proposed. The tests are consistent against all alternatives. With finite samples, they perform well in situations of great clinical interest, such as new oncological treatments whose pharmacological strategies introduce a delayed effect [Melero et al., 2014, Xu et al., 2017]. There, with a suitable choice of parameters, they greatly outperform the classical tests. In the other situations analyzed, their performance is higher in the cure scenario, very close to the optimum when the hazard ratio is constant, and slightly worse for the simulated multimodal treatments. Overall, their performance is better than that of the classical tests. However, some issues remain unresolved, such as the choice of the optimal parameters or kernel in each situation (as also happens in the uncensored case [Szekely and Rizzo, 2017]). In addition, when estimating the mean in survival analysis [Datta, 2005], it is common to apply the Efron correction [Efron, 1967], which treats the largest observed time in each group as uncensored (). Another option is to resort to other imputation techniques for censored observations, either for estimating the mean [Datta, 2005] or for estimating the Kaplan–Meier weights globally, as in presmoothing [Cao and Jácome, 2004]. Such corrections may increase the power of the tests, but they may also increase the bias.
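As an illustration of the Efron correction, the sketch below marks the largest observed time(s) as uncensored before computing the Kaplan–Meier jumps. The total mass is then exactly one, so no rescaling is needed. All tied observations at the maximum are marked; otherwise part of the final mass would still be lost. The helper names are hypothetical. Whether this correction improves the proposed tests is left open, given the power and bias trade-off noted above.

```python
from typing import List, Sequence


def km_weights(times: Sequence[float], delta: Sequence[int]) -> List[float]:
    """Kaplan-Meier jump sizes in input order (events first at tied times)."""
    order = sorted(range(len(times)), key=lambda i: (times[i], -delta[i]))
    w, surv, at_risk = [0.0] * len(times), 1.0, len(times)
    for i in order:
        if delta[i] == 1:
            w[i] = surv / at_risk
            surv -= w[i]
        at_risk -= 1
    return w


def efron_correction(times: Sequence[float], delta: Sequence[int]) -> List[int]:
    """Treat the largest observed time(s) as uncensored (Efron, 1967)."""
    t_max = max(times)
    return [1 if t == t_max else d for t, d in zip(times, delta)]
```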
The extension of the proposed tests to the k-sample case is analogous to the uncensored case. For the uncensored case there is a variety of literature, such as DISCO analysis [Rizzo et al., 2010], an extension of ANOVA to testing the equality of distributions, or, more recently, the kernel method proposed in [Balogoun et al., 2018].
An R package called energysurv, with the proposed methods implemented in C++, will soon be available on my GitHub page at https://github.com/mmatabuena. The scientific community will then be able to use the new tests as a valuable alternative to the classical survival tests.
Graphics
8 Acknowledgements
This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019, ED431G/08) and the European Regional Development Fund (ERDF).
References
- [Aronszajn, 1950] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.
- [Balogoun et al., 2018] Balogoun, A. S. K., Nkiet, G. M., and Ogouyandjou, C. (2018). Kernel based method for the k-sample problem. arXiv preprint arXiv:1812.00100.
- [Bathke et al., 2009] Bathke, A., Kim, M.-O., and Zhou, M. (2009). Combined multiple testing by censored empirical likelihood. Journal of Statistical Planning and Inference, 139(3):814–827.
- [Bose and Sen, 1999] Bose, A. and Sen, A. (1999). The strong law of large numbers for Kaplan–Meier U-statistics. Journal of Theoretical Probability, 12(1):181–200.
- [Bose and Sen, 2002] Bose, A. and Sen, A. (2002). Asymptotic distribution of the Kaplan–Meier U-statistics. Journal of Multivariate Analysis, 83(1):84–123.
- [Buyske et al., 2000] Buyske, S., Fagerstrom, R., and Ying, Z. (2000). A class of weighted log-rank tests for survival data when the event is rare. Journal of the American Statistical Association, 95(449):249–258.
- [Cai, 1998] Cai, Z. (1998). Asymptotic properties of Kaplan–Meier estimator for censored dependent data. Statistics & Probability Letters, 37(4):381–389.
- [Cao and Jácome, 2004] Cao, R. and Jácome, M. (2004). Presmoothed kernel density estimator for censored data. Nonparametric Statistics, 16(1-2):289–309.
