Energy distance and kernel mean embedding for two sample survival test

Marcos Matabuena

arXiv:1901.00833·math.ST·January 4, 2019

TL;DR

This paper introduces new statistical tests for comparing two survival distributions under right censoring, utilizing energy distance and kernel mean embedding, with permutation calibration and proven consistency.

Contribution

It proposes a novel family of two-sample tests specifically designed for censored survival data, combining energy distance and kernel methods with permutation calibration.

Findings

1. Tests perform well in finite sample simulations

2. They are consistent against all alternatives

3. Effective in real survival analysis scenarios

Abstract

This article proposes a new family of tests for the equality of the distributions of two samples under right censoring. The tests are based on the energy distance and the kernel mean embedding, are calibrated by permutations, and are consistent against all alternatives. A simulation study establishes the good performance of the new tests in realistic finite-sample situations.

Tables

Table 1: Possible events at time $\tau_{k}$

	Group $0$	Group $1$	Total
Number of live subjects	$Y_{0} (τ_{j})$	$Y_{1} (τ_{j})$	$Y (τ_{j})$
Number of subjects that die	$d_{0 j}$	$d_{1 j}$	$d_{j}$

Table 2: Characteristic kernels. $\Gamma(\cdot)$ denotes the Gamma function and $K_{v}$ is the modified Bessel function of the second kind of order $v$.

Kernel function	$k(x,y)$
Gaussian	$\exp(-\sigma||x-y||^{2}),\ \sigma>0$
Laplacian	$\exp(-\sigma||x-y||),\ \sigma>0$
Rational quadratic	$(||x-y||+c)^{-\beta},\ \beta,c>0$
Matérn	$\frac{2^{1-v}}{\Gamma(v)}\left(\frac{\sqrt{2v}||x-y||}{\sigma}\right)^{v}K_{v}\left(\frac{\sqrt{2v}||x-y||}{\sigma}\right)$

Table 3: Empirical mean and standard deviation of the p-values for each case study under the null hypothesis.

Method:				Energy distance	Energy distance	Energy distance	Energy distance	Energy distance	Kernel	Kernel	Kernel	Kernel	Logrank	Gehan	Tarone	Peto	Fleming
				$\alpha = 1$	$\alpha = 0.4$	$\alpha = 0.8$	$\alpha = 1.2$	$\alpha = 1.6$	Gaussian $\sigma = 1$	Laplacian $\sigma = 1$	Quadratic $c = 1, \beta = 1$	Quadratic $c = 2, \beta = 2$					$\rho = 1, \gamma = 1$
Comparative	$n_{1}$	$n_{2}$	Censoring rate	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd	$\bar{x}$ sd
Exp(1)	20	20	0.1	0.496 0.286	0.493 0.283	0.495 0.285	0.497 0.287	0.499 0.289	0.495 0.287	0.493 0.284	0.493 0.288	0.497 0.286	0.508 0.291	0.505 0.288	0.503 0.285	0.502 0.285	0.495 0.296
Exp(1)	50	50	0.1	0.482 0.293	0.478 0.290	0.481 0.293	0.483 0.292	0.485 0.289	0.481 0.298	0.478 0.294	0.479 0.296	0.482 0.297	0.492 0.293	0.489 0.295	0.478 0.280	0.486 0.292	0.490 0.295
Exp(1.5)	20	20	0.1	0.482 0.287	0.490 0.285	0.484 0.286	0.480 0.287	0.477 0.288	0.489 0.290	0.49 0.285	0.493 0.288	0.483 0.289	0.471 0.296	0.486 0.289	0.475 0.291	0.481 0.288	0.458 0.288
Exp(1.5)	50	50	0.1	0.482 0.295	0.475 0.288	0.481 0.293	0.483 0.296	0.484 0.297	0.485 0.293	0.481 0.289	0.487 0.289	0.483 0.295	0.492 0.299	0.501 0.295	0.492 0.295	0.498 0.295	0.479 0.293
Exp(1)	20	20	0.3	0.508 0.288	0.509 0.288	0.508 0.288	0.507 0.288	0.508 0.290	0.503 0.285	0.506 0.287	0.502 0.286	0.504 0.287	0.495 0.284	0.502 0.297	0.500 0.294	0.499 0.295	0.507 0.286
Exp(1)	50	50	0.3	0.494 0.297	0.496 0.295	0.494 0.297	0.494 0.295	0.494 0.291	0.493 0.297	0.495 0.297	0.494 0.298	0.493 0.297	0.503 0.296	0.486 0.297	0.491 0.296	0.486 0.297	0.502 0.287
Exp(1.5)	20	20	0.3	0.500 0.290	0.510 0.293	0.503 0.291	0.498 0.289	0.495 0.288	0.492 0.284	0.506 0.292	0.498 0.288	0.492 0.284	0.497 0.295	0.499 0.289	0.493 0.285	0.495 0.286	0.501 0.292
Exp(1.5)	50	50	0.3	0.489 0.301	0.487 0.297	0.488 0.300	0.489 0.301	0.490 0.299	0.489 0.301	0.486 0.299	0.486 0.301	0.490 0.302	0.496 0.298	0.492 0.294	0.495 0.299	0.492 0.294	0.500 0.299
Gamma(1,1)	20	20	0.1	0.501 0.294	0.508 0.297	0.503 0.295	0.499 0.293	0.493 0.288	0.512 0.297	0.508 0.296	0.511 0.296	0.505 0.295	0.491 0.284	0.510 0.294	0.498 0.288	0.506 0.292	0.493 0.282
Gamma(1,1)	50	50	0.1	0.503 0.291	0.504 0.288	0.504 0.289	0.503 0.293	0.502 0.297	0.512 0.292	0.508 0.288	0.511 0.290	0.511 0.293	0.505 0.292	0.508 0.287	0.505 0.290	0.508 0.288	0.502 0.290
Gamma(1.5,1.5)	20	20	0.1	0.519 0.295	0.516 0.289	0.519 0.294	0.520 0.296	0.522 0.295	0.515 0.301	0.516 0.295	0.515 0.299	0.516 0.299	0.52 0.290	0.519 0.289	0.522 0.291	0.516 0.287	0.509 0.287
Gamma(1.5,1.5)	50	50	0.1	0.499 0.290	0.493 0.291	0.497 0.290	0.501 0.289	0.506 0.287	0.494 0.289	0.495 0.292	0.493 0.291	0.498 0.289	0.515 0.295	0.505 0.289	0.509 0.289	0.506 0.288	0.505 0.291
Gamma(1,1)	20	20	0.3	0.477 0.288	0.485 0.288	0.479 0.288	0.475 0.289	0.474 0.289	0.479 0.289	0.484 0.287	0.484 0.288	0.475 0.289	0.477 0.297	0.467 0.288	0.464 0.288	0.463 0.287	0.489 0.292
Gamma(1,1)	50	50	0.3	0.489 0.293	0.497 0.296	0.491 0.294	0.486 0.290	0.482 0.287	0.495 0.289	0.497 0.293	0.495 0.288	0.492 0.291	0.485 0.292	0.513 0.300	0.498 0.293	0.511 0.300	0.474 0.287
Gamma(1.5,1.5)	20	20	0.3	0.491 0.293	0.494 0.293	0.492 0.294	0.491 0.293	0.489 0.293	0.493 0.294	0.494 0.294	0.494 0.295	0.492 0.293	0.484 0.297	0.499 0.294	0.490 0.295	0.494 0.292	0.484 0.294
Gamma(1.5,1.5)	50	50	0.3	0.495 0.295	0.489 0.294	0.493 0.294	0.496 0.295	0.499 0.293	0.492 0.293	0.49 0.295	0.491 0.294	0.492 0.294	0.509 0.289	0.49 0.291	0.493 0.288	0.489 0.292	0.514 0.288
Lognormal(0,0.5)	20	20	0.1	0.49 0.287	0.495 0.288	0.492 0.287	0.489 0.288	0.488 0.290	0.49 0.283	0.493 0.287	0.489 0.284	0.490 0.285	0.472 0.279	0.477 0.287	0.470 0.285	0.473 0.286	0.483 0.287
Lognormal(0,0.5)	50	50	0.10	0.503 0.283	0.506 0.284	0.504 0.283	0.503 0.283	0.502 0.282	0.500 0.283	0.506 0.285	0.505 0.285	0.500 0.283	0.508 0.283	0.504 0.279	0.508 0.286	0.504 0.281	0.515 0.296
Lognormal(0,0.25)	20	20	0.1	0.481 0.294	0.484 0.300	0.482 0.295	0.48 0.292	0.480 0.292	0.481 0.294	0.482 0.296	0.480 0.293	0.481 0.295	0.484 0.291	0.476 0.295	0.473 0.291	0.471 0.293	0.487 0.290
Lognormal(0,0.25)	50	50	0.1	0.517 0.289	0.512 0.287	0.516 0.288	0.518 0.290	0.519 0.293	0.517 0.291	0.516 0.288	0.515 0.288	0.518 0.292	0.517 0.292	0.523 0.291	0.522 0.293	0.522 0.291	0.506 0.284
Lognormal(0,0.5)	20	20	0.3	0.495 0.288	0.498 0.287	0.496 0.288	0.493 0.288	0.489 0.287	0.495 0.287	0.497 0.287	0.496 0.284	0.493 0.289	0.495 0.285	0.489 0.288	0.488 0.287	0.485 0.286	0.516 0.289
Lognormal(0,0.5)	50	50	0.3	0.482 0.293	0.490 0.297	0.485 0.295	0.480 0.292	0.476 0.291	0.482 0.294	0.488 0.297	0.484 0.294	0.480 0.294	0.476 0.296	0.473 0.293	0.468 0.287	0.47 0.292	0.487 0.296
Lognormal(0,0.25)	20	20	0.3	0.522 0.293	0.513 0.289	0.519 0.292	0.523 0.295	0.525 0.297	0.526 0.298	0.519 0.292	0.526 0.298	0.526 0.298	0.516 0.293	0.526 0.306	0.524 0.303	0.524 0.306	0.518 0.286
Lognormal(0,0.25)	50	50	0.3	0.504 0.291	0.508 0.287	0.505 0.290	0.504 0.292	0.504 0.295	0.501 0.296	0.504 0.29	0.499 0.294	0.502 0.296	0.491 0.289	0.500 0.296	0.496 0.295	0.498 0.295	0.494 0.289

Table 4: Proportion of p-values less than or equal to $0.05$ for each case study under the null hypothesis.

Method:				Energy distance	Energy distance	Energy distance	Energy distance	Energy distance	Kernel	Kernel	Kernel	Kernel	Logrank	Gehan	Tarone	Peto	Fleming
				$\alpha = 1$	$\alpha = 0.4$	$\alpha = 0.8$	$\alpha = 1.2$	$\alpha = 1.6$	Gaussian $\sigma = 1$	Laplacian $\sigma = 1$	Quadratic $c = 1, \beta = 1$	Quadratic $c = 2, \beta = 2$					$\rho = 1, \gamma = 1$
Comparative	$n_{1}$	$n_{2}$	Censoring rate	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$	$\hat{p}$
Exp(1)	20	20	0.1	0.048	0.048	0.048	0.048	0.050	0.052	0.046	0.050	0.054	0.050	0.046	0.046	0.044	0.058
Exp(1)	50	50	0.1	0.056	0.060	0.054	0.058	0.056	0.052	0.056	0.056	0.056	0.060	0.066	0.064	0.064	0.056
Exp(1.5)	20	20	0.1	0.066	0.056	0.070	0.066	0.066	0.072	0.058	0.064	0.066	0.066	0.058	0.062	0.058	0.062
Exp(1.5)	50	50	0.1	0.042	0.048	0.042	0.042	0.046	0.048	0.044	0.044	0.042	0.062	0.056	0.050	0.056	0.054
Exp(1)	20	20	0.3	0.058	0.058	0.060	0.060	0.060	0.005	0.042	0.042	0.056	0.058	0.054	0.056	0.056	0.056
Exp(1)	50	50	0.3	0.056	0.056	0.056	0.056	0.048	0.054	0.050	0.056	0.050	0.050	0.052	0.054	0.058	0.046
Exp(1.5)	20	20	0.3	0.058	0.048	0.056	0.054	0.052	0.052	0.052	0.048	0.056	0.054	0.054	0.050	0.052	0.052
Exp(1.5)	50	50	0.3	0.064	0.048	0.062	0.064	0.074	0.064	0.044	0.054	0.056	0.066	0.068	0.066	0.068	0.056
Gamma(1,1)	20	20	0.1	0.058	0.056	0.060	0.056	0.054	0.052	0.060	0.060	0.052	0.054	0.054	0.056	0.058	0.054
Gamma(1,1)	50	50	0.1	0.044	0.042	0.044	0.042	0.044	0.042	0.040	0.038	0.042	0.038	0.038	0.030	0.032	0.050
Gamma(1.5,1.5)	20	20	0.1	0.062	0.058	0.062	0.066	0.066	0.060	0.060	0.056	0.064	0.046	0.048	0.048	0.048	0.062
Gamma(1.5,1.5)	50	50	0.1	0.050	0.054	0.052	0.048	0.048	0.054	0.052	0.050	0.052	0.046	0.044	0.046	0.044	0.050
Gamma(1,1)	20	20	0.3	0.058	0.058	0.062	0.058	0.058	0.064	0.054	0.060	0.062	0.056	0.060	0.058	0.052	0.066
Gamma(1,1)	50	50	0.3	0.058	0.060	0.060	0.060	0.056	0.056	0.062	0.060	0.054	0.054	0.052	0.058	0.050	0.046
Gamma(1.5,1.5)	20	20	0.3	0.068	0.056	0.066	0.068	0.066	0.070	0.056	0.060	0.070	0.058	0.060	0.064	0.062	0.066
Gamma(1.5,1.5)	50	50	0.3	0.056	0.060	0.058	0.054	0.052	0.056	0.068	0.064	0.066	0.050	0.062	0.060	0.062	0.050
Lognormal(0,0.5)	20	20	0.1	0.050	0.046	0.050	0.046	0.042	0.052	0.054	0.058	0.044	0.052	0.044	0.044	0.042	0.048
Lognormal(0,0.5)	50	50	0.1	0.040	0.040	0.036	0.040	0.042	0.038	0.040	0.040	0.040	0.040	0.034	0.040	0.036	0.040
Lognormal(0,0.25)	20	20	0.1	0.084	0.080	0.082	0.080	0.078	0.076	0.080	0.078	0.074	0.062	0.078	0.076	0.080	0.054
Lognormal(0,0.25)	50	50	0.1	0.038	0.040	0.042	0.040	0.038	0.040	0.044	0.044	0.034	0.036	0.044	0.044	0.040	0.038
Lognormal(0,0.5)	20	20	0.3	0.046	0.042	0.050	0.050	0.048	0.050	0.046	0.050	0.050	0.042	0.052	0.040	0.048	0.050
Lognormal(0,0.5)	50	50	0.3	0.072	0.076	0.074	0.076	0.076	0.074	0.074	0.072	0.072	0.078	0.082	0.078	0.082	0.066
Lognormal(0,0.25)	20	20	0.3	0.056	0.052	0.054	0.056	0.060	0.060	0.054	0.056	0.062	0.050	0.056	0.058	0.054	0.042
Lognormal(0,0.25)	50	50	0.3	0.044	0.050	0.046	0.046	0.040	0.040	0.052	0.044	0.040	0.046	0.060	0.046	0.058	0.048

Equations

$H_{0}:P_{0}(t)=P_{1}(t)\ \forall t>0 \quad \text{versus} \quad H_{a}:\exists\, t>0 \text{ such that } P_{0}(t)\neq P_{1}(t).$

$\hat{Z}^{2}=\frac{\left[\sum_{j=1}^{k}\omega_{j}(d_{1j}-E(d_{1j}))\right]^{2}}{\sum_{j=1}^{k}Var(\omega_{j}(d_{1j}-E(d_{1j})))}$

$\hat{Z}^{2}=\frac{\left[\sum_{j=1}^{k}(d_{1j}-E(d_{1j}))\right]^{2}}{\sum_{j=1}^{k}Var(d_{1j}-E(d_{1j}))}\overset{d}{\to}\chi_{1}^{2}.$

$\epsilon(t)=\Lambda_{1}(t)-\Lambda_{0}(t)=-\log(S_{1}(t))+\log(S_{0}(t)).$

$H_{0}:\epsilon(t)=0\ \forall t>0 \quad \text{versus} \quad H_{a}:\exists\, t>0 \text{ such that } \epsilon(t)\neq 0.$

$\hat{\epsilon}(t)=\hat{\Lambda}_{1}(t)-\hat{\Lambda}_{0}(t)$

$Q_{KS\epsilon}=(n_{0}+n_{1})\sup_{0\leq t\leq\tau}|\hat{\epsilon}(t)\hat{\psi}_{\epsilon}(t)|, \qquad Q^{0}_{KS\epsilon}=(n_{0}+n_{1})\sup_{0\leq t\leq\tau}|\hat{\epsilon}(t)\hat{\psi}^{0}_{\epsilon}(t)|,$

$Q_{CM\epsilon}=\frac{n_{0}+n_{1}}{\hat{A}(\tau)}\int_{0}^{\tau}(\hat{\epsilon}(t)\hat{\psi}_{\epsilon}(t))^{2}d\hat{A}(t), \qquad Q^{0}_{CM\epsilon}=(n_{0}+n_{1})\int_{0}^{\tau}(\hat{\epsilon}(t)\hat{\psi}^{0}_{\epsilon}(t))^{2}d\hat{H}(t).$

$\epsilon(P,Q)=2E||X-Y||-E||X-X^{\prime}||-E||Y-Y^{\prime}||,$

$\epsilon_{\alpha}(P,Q)=2E||X-Y||^{\alpha}-E||X-X^{\prime}||^{\alpha}-E||Y-Y^{\prime}||^{\alpha}.$

$\sum_{i,j=1}^{n}c_{i}c_{j}\rho(X_{i},X_{j})\leq 0$

$\epsilon_{\alpha}(P,Q)=2E\rho(X,Y)-E\rho(X,X^{\prime})-E\rho(Y,Y^{\prime}).$

$\gamma_{K}(P,Q)=||h_{P}-h_{Q}||_{H_{K}}.$

$\gamma_{K}^{2}(P,Q)=E(K(X,X^{\prime}))+E(K(Y,Y^{\prime}))-2E(K(X,Y))$

$K(x,y)=\frac{1}{2}[\rho(x,x_{0})+\rho(y,x_{0})-\rho(x,y)].$

$\rho(x,y)=K(x,x)+K(y,y)-2K(x,y)=||h_{x}-h_{y}||_{H_{K}}^{2}$

$\epsilon(P,Q)=2[EK(X,X^{\prime})+EK(Y,Y^{\prime})-2EK(X,Y)]=2\gamma_{K}^{2}(P,Q).$

$\hat{\epsilon}_{\alpha}(P_{0},P_{1})=\frac{2}{n_{0}n_{1}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}||X_{0i}-X_{1j}||-\frac{1}{n_{0}^{2}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{0}}||X_{0i}-X_{0j}||-\frac{1}{n_{1}^{2}}\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{1}}||X_{1i}-X_{1j}||$

$\hat{\gamma}_{K}^{2}(P_{0},P_{1})=\frac{1}{n_{0}^{2}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{0}}K(X_{0i},X_{0j})+\frac{1}{n_{1}^{2}}\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{1}}K(X_{1i},X_{1j})-\frac{2}{n_{0}n_{1}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}K(X_{0i},X_{1j}).$

$\hat{\epsilon}_{\alpha}(P_{0},P_{1})=\frac{2}{n_{0}n_{1}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}||X_{0i}-X_{1j}||-\frac{1}{n_{0}(n_{0}-1)}\sum_{i=1}^{n_{0}}\sum_{j\neq i}^{n_{0}}||X_{0i}-X_{0j}||-\frac{1}{n_{1}(n_{1}-1)}\sum_{i=1}^{n_{1}}\sum_{j\neq i}^{n_{1}}||X_{1i}-X_{1j}||$

$\hat{\gamma}_{K}^{2}(P_{0},P_{1})=\frac{1}{n_{0}(n_{0}-1)}\sum_{i=1}^{n_{0}}\sum_{j\neq i}^{n_{0}}K(X_{0i},X_{0j})+\frac{1}{n_{1}(n_{1}-1)}\sum_{i=1}^{n_{1}}\sum_{j\neq i}^{n_{1}}K(X_{1i},X_{1j})-\frac{2}{n_{0}n_{1}}\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}K(X_{0i},X_{1j}).$

$\hat{\epsilon}_{\alpha}(P_{0},P_{1})\underset{n_{0},n_{1}\to\infty}{\longrightarrow}\epsilon_{\alpha}(P_{0},P_{1}), \quad \alpha\in(0,2),$

$\hat{\gamma}_{K}^{2}(P_{0},P_{1})\underset{n_{0},n_{1}\to\infty}{\longrightarrow}\gamma_{K}^{2}(P_{0},P_{1}).$

$\hat{\epsilon}_{\alpha}(P_{0},P_{1})=2\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}W^{0}_{i:n_{0}}W^{1}_{j:n_{1}}||X_{0i}-X_{1j}||^{\alpha}-\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{0}}W^{0}_{i:n_{0}}W^{0}_{j:n_{0}}||X_{0i}-X_{0j}||^{\alpha}-\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{1}}W^{1}_{i:n_{1}}W^{1}_{j:n_{1}}||X_{1i}-X_{1j}||^{\alpha},$

$\hat{\gamma}_{K}^{2}(P_{0},P_{1})=\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{0}}W^{0}_{i:n_{0}}W^{0}_{j:n_{0}}K(X_{0i},X_{0j})+\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{1}}W^{1}_{i:n_{1}}W^{1}_{j:n_{1}}K(X_{1i},X_{1j})-2\sum_{i=1}^{n_{0}}\sum_{j=1}^{n_{1}}W^{0}_{i:n_{0}}W^{1}_{j:n_{1}}K(X_{0i},X_{1j}).$

$W^{0}_{i:n_{0}}=\frac{\delta_{0(i:n_{0})}}{n_{0}-i+1}\prod_{j=1}^{i-1}\left[\frac{n_{0}-j}{n_{0}-j+1}\right]^{\delta_{0(j:n_{0})}} \quad (i=1,\dots,n_{0})$

$W^{1}_{i:n_{1}}=\frac{\delta_{1(i:n_{1})}}{n_{1}-i+1}\prod_{j=1}^{i-1}\left[\frac{n_{1}-j}{n_{1}-j+1}\right]^{\delta_{1(j:n_{1})}} \quad (i=1,\dots,n_{1}),$

$\hat{\epsilon}_{\alpha}(P_{0},P_{1})\underset{n_{0},n_{1}\to\infty}{\longrightarrow}\epsilon_{c(\alpha)}(P_{0},P_{1})=2\int_{0}^{\tau_{0}}\int_{0}^{\tau_{1}}||x-y||^{\alpha}dP_{0}^{\prime}(x)dP_{1}^{\prime}(y)-\int_{0}^{\tau_{0}}\int_{0}^{\tau_{0}}||x-y||^{\alpha}dP_{0}^{\prime}(x)dP_{0}^{\prime}(y)-\int_{0}^{\tau_{1}}\int_{0}^{\tau_{1}}||x-y||^{\alpha}dP_{1}^{\prime}(x)dP_{1}^{\prime}(y),$

$\hat{\gamma}_{K}(P_{0},P_{1})\underset{n_{0},n_{1}\to\infty}{\longrightarrow}\gamma_{c(K)}(P_{0},P_{1})=\int_{0}^{\tau_{0}}\int_{0}^{\tau_{0}}K(x,y)dP_{0}^{\prime}(x)dP_{0}^{\prime}(y)$

Taxonomy

Topics: Statistical Methods and Inference

Full text

Energy distance and kernel mean embedding for two sample survival test

Marcos Matabuena

[email protected]

Centro de Investigación en Tecnoloxías da Información (CiTIUS),

Universidade de Santiago de Compostela, Santiago de Compostela. Spain

1 Abstract

This article proposes a new family of tests for the equality of the distributions of two samples under right censoring. The tests are based on the energy distance and the kernel mean embedding, are calibrated by permutations, and are consistent against all alternatives. A simulation study establishes the good performance of the new tests in realistic finite-sample situations.

2 Introduction

One of the main objectives of survival analysis is to compare the lifetime distributions of two samples drawn from two different groups. The most common example is a clinical trial that evaluates the efficacy of two treatments [Singh and Mukhopadhyay, 2011]. With right-censored data, the test most widely used in the scientific community to assess the equality of two distributions is the log-rank test [Schoenfeld, 1981, Yang and Prentice, 2010, Su and Zhu, 2018], proposed in $1959$ by Mantel and Haenszel [Mantel and Haenszel, 1959]. This test is known to be the most powerful when the hazard functions are proportional to each other [Schoenfeld, 1981, Su and Zhu, 2018, Xu et al., 2017]. However, when this assumption is violated, the test suffers a significant loss of power [Fleming et al., 1980, Lachin and Foulkes, 1986, Lakatos, 1988, Schoenfeld, 1981].

Which hypothesis test to use is currently a hot topic in the medical field [Su and Zhu, 2018], owing to the lack of statistical power of the log-rank test observed in many real case studies [Su and Zhu, 2018]. This is the case for new oncological treatments such as immunotherapies, which have a delayed effect [Melero et al., 2014, Xu et al., 2017, Xu et al., 2018, Su and Zhu, 2018]. It is also the case for multimodal treatments [Moehler et al., 2007], where the density function is often expected to have several modes, and for situations in which cure occurs [López-Cheda et al., 2017]. In all of these situations, the proportional hazards assumption is strongly violated.

From the point of view of mathematical statistics, it is well known that, with finite samples, any test behaves poorly except in a finite number of directions. This means that in real scenarios there is no guarantee that one test will always be better than another. Specifically, Janssen [Janssen, 2000] proved that one cannot expect to build a test with high power except on a finite-dimensional space. However, this does not mean that one cannot build tests with acceptable power against a large number of alternatives and in situations of interest, which has been the goal of the statistical community in recent decades.

The literature distinguishes two types of tests: directional and omnibus. The former seek maximum power in specific directions, while the latter are consistent against all alternatives. The most popular family of directional tests under right censoring is the weighted log-rank family [Fleming et al., 1987], in which the log-rank statistic is assigned a weight function that determines optimality in certain directions [Gehan, 1965, Tarone and Ware, 1977, Peto and Peto, 1972, Fleming and Harrington, 1981]. In some cases the results of the individual tests are combined to construct a global test [Bathke et al., 2009], or the weight function is estimated [Yang and Prentice, 2010], although this requires a substantial amount of data. The Kolmogorov-Smirnov test [Fleming et al., 1980] and the Cramér-von Mises test under right censoring [Schumacher, 1984] are two examples of omnibus tests.

The energy distance [Székely, 2003, Székely and Rizzo, 2013] is a statistical distance that measures how different two probability distributions are. It is based on Euclidean distances between pairs of random variables and on the notion of potential energy, and it has been applied, among other problems, to testing equality of distributions with several samples [Székely and Rizzo, 2004], goodness of fit [Székely and Rizzo, 2005], and cluster analysis [Szekely and Rizzo, 2005]. Its main feature is that it requires minimal assumptions, namely only conditions on the moments of the random variables involved. Its multivariate extension is immediate, and the test for equality of distributions with several samples has high statistical power with known distributions, even in high-dimensional settings [Székely and Rizzo, 2004], and is consistent against all alternatives. The generalization of the test to metrics other than the Euclidean one, such as those of negative type [Lyons et al., 2013, Rachev et al., 2013], is equivalent to the kernel methods [Sejdinovic et al., 2013, Shen and Vogelstein, 2018] proposed in [Gretton et al., 2012] and based on the kernel mean embedding [Muandet et al., 2017].

The main objective of this paper is to extend these tests to right-censored data in the univariate case. The paper is structured as follows. First, we review the main literature on two-sample tests of equality of distributions under right censoring. Then we explain the relationship between the energy distance and the kernel mean embedding. Next, the statistics are derived and their theoretical properties, such as consistency against all alternatives, are established. Finally, a simulation study compares the behavior of the proposed methods with the classical tests in the literature. To do this, we compare power and type I error using known distributions, together with the cases discussed above (delayed effect, cure, and multimodality), where the log-rank test has less than ideal performance.

3 Previous research

Henceforth, we consider the traditional framework for two-sample survival comparison, given by the lifetimes $T_{ji}\sim P_{j}$ $(j=0,1;i=1,\dots,n_{j})$ and the censoring times $C_{ji}\sim Q_{j}$ $(j=0,1;i=1,\dots,n_{j})$, with distributions $P_{j}$ and $Q_{j}$ $(j=0,1)$ defined on a subset of $\mathbb{R}^{+}$. As usual, the random variables $T_{01},\dots,T_{0n_{0}},T_{11},\dots,T_{1n_{1}},$ $C_{01},\dots,C_{0n_{0}},C_{11},\dots,C_{1n_{1}}$ are assumed to be mutually independent. In practice, only the random variables $X_{ji}=\min(T_{ji},C_{ji})$ and $\delta_{ji}=1\{X_{ji}=T_{ji}\}$ $(j=0,1;i=1,\dots,n_{j})$ are observed. We always assume $E(T_{ji}^{2})<\infty$ and $E(C_{ji}^{2})<\infty$ and, for simplicity, that the variables $X_{ji},T_{ji},C_{ji}$ $(j=0,1;i=1,\dots,n_{j})$ are continuous.
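The observation scheme above can be illustrated with a minimal Python sketch. The function name `simulate_right_censored` and the exponential choices for the lifetime and censoring distributions are assumptions made here for illustration only; they are not taken from the paper's simulation design.

```python
import numpy as np

def simulate_right_censored(n, lifetime_sampler, censoring_sampler, rng):
    """Draw n observed pairs (X_i, delta_i) with X_i = min(T_i, C_i)."""
    t = lifetime_sampler(rng, n)    # latent lifetimes T_i ~ P_j (never fully observed)
    c = censoring_sampler(rng, n)   # latent censoring times C_i ~ Q_j
    x = np.minimum(t, c)            # observed time
    delta = (t <= c).astype(int)    # 1 if the lifetime was observed, 0 if censored
    return x, delta

rng = np.random.default_rng(0)
x, delta = simulate_right_censored(
    200,
    lambda r, n: r.exponential(scale=1.0, size=n),  # illustrative assumption: T ~ Exp(1)
    lambda r, n: r.exponential(scale=9.0, size=n),  # illustrative assumption: C with mean 9
    rng,
)
print("observed censoring fraction:", 1 - delta.mean())
```

With these two exponential distributions the probability that an observation is censored is $(1/9)/(1+1/9)=0.1$, so the printed fraction should be close to $0.1$.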

The two-sample problem that we will study is the following:

$H_{0}:P_{0}(t)=P_{1}(t)\ \forall t>0 \quad \text{versus} \quad H_{a}:\exists\, t>0 \text{ such that } P_{0}(t)\neq P_{1}(t). \qquad (1)$

We denote the maximum observed times in each group by $\tau_{0}$ and $\tau_{1}$, respectively, and their minimum by $\tau=\min(\tau_{0},\tau_{1})$ .

Next, we describe the main previous literature on directional and omnibus tests.

3.1 Directional tests: The log-rank test family

In this subsection we describe the log-rank test and its variants.

The failure times are denoted by $\tau_{1}<\tau_{2}<\cdots<\tau_{k}$ . We define:

$Y_{i}(\tau_{j})$ = number of people in group $i$ who are at risk at $\tau_{j}$ $(i=0,1;j=1,2,\dots,k)$ .

$Y(\tau_{j})=Y_{0}(\tau_{j})+Y_{1}(\tau_{j})$ = number of people at risk at $\tau_{j}$ (in both groups).

$d_{ij}$ = number of people in group $i$ who fail at $\tau_{j}$ $(i=0,1;j=1,2,\dots,k)$ .

$d_{j}=d_{0j}+d_{1j}$ = number of people who fail at $\tau_{j}$ .

The statistic has the following structure:

$\hat{Z}^{2}=\frac{\left[\sum_{j=1}^{k}\omega_{j}(d_{1j}-E(d_{1j}))\right]^{2}}{\sum_{j=1}^{k}Var(\omega_{j}(d_{1j}-E(d_{1j})))}$

where $\omega_{j}$ $(j=1,2,\dots,k)$ is a weight function that determines the properties of the test and may depend on the number of people at risk at time $\tau_{j}$ , $Y(\tau_{j})$ , on the survival function estimated at that time, $\hat{S}(\tau_{j})$ , or on its value at the previous failure time, $\hat{S}(\tau_{j-1})$ .

Under the null hypothesis $H_{0}:P_{0}(\cdot)=P_{1}(\cdot)$ , $d_{1j}\sim H(Y(\tau_{j}),Y_{1}(\tau_{j}),d_{j})$ , where $H$ denotes the hypergeometric distribution. Writing $Y_{j}=Y(\tau_{j})$ and $Y_{1j}=Y_{1}(\tau_{j})$ , it follows that $E(d_{1j})=\frac{d_{j}}{Y_{j}}Y_{1j}$ and $Var(d_{1j}-E(d_{1j}))=\frac{d_{j}(Y_{1j}/Y_{j})(1-Y_{1j}/Y_{j})(Y_{j}-d_{j})}{Y_{j}-1}$ .

The main characteristics of the log-rank test and its variants will be described below:

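• Log-rank [Mantel and Haenszel, 1959]

The log-rank test is optimal when the hazard functions of the two groups are proportional. It results from taking $\omega_{j}=1$ $(j=1,2,\dots,k)$ . Under the null hypothesis, the following holds:

$\hat{Z}^{2}=\frac{\left[\sum_{j=1}^{k}(d_{1j}-E(d_{1j}))\right]^{2}}{\sum_{j=1}^{k}Var(d_{1j}-E(d_{1j}))}\overset{d}{\to}\chi_{1}^{2}.$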

• Gehan generalized Wilcoxon test [Gehan, 1965]

It is a distribution-free test that extends the Wilcoxon test to right-censored data. It gives much more weight to early survival times by taking the weight function $\omega_{j}=Y(\tau_{j})$ $(j=1,2,\dots,k)$ .

• Tarone-Ware [Tarone and Ware, 1977]

It is a modification of the Gehan test whose weight function is $\omega_{j}=\sqrt{Y(\tau_{j})}$ $(j=1,2,\dots,k)$ , which assigns smaller weights than the Gehan test.

• Peto-Peto [Peto and Peto, 1972]

The Peto test is used when the hazard functions are not proportional. Its weight function uses the Kaplan-Meier estimator, $\omega_{j}=\hat{S}(\tau_{j})$ $(j=1,2,\dots,k)$ , so that early times receive more weight than later observations.

• Fleming $\&$ Harrington family $G^{\rho,\gamma}$ [Fleming and Harrington, 1981]

In the Fleming $\&$ Harrington family $G^{\rho,\gamma}$ , the weight function $\omega_{j}=\hat{S}(\tau_{j-1})^{\rho}(1-\hat{S}(\tau_{j-1}))^{\gamma}$ $(j=1,2,\dots,k)$ depends on two parameters $\rho\geq 0$ and $\gamma\geq 0$ that give the test considerable flexibility. Using the Kaplan-Meier estimator as a plug-in increases the power of the test [Buyske et al., 2000].
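The weighted log-rank statistic described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `weighted_logrank` and its `weight` argument are hypothetical, only the log-rank, Gehan, and Tarone-Ware weights are covered (the Peto-Peto and Fleming-Harrington weights would additionally need a pooled Kaplan-Meier estimate), and the variance uses $Var(\omega_{j}(d_{1j}-E(d_{1j})))=\omega_{j}^{2}Var(d_{1j})$ with the hypergeometric moments given above.

```python
import numpy as np

def weighted_logrank(x0, d0, x1, d1, weight="logrank"):
    """Weighted log-rank statistic Z^2 for two right-censored samples.

    x0, x1: observed times; d0, d1: event indicators (1 = failure, 0 = censored).
    """
    x0, d0, x1, d1 = map(np.asarray, (x0, d0, x1, d1))
    # Distinct failure times tau_1 < ... < tau_k in the pooled sample.
    times = np.unique(np.concatenate([x0[d0 == 1], x1[d1 == 1]]))
    num, var = 0.0, 0.0
    for t in times:
        y0, y1 = np.sum(x0 >= t), np.sum(x1 >= t)    # at risk in each group
        y = y0 + y1                                  # at risk in both groups
        dj1 = np.sum((x1 == t) & (d1 == 1))          # failures in group 1
        dj = dj1 + np.sum((x0 == t) & (d0 == 1))     # failures in both groups
        if weight == "logrank":
            w = 1.0
        elif weight == "gehan":
            w = float(y)
        elif weight == "tarone-ware":
            w = float(np.sqrt(y))
        else:
            raise ValueError(f"unknown weight: {weight}")
        e = dj * y1 / y                              # hypergeometric mean of d_1j
        # Hypergeometric variance; it is zero when a single subject is at risk.
        v = dj * (y1 / y) * (1 - y1 / y) * (y - dj) / (y - 1) if y > 1 else 0.0
        num += w * (dj1 - e)
        var += w ** 2 * v
    return num ** 2 / var if var > 0 else float("nan")

z2 = weighted_logrank([1, 3], [1, 1], [2, 4], [1, 1])
print(z2)
```

Under the null hypothesis the log-rank version of the statistic is compared with a $\chi_{1}^{2}$ distribution, as stated in the log-rank item above.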

3.2 The omnibus tests

The Kolmogorov-Smirnov [Fleming et al., 1980, Schumacher, 1984] and Cramér-von Mises [Schumacher, 1984] tests under right censoring are the most popular omnibus tests. There are several versions of these two tests, but some have certain limitations. For example, for the direct extension of the Cramér-von Mises test to the censored case, the limit distribution cannot in general be calculated [Koziol, 1978]. In this subsection we explain two versions of each test, proposed in [Schumacher, 1984] and based on the comparison of empirical cumulative hazard functions.

Suppose $X_{ji}=\min(T_{ji},C_{ji})$ and $\delta_{ji}=1\{X_{ji}=T_{ji}\}$ $(j=0,1;i=1,\dots,n_{j})$ , under the independence conditions assumed in Section $2$ for the variables $T_{ji},C_{ji}$ $(j=0,1;i=1,\dots,n_{j})$ .

We denote the ordered samples by $X_{0(1:n_{0})}$ , $X_{0(2:n_{0})},\dots,X_{0(n_{0}:n_{0})}$ , $X_{1(1:n_{1})}$ , $X_{1(2:n_{1})},\dots,X_{1(n_{1}:n_{1})}$ , and the corresponding censoring indicators, arranged according to the ordering induced by the observed times, by $\delta_{0(1:n_{0})}$ , $\delta_{0(2:n_{0})},\dots,\delta_{0(n_{0}:n_{0})}$ , $\delta_{1(1:n_{1})}$ , $\delta_{1(2:n_{1})},\dots,\delta_{1(n_{1}:n_{1})}$ .

Let $S_{0}(t)$ and $S_{1}(t)$ denote the survival functions of groups $0$ and $1$, respectively, at time $t$ , and $\Lambda_{0}(t)$ and $\Lambda_{1}(t)$ their cumulative hazard functions. Consider the function

$\epsilon(t)=\Lambda_{1}(t)-\Lambda_{0}(t)=-\log(S_{1}(t))+\log(S_{0}(t)).$

The comparison problem $(1)$ can be expressed as:

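$H_{0}:\epsilon(t)=0\ \forall t>0 \quad \text{versus} \quad H_{a}:\exists\, t>0 \text{ such that } \epsilon(t)\neq 0.$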

The function $\epsilon(t)$ can be estimated by:

$\hat{\epsilon}(t)=\hat{\Lambda}_{1}(t)-\hat{\Lambda}_{0}(t)$

where $\hat{\Lambda}_{j}(t)=\sum_{i:\tau_{i}\leq t}\frac{d_{ji}}{Y_{j}(\tau_{i})}$ $(j=0,1)$ denotes the Nelson-Aalen estimator [Nelson, 1972] of each group.

We define:

$Y_{j}(t)=\sum_{i=1}^{n_{j}}1\{X_{j(i:n_{j})}\geq t\}\quad(j=0,1),$

$\hat{A}_{j}(t)=n_{j}\sum_{i:X_{j(i:n_{j})}\leq t}\frac{\delta_{j(i:n_{j})}}{Y_{j}(X_{j(i:n_{j})})[Y_{j(i:n_{j})}+1]}\quad(j=0,1),$

$\hat{A}(t)=\frac{n_{0}+n_{1}}{n_{0}}\hat{A}_{0}(t)+\frac{n_{0}+n_{1}}{n_{1}}\hat{A}_{1}(t),$

$\hat{H}(t)=\hat{A}(t)/(1+\hat{A}(t)),$

$\hat{\psi}_{\epsilon}(t)=1/(\hat{A}(\tau))^{\frac{1}{2}},$

$\hat{\psi}^{0}_{\epsilon}(t)=1/(1+\hat{A}(t))$ .

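From the previous expressions we can write the following two statistics for the Kolmogorov-Smirnov test:

$Q_{KS\epsilon}=(n_{0}+n_{1})\sup_{0\leq t\leq\tau}|\hat{\epsilon}(t)\hat{\psi}_{\epsilon}(t)|, \qquad Q^{0}_{KS\epsilon}=(n_{0}+n_{1})\sup_{0\leq t\leq\tau}|\hat{\epsilon}(t)\hat{\psi}^{0}_{\epsilon}(t)|,$

and also for the Cramér-von Mises test:

$Q_{CM\epsilon}=\frac{n_{0}+n_{1}}{\hat{A}(\tau)}\int_{0}^{\tau}(\hat{\epsilon}(t)\hat{\psi}_{\epsilon}(t))^{2}d\hat{A}(t), \qquad Q^{0}_{CM\epsilon}=(n_{0}+n_{1})\int_{0}^{\tau}(\hat{\epsilon}(t)\hat{\psi}^{0}_{\epsilon}(t))^{2}d\hat{H}(t).$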

All of these statistics are consistent against all alternatives and converge almost surely to their population counterparts. For the Kolmogorov-Smirnov tests, the limit distribution of $Q_{KS\epsilon}$ is determined by the Gaussian process $W(A(t)/A(\tau))$ , where $W(x)$ denotes a standard Brownian motion, and that of $Q^{0}_{KS\epsilon}$ by $W^{0}(H(t))$ , where $W^{0}(x)$ is a Brownian bridge. In turn, $Q_{CM\epsilon}$ converges to $A(\tau)^{2}\int_{0}^{\tau}[W(A(t))^{2}]dA(t)$ and $Q^{0}_{CM\epsilon}$ to $A(\tau)^{2}\int_{0}^{\tau}[W^{0}(H(t))^{2}]dH(t)$ .

For more details consult the following reference [Schumacher, 1984].

4 The energy distance and the kernel mean embedding

In this section we introduce the energy distance, the RKHS (reproducing kernel Hilbert space) and its relation to kernel mean embeddings. The explanation is given first at the population level and then at the sample level.

Let $X$ , $X^{\prime}\overset{iid}{\sim}P$ and $Y$ , $Y^{\prime}\overset{iid}{\sim}Q$ be random variables in $\mathbb{R}^{d}$ with finite first moments, $E(||X||_{d})<\infty$ , $E(||X^{\prime}||_{d})<\infty$ , $E(||Y||_{d})<\infty$ , $E(||Y^{\prime}||_{d})<\infty$ , where $P$ and $Q$ denote their distribution functions. The energy distance [Székely, 2003, Székely and Rizzo, 2013] between the distributions $P$ and $Q$ is defined by:

$\epsilon(P,Q)=2E||X-Y||-E||X-X^{\prime}||-E||Y-Y^{\prime}||,$

where $||\cdot||$ denotes the Euclidean norm.

It can be proved that $\epsilon(P,Q)$ is invariant under rotations and non-negative, $\epsilon(P,Q)\geq 0$ , with equality to zero if and only if $P=Q$ .

The previous definition of energy distance can be extended to a family of indices $\alpha\in(0,2]$ [Székely and Rizzo, 2013] (assuming in each case the existence of the moment of order $\alpha$ ). In this case, the $\alpha$ energy distance is:

$\epsilon_{\alpha}(P,Q)=2E||X-Y||^{\alpha}-E||X-X^{\prime}||^{\alpha}-E||Y-Y^{\prime}||^{\alpha},$

which satisfies $\epsilon_{\alpha}(P,Q)\geq 0$ for all $\alpha\in(0,2)$ , with equality to zero if and only if $P=Q$ . In the particular case $\alpha=2$ , $\epsilon_{2}(P,Q)=2||E(X)-E(Y)||^{2}$ , so non-negativity holds trivially; however, in this situation $\epsilon_{2}(P,Q)=0$ implies only equality of means, not equality in distribution between $P$ and $Q$ .
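The identity for $\alpha=2$ can be checked directly. The short derivation below assumes the usual form $\epsilon_{\alpha}(P,Q)=2E||X-Y||^{\alpha}-E||X-X^{\prime}||^{\alpha}-E||Y-Y^{\prime}||^{\alpha}$ of the $\alpha$ energy distance and uses only the independence of $X$ , $X^{\prime}$ , $Y$ , $Y^{\prime}$ and the existence of second moments.

```latex
\begin{align*}
E\|X-Y\|^{2} &= E\|X\|^{2}+E\|Y\|^{2}-2\,E(X)^{\top}E(Y),\\
E\|X-X'\|^{2} &= 2E\|X\|^{2}-2\|E(X)\|^{2},\qquad
E\|Y-Y'\|^{2} = 2E\|Y\|^{2}-2\|E(Y)\|^{2},\\
\epsilon_{2}(P,Q) &= 2E\|X-Y\|^{2}-E\|X-X'\|^{2}-E\|Y-Y'\|^{2}\\
 &= 2\|E(X)\|^{2}+2\|E(Y)\|^{2}-4\,E(X)^{\top}E(Y)
  = 2\|E(X)-E(Y)\|^{2}.
\end{align*}
```

The second moments cancel, which is why $\epsilon_{2}$ only sees the means and cannot separate two different distributions with the same mean.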

The notion of energy distance can be generalized to more general spaces. Let $X,Y\in V$ , where $V$ is an arbitrary space equipped with a semimetric of negative type [Rachev et al., 2013, Lyons et al., 2013] $\rho:V\times V\to\mathbb{R}$ , which is required to satisfy:

$\sum_{i=1}^{n}\sum_{j=1}^{n}c_{i}c_{j}\rho(X_{i},X_{j})\leq 0,$

for all $X_{1},\dots,X_{n}\in V$ and all $c_{1},\dots,c_{n}\in\mathbb{R}$ such that $\sum_{i=1}^{n}c_{i}=0$ . In this case, the pair $(V,\rho)$ is said to be a negative type space [Lyons et al., 2013, Rachev et al., 2013]. Replacing $\mathbb{R}^{d}$ by $V$ and $||X-Y||$ by $\rho(X,Y)$ in expression $(6)$ , we obtain the generalized energy distance for the negative type space $(V,\rho)$ :

[TABLE]

In any negative type space $(V,\rho)$ there exist a Hilbert space $H$ and a map $\phi:V\to H$ such that $\rho(X,Y)=||\phi(X)-\phi(Y)||_{H}^{2}$ [Rachev et al., 2013, Sejdinovic et al., 2013]. This relationship allows quantities defined for distributions on $V$ to be computed in the associated Hilbert space $H$ . Even if $\rho$ does not satisfy the triangle inequality, the function $\rho^{1/2}$ satisfies the axioms of a distance.

There is an equivalence [Székely and Rizzo, 2013, Shen and Vogelstein, 2018] between the energy distance, commonly used in statistics [Székely and Rizzo, 2013], and the distance defined through kernel mean embeddings [Gretton et al., 2012], the approach mostly used in the field of machine learning [Gretton et al., 2012]. Before explaining it, we introduce some basic concepts of RKHS theory. For more information about RKHSs, see the introductory reference [Manton et al., 2015].

Let $H$ be a Hilbert space of real-valued functions defined on $V$ . A function $K:V\times V\to\mathbb{R}$ is a reproducing kernel of $H$ if it satisfies the following two properties:

1. $K(\cdot,x)\in H$ .

2. $<K(\cdot,x),f>=f(x)$ $\forall x\in V$ and $f\in H$ .

The two properties above imply that $K$ is a symmetric, positive definite function. The Moore-Aronszajn theorem [Aronszajn, 1950, Manton et al., 2015] establishes the converse: if $K:V\times V\to\mathbb{R}$ is a symmetric, positive definite function, there is a unique reproducing kernel Hilbert space $H_{K}$ that has $K$ as its reproducing kernel. The map $\phi:x\to K(\cdot,x)\in H_{K}$ is the so-called canonical feature map. Given a kernel $K$ , this theorem provides a way to define an embedding of a probability measure $P$ in an RKHS: it suffices to consider the map $P\to h_{P}\in H_{K}$ such that $\int f(x)dP(x)=<f,h_{P}>$ $\forall f\in H_{K}$ , or equivalently, to define $h_{P}=\int K(\cdot,x)dP(x)$ .

A notion of distance between two probability measures can be introduced using the inner product of $H_{K}$ . It is called the maximum mean discrepancy (MMD) [Gretton et al., 2012] and is given by:

$\gamma_{K}(P,Q)=||h_{P}-h_{Q}||_{H_{K}}.$

The above expression [Gretton et al., 2012] can also be written as:

$\gamma_{K}^{2}(P,Q)=E\,K(X,X^{\prime})+E\,K(Y,Y^{\prime})-2E\,K(X,Y),$

where $X$ , $X^{\prime}\mathbin{\overset{iid}{\kern 0.0pt\sim}}P$ and $Y$ , $Y^{\prime}\mathbin{\overset{iid}{\kern 0.0pt\sim}}Q$ .

The next important result shows that negative-type semimetrics and positive definite kernels are closely connected [Van Den Berg et al., 1984]. Let $\rho:V\times V\to\mathbb{R}$ and let $x_{0}\in V$ be an arbitrary fixed point. Define:

$K(x,y)=\frac{1}{2}[\rho(x,x_{0})+\rho(y,x_{0})-\rho(x,y)].$

Then it can be shown that $K$ is a positive definite kernel if and only if $\rho$ is a semimetric of negative type. In this way we obtain a family of kernels, one for each choice of $x_{0}$ . Conversely, if $\rho$ is a semimetric of negative type and $K$ is a kernel in this family, then:

$\rho(x,y)=K(x,x)+K(y,y)-2K(x,y).$

Finally, using the above equality along with $(10)$ and $(11)$ , we can establish the relation between the distance in the kernel mean embedding and the energy distance in a space of negative type $(V,\rho)$ [Sejdinovic et al., 2013]:

[TABLE]
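The relation between the two distances can be checked numerically at the sample level. The sketch below assumes the distance-induced kernel $K(x,y)=\frac{1}{2}[\rho(x,x_{0})+\rho(y,x_{0})-\rho(x,y)]$ of [Sejdinovic et al., 2013] and the plug-in ( $V$ statistic) estimators; with these conventions the energy statistic equals twice the squared MMD, whatever the base point $x_{0}$ . All function names and the toy data are hypothetical.

```python
def mean_pairwise(a, b, f):
    """Average of f(u, v) over all pairs (u, v) with u in a and v in b."""
    return sum(f(u, v) for u in a for v in b) / (len(a) * len(b))


def energy_v(x, y, rho):
    """Plug-in (V-statistic) energy distance for the semimetric rho."""
    return (2 * mean_pairwise(x, y, rho)
            - mean_pairwise(x, x, rho)
            - mean_pairwise(y, y, rho))


def mmd2_v(x, y, k):
    """Plug-in (V-statistic) squared MMD for the kernel k."""
    return (mean_pairwise(x, x, k)
            + mean_pairwise(y, y, k)
            - 2 * mean_pairwise(x, y, k))


def rho(u, v):
    """Euclidean distance on the real line (a semimetric of negative type)."""
    return abs(u - v)


def induced_kernel(x0):
    """Kernel induced by rho and centred at the base point x0."""
    def k(u, v):
        return 0.5 * (rho(u, x0) + rho(v, x0) - rho(u, v))
    return k


# Toy samples (illustrative only)
x = [0.1, 0.7, 1.5, 2.2]
y = [0.4, 1.9, 3.0]
print(energy_v(x, y, rho), 2 * mmd2_v(x, y, induced_kernel(0.0)))
```

The terms involving $x_{0}$ cancel in the squared MMD, which is why every kernel in the family gives the same value.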

In the sample setting, two samples $\{X_{0i}\}_{i=1}^{n_{0}}\overset{iid}{\sim}P_{0}$ , $\{X_{1i}\}_{i=1}^{n_{1}}\overset{iid}{\sim}P_{1}$ are available, and the unknown quantities $\epsilon(P_{0},P_{1})$ and $\gamma_{K}^{2}(P_{0},P_{1})$ must be estimated. To do this, the empirical distribution is plugged in, and $U$ and $V$ statistics are used as estimators. That is:

[TABLE]

( $V$ statistic $\alpha$ energy distance),

[TABLE]

( $V$ statistic kernel method),

[TABLE]

( $U$ statistic $\alpha$ energy distance),

[TABLE]

( $U$ statistic kernel method).

where the kernel $K:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}$ has to be characteristic [Sriperumbudur et al., 2011, Gretton et al., 2012, Muandet et al., 2017].

Table $2$ lists the best-known characteristic kernels.

In the statistical community, the energy distance is usually estimated with a $V$ statistic, which is a biased estimator [Kowalski and Tu, 2008] but is always greater than or equal to zero [Székely and Rizzo, 2013]. In the machine learning community, in contrast, the kernel method is usually applied with $U$ statistics, which are unbiased estimators [Kowalski and Tu, 2008] with a lower computational cost, but which can take negative values [Gretton et al., 2012].
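The difference between the two estimators can be seen on a small example. The sketch below assumes the standard $V$ and $U$ statistic estimators of the squared MMD from [Gretton et al., 2012] and the Gaussian kernel of Table 2 with $\sigma=1$ ; the function names are hypothetical.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-sigma * (u - v)^2) on the real line."""
    return math.exp(-sigma * (u - v) ** 2)


def mmd2_v(x, y, k):
    """V statistic: all pairs are used, including i == j (biased, >= 0)."""
    n, m = len(x), len(y)
    kxx = sum(k(a, b) for a in x for b in x) / (n * n)
    kyy = sum(k(a, b) for a in y for b in y) / (m * m)
    kxy = sum(k(a, b) for a in x for b in y) / (n * m)
    return kxx + kyy - 2 * kxy


def mmd2_u(x, y, k):
    """U statistic: within-sample diagonal terms are dropped (unbiased)."""
    n, m = len(x), len(y)
    kxx = sum(k(x[i], x[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(k(y[i], y[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kxy = sum(k(a, b) for a in x for b in y) / (n * m)
    return kxx + kyy - 2 * kxy


# Two identical toy samples: the V statistic is zero, the U statistic is negative
x = [0.0, 1.0]
y = [0.0, 1.0]
print(mmd2_v(x, y, gaussian_kernel), mmd2_u(x, y, gaussian_kernel))
```

Dropping the diagonal terms $K(X_{i},X_{i})$ removes the bias, but it also removes the guarantee of non-negativity.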

Assuming finite moments of order at least $2$ for the random variables $\{X_{0i}\}_{i=1}^{n_{0}}\overset{iid}{\sim}P_{0}$ , $\{X_{1i}\}_{i=1}^{n_{1}}\overset{iid}{\sim}P_{1}$ , the sample statistics converge almost surely to their population versions:

[TABLE]

The limit distribution of these statistics follows from the limit theorems for $U$ and $V$ statistics in the degenerate case [Korolyuk and Borovskich, 1994] and can be found in the original works [Gretton et al., 2012, Székely and Rizzo, 2004]. However, in practice, bootstrap/permutation methods are used to calibrate the tests [Gretton et al., 2012, Székely and Rizzo, 2004].

5 The proposed tests

In this section, the tests based on the energy distance and kernel mean embeddings are extended to the right-censoring setting. In this case, unlike in the previous section, the statistics are derived first and their theoretical properties afterwards.

5.1 The statistics

As before, suppose $X_{ji}=\min(T_{ji},C_{j,i})$ and $\delta_{ji}=1\{X_{ji}=T_{ji}\}$ $(j=0,1;i=1,\dots,n_{j})$ , under the independence and regularity conditions assumed in Section $2$ on the variables $T_{ji},C_{j,i}$ $(j=0,1;i=1,\dots,n_{j})$ .
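To make the observation scheme concrete, the following sketch draws right-censored data $(X,\delta)$ for one group. The exponential survival and censoring distributions, the rates and the function name are assumptions chosen for illustration; the setting above only requires independence between the survival and censoring times.

```python
import random


def simulate_censored(n, event_rate, censor_rate, rng):
    """Draw n pairs (X, delta) with X = min(T, C) and delta = 1{X = T}."""
    sample = []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # latent survival time T
        c = rng.expovariate(censor_rate)  # latent censoring time C, independent of T
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample


rng = random.Random(0)
group0 = simulate_censored(200, event_rate=1.0, censor_rate=0.25, rng=rng)

# With exponential T and C the expected censoring proportion is
# censor_rate / (event_rate + censor_rate) = 0.2
censored_fraction = 1 - sum(d for _, d in group0) / len(group0)
print(censored_fraction)
```

Only the pair $(X,\delta)$ is kept, so the latent times $T$ and $C$ are never observed together.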

For each group we consider the ordered sample $X_{0(1:n_{0})}$ , $X_{0(2:n_{0})},\dots,X_{0(n_{0}:n_{0})}$ , $X_{1(1:n_{1})}$ , $X_{1(2:n_{1})},\dots,X_{1(n_{1}:n_{1})}$ and the corresponding censoring indicators $\delta_{0(1:n_{0})}$ , $\delta_{0(2:n_{0})},\dots,\delta_{0(n_{0}:n_{0})}$ , $\delta_{1(1:n_{1})}$ , $\delta_{1(2:n_{1})},\dots,\delta_{1(n_{1}:n_{1})}$ .

Under right censoring (with independent censoring), the nonparametric maximum likelihood estimator of the distribution is the Kaplan-Meier estimator [Kaplan and Meier, 1958] rather than the empirical distribution. This estimator is consistent [Wang et al., 1987] and, for every $t>0$ , asymptotically normal [Cai, 1998]. One of its main characteristics is its negative bias [Stute, 1994], which can become considerable when the censoring is heavy. In fact, [Stute, 1994] provides an exact expression for the bias of the Kaplan-Meier integral $\int\phi d\hat{F}_{n}$ , where $\hat{F}_{n}$ denotes the Kaplan-Meier estimator.

If we replace as plug-in, the empirical distribution by the Kaplan-Meier estimator in $(13)$ and $(14)$ , we obtain the $V$ statistic for right censored data:

[TABLE]

( $V$ statistic energy distance under right censoring),

[TABLE]

( $V$ statistic kernel method under right censoring).

where

[TABLE]

and

[TABLE]

are the Kaplan-Meier integral weights [Stute, 1995].
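As an illustration, the sketch below computes Kaplan-Meier integral weights for one group, assuming the usual product form $W_{i}=\frac{\delta_{(i)}}{n-i+1}\prod_{j<i}\left(\frac{n-j}{n-j+1}\right)^{\delta_{(j)}}$ attributed to [Stute, 1995] and no ties among the observed times. The function name and the toy data are hypothetical.

```python
def km_weights(times, events):
    """Kaplan-Meier jump attached to each observation (no ties assumed).

    The weight of a censored observation is zero; the weights are returned
    in the original order of `times`.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    weights = [0.0] * n
    surv = 1.0  # product over the earlier ordered observations
    for rank, idx in enumerate(order, start=1):
        weights[idx] = surv * events[idx] / (n - rank + 1)
        if events[idx] == 1:
            surv *= (n - rank) / (n - rank + 1)
    return weights


# Toy data (illustrative only): the third and fifth times are censored
times = [2.0, 3.0, 4.0, 5.0, 8.0]
events = [1, 1, 0, 1, 0]
w = km_weights(times, events)
print(w, sum(w))
```

When the largest observation is censored the weights add up to less than one, which is the reason a normalization is needed to recover a proper distribution on the observable range.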

However, the limit of each statistic has the following structure:

[TABLE]

where, because of censoring, $\tau_{0}$ and $\tau_{1}$ are usually smaller than the upper end of the support of $P_{0}$ and $P_{1}$ . The functions $P^{\prime}_{0}$ and $P^{\prime}_{1}$ coincide with the original distributions $P_{0}$ and $P_{1}$ on this domain, but in general they are not distribution functions on the domain of integration.

As a consequence, there is no guarantee that the limit functions $\gamma_{c(K)}(P_{0},P_{1})$ and $\epsilon_{c(\alpha)}(P_{0},P_{1})$ define a distance between probability measures. In fact they do not: if $P_{0}$ is the distribution function of a $N(100000,1)$ random variable, $P_{1}$ that of a $Uniform(0,1)$ and $\tau_{0}=\tau_{1}=0.1$ , then the value of $\epsilon_{c(\alpha)}(P_{0},P_{1})$ is negative. It is easy to verify that if $P_{0}=P_{1}$ and $\tau_{0}=\tau_{1}$ then the limit is zero, but one can also construct two different probability measures with zero distance, so these statistics are not consistent against all alternatives.
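The sign in this example can be checked by hand for $\alpha=1$ . The computation below is a sketch that assumes the limit keeps the form of the energy distance, with every integral restricted to $[0,\tau]$ , $\tau=\tau_{0}=\tau_{1}=0.1$ , and taken with respect to the unnormalized restrictions $P^{\prime}_{0}$ , $P^{\prime}_{1}$ . Since a $N(100000,1)$ variable puts negligible mass on $[0,0.1]$ , every term involving $P^{\prime}_{0}$ is essentially zero.

```latex
\begin{align*}
\epsilon_{c(1)}(P_0,P_1)
 &= 2\int_0^{\tau}\!\!\int_0^{\tau}|x-y|\,dP'_0(x)\,dP'_1(y)
  -\int_0^{\tau}\!\!\int_0^{\tau}|x-x'|\,dP'_0(x)\,dP'_0(x')
  -\int_0^{\tau}\!\!\int_0^{\tau}|y-y'|\,dP'_1(y)\,dP'_1(y') \\
 &\approx 0-0-\int_0^{0.1}\!\!\int_0^{0.1}|y-y'|\,dy\,dy'
  = -\frac{(0.1)^{3}}{3}\approx -3.3\times 10^{-4}<0 .
\end{align*}
```

Only the negative within-group term of the uniform distribution survives, so the truncated quantity cannot be a distance.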

To solve this problem, $P^{\prime}_{0}(x)$ and $P^{\prime}_{1}(x)$ must be turned into distribution functions on the previous integration domain, which is achieved by normalizing them, that is, $P^{\prime\prime}_{0}(x)=P^{\prime}_{0}(x)/\int_{0}^{\tau_{0}}dP^{\prime}_{0}(x)$ $\forall x\in[0,\tau_{0}]$ , and $P^{\prime\prime}_{1}(x)=P^{\prime}_{1}(x)/\int_{0}^{\tau_{1}}dP^{\prime}_{1}(x)$ $\forall x\in[0,\tau_{1}]$ . In addition, as we will see later, consistency of the test against all alternatives requires imposing $\tau_{0}=\tau_{1}$ whenever the supports of the distribution functions $P_{0}$ and $P_{1}$ are not contained in the intervals $[0,\tau_{0}]$ and $[0,\tau_{1}]$ respectively.

This leads us to consider the $U$ statistics under right censoring suggested in [Bose and Sen, 1999] and to apply the aforementioned normalization to the multisample $U$ statistic under right censoring [Stute and Wang, 1993]. The corresponding statistics are the following:

[TABLE]

( $U$ statistic energy distance under right censoring),

[TABLE]

( $U$ statistic kernel method under right censoring).
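The effect of the normalization can be illustrated with a small sketch. It assumes a self-normalized statistic in which every double sum of Kaplan-Meier weights is divided by the total weight of the pairs it runs over; this form is an assumption made for illustration and may differ in detail from the statistics defined above. The weights are passed in directly, and the function name is hypothetical.

```python
def weighted_energy_u(x0, w0, x1, w1, alpha=1.0):
    """Self-normalized, weighted energy statistic (illustrative form).

    x0, x1 : observed times of each group
    w0, w1 : Kaplan-Meier weights of each group (zero for censored times)
    Each group needs at least two observations with positive weight.
    """
    def cross_term(xa, wa, xb, wb):
        num = sum(wi * wj * abs(a - b) ** alpha
                  for a, wi in zip(xa, wa) for b, wj in zip(xb, wb))
        return num / (sum(wa) * sum(wb))

    def within_term(x, w):
        pairs = [(i, j) for i in range(len(x)) for j in range(len(x)) if i != j]
        num = sum(w[i] * w[j] * abs(x[i] - x[j]) ** alpha for i, j in pairs)
        den = sum(w[i] * w[j] for i, j in pairs)
        return num / den

    return 2 * cross_term(x0, w0, x1, w1) - within_term(x0, w0) - within_term(x1, w1)


# Without censoring the weights are 1/n and the usual U statistic is recovered
stat = weighted_energy_u([0.0, 1.0], [0.5, 0.5], [2.0, 3.0], [0.5, 0.5])
# Rescaling the weights of one group (total mass < 1) leaves the value unchanged
stat_rescaled = weighted_energy_u([0.0, 1.0], [0.35, 0.35], [2.0, 3.0], [0.5, 0.5])
print(stat, stat_rescaled)
```

Because each term is a ratio, the statistic does not depend on the total mass of the weights, which is what turns the truncated functions into proper distributions on the observable range.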

Finally, we will use the following statistics

[TABLE]

in order to derive the consistency against all alternatives more easily.

5.2 Permutation tests

As in the uncensored case, the null distribution of the statistics is approximated with permutation methods. If the censoring mechanism is the same in the two groups, the standard permutation methods are valid [Neuhaus et al., 1993, Wang et al., 2010]. However, when the censoring distributions differ, standard permutation methods do not work well in small samples and/or when the amount of censoring is large [Heimann and Neuhaus, 1998]. In this case, the resampling strategy proposed in [Wang et al., 2010] must be used.

We denote by $Z=(\overbrace{0,\cdots,0}^{n_{0}},\overbrace{1,\cdots,1}^{n_{1}})$ a vector of size $n$ $(n=n_{0}+n_{1})$ that contains the group to which each observation belongs, and by $U=(X_{01},\cdots,X_{0n_{0}},X_{11},\cdots,X_{1n_{1}})$ and $\delta=(\delta_{01},\cdots,\delta_{0n_{0}},\delta_{11},\cdots,\delta_{1n_{1}})$ the vectors of the same length that contain the observed times and the corresponding censoring indicators. Given a statistic $\theta(Z,U,\delta)$ , the first step of the traditional permutation method consists in calculating the value of the statistic for each permutation, $\theta(Z^{r},U,\delta)$ $(r=1,2,\dots,\binom{n}{n_{0}})$ . Each permutation corresponds to one of the $\binom{n}{n_{0}}$ ways of choosing $n_{0}$ indices from $\{1,\dots,n\}$ : the chosen indices are assigned to the first group and the remaining $n-n_{0}$ indices to the other group. Finally, we check whether $\theta(Z,U,\delta)$ is less than or equal to $\theta(Z^{r},U,\delta)$ $(r=1,2,\dots,\binom{n}{n_{0}})$ . The p-value is calculated as follows:

[TABLE]

In practice, only a small number of permutations is considered to approximate the previous expression.
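The Monte Carlo version of this procedure can be sketched as follows. The p-value is taken here as the proportion of randomly permuted label vectors $Z^{r}$ whose statistic is at least as large as the observed one, which is an assumption consistent with the description above. The function names and the toy statistic are hypothetical, and the sketch covers only the standard permutation scheme, not the resampling strategy of [Wang et al., 2010].

```python
import random


def permutation_pvalue(statistic, z, u, delta, n_perm=1000, seed=0):
    """Monte Carlo permutation p-value for a statistic theta(Z, U, delta)."""
    rng = random.Random(seed)
    observed = statistic(z, u, delta)
    labels = list(z)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of the group labels
        if statistic(labels, u, delta) >= observed:
            exceed += 1
    return exceed / n_perm


def mean_difference(z, u, delta):
    """Toy statistic (hypothetical): absolute difference of mean observed times.

    It ignores delta and only serves to exercise the permutation loop.
    """
    g0 = [x for x, g in zip(u, z) if g == 0]
    g1 = [x for x, g in zip(u, z) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))


# Two clearly separated toy groups of five uncensored times each
z = [0] * 5 + [1] * 5
u = [1.0, 2.0, 3.0, 4.0, 5.0, 101.0, 102.0, 103.0, 104.0, 105.0]
delta = [1] * 10
p = permutation_pvalue(mean_difference, z, u, delta)
print(p)
```

Shuffling the labels keeps the pairs $(X,\delta)$ intact and preserves the group sizes, so each shuffle corresponds to one of the $\binom{n}{n_{0}}$ assignments.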

5.3 Theoretical properties

5.3.1 Asymptotic distribution

The asymptotic distribution of the statistics under the null hypothesis is derived only for the kernel mean embedding case. As we have seen before (equation $(11)$ ), given the equivalence between the tests based on kernel mean embeddings and on the energy distance [Sejdinovic et al., 2013], this is not restrictive.

We first transform each term in the previous sum by centering. Under the null hypothesis $P=P_{0}=P_{1}$ and $\tau_{0}=\tau_{1}$ , we have $P^{\prime\prime}=P_{0}^{\prime\prime}=P_{1}^{\prime\prime}$ and therefore the same mean embedding

$\mu_{P^{\prime\prime}}=\mu_{P_{0}^{\prime\prime}}=\mu_{P_{1}^{\prime\prime}}=\frac{1}{P(\tau_{0})}\int_{0}^{\tau_{0}}K(\cdot,x)dP^{\prime\prime}(x)$ . Thus, we replace each instance of $K(X_{i},X_{j})$ with a kernel $K^{*}(X_{i},X_{j})$ from which the mean embedding has been subtracted,

[TABLE]

This gives an equivalent form of the empirical $\hat{\gamma}_{K}^{2}(P_{0}^{\prime\prime},P_{1}^{\prime\prime})$

[TABLE]

Note that $K^{*}(\cdot,\cdot)$ is a degenerate kernel:

[TABLE]

Then, in the terms

[TABLE]

we can apply the limit theorems for $U$ statistics under right-censored data [Bose and Sen, 2002, Fernández and Rivera, 2018]. In particular, we use the results of [Fernández and Rivera, 2018], which require the weakest conditions; moreover, that same work proves that the asymptotic convergence theorems are valid under the conditions assumed here.

By Corollary $2.9$ of [Fernández and Rivera, 2018], under the null hypothesis and $\tau_{0}=\tau_{1}$ we have:

[TABLE]

and

[TABLE]

where $\psi=\sum_{i=1}^{\infty}\lambda_{i}(\epsilon_{i}^{2}-1)$ , with $\epsilon_{i}$ $iid$ standard normal random variables, and $c_{1}$ , $c_{2}$ are two constants specified in [Fernández and Rivera, 2018] that are not relevant for our purpose.

The structure of the previous limits coincides with that of the degenerate case without censoring, which has the form $c+\psi$ [Korolyuk and Borovskich, 1994], where $c$ is a constant.

However, for the term

[TABLE]

which is a two-sample $U$ -statistic under right-censored data, there are still no theoretical results.

The derivation of limit theorems for $U$ statistics with several samples is beyond the scope of this work and will be presented in another paper. In any case, the limit distribution coincides with the case with censorship. That is,

[TABLE]

where $\{\tau_{j}\}$ and $\{\epsilon_{j}\}$ are two independent sequences of standard normal random variables.

5.3.2 Consistency against all alternatives

Theorem 1.

Let $S$ , $A$ be arbitrary metric spaces with the same topology defined on $\mathbb{R}^{+}$ , with $S$ contained in $A$ , and let $\gamma(x,y)$ be a continuous, symmetric, real function on $A\times A$ . Suppose $X$ , $X^{\prime}$ , $Y$ , $Y^{\prime}$ are independent $A$ -valued random variables, $X$ , $X^{\prime}$ are identically distributed, and $Y$ , $Y^{\prime}$ are identically distributed. Suppose $\gamma(X,X^{\prime})$ , $\gamma(Y,Y^{\prime})$ , and $\gamma(X,Y)$ have finite expected values on $A$ . Then

[TABLE]

if and only if $\gamma$ is negative definite, where $P$ and $Q$ denote the distributions of $X$ and $Y$ respectively. If $\gamma$ is strictly negative definite, then equality holds if and only if $X$ and $Y$ are identically distributed on $S$ .

Proof.

By Theorem $1$ of [Székely and Rizzo, 2005], we have:

[TABLE]

if and only if $\gamma$ is negative definite. If $\gamma$ is strictly negative definite, then equality holds if and only if $X$ and $Y$ are identically distributed on $A$ .

Define on $S$ the random variables $X^{*}$ , $Y^{*}$ with distribution functions $P^{\prime}$ , $Q^{\prime}$ respectively, given by

$dP^{\prime}(x)=c_{1}dP(x)$ and $dQ^{\prime}(x)=c_{2}dQ(x)$ , where $c_{1}=\frac{1}{\int_{S}dP(x)}$ and $c_{2}={\frac{1}{\int_{S}dQ(x)}}$ , and consider independent copies $X^{*\prime}$ , $Y^{*\prime}$ . Since $\gamma(X,X^{\prime})$ , $\gamma(Y,Y^{\prime})$ , and $\gamma(X,Y)$ have finite expected values on $A$ , $\gamma(X^{*},X^{*\prime})$ , $\gamma(Y^{*},Y^{*\prime})$ , and $\gamma(X^{*},Y^{*})$ have finite expected values on $S$ . Moreover, $\gamma(x,y)$ is a continuous, symmetric, real function on $S\times S$ .

This leads to:

[TABLE]

if and only if $\gamma$ is negative definite, and

[TABLE]

if $X^{*}$ and $Y^{*}$ are identically distributed on $A$ (with $\gamma$ strictly negative definite) or, equivalently, $X$ and $Y$ are identically distributed on $S$ .

∎

Theorem 2.

Let $X_{ji}=\min(T_{ji},C_{j,i})\overset{iid}{\sim}P_{c(j)}$ and $\delta_{ji}=1\{X_{ji}=T_{ji}\}$ $(j=0,1;i=1,\dots,n_{j})$ , with $P_{c(j)}$ $(j=0,1)$ continuous and under the conditions assumed in Section $2$ on the variables $T_{ji}\overset{iid}{\sim}P_{j},C_{j,i}\overset{iid}{\sim}Q_{j}$ $(j=0,1;i=1,\dots,n_{j})$ . Then:

[TABLE]

where

$P_{0}^{\prime}(x)=\left\{\begin{array}[]{lcc}P_{0}(x)&\text{if}&x<\tau_{0}\\ \\ P_{0}(\tau_{0}^{-})+1\{\tau_{0}\in A^{1}\}P_{0}(\tau_{0})&\text{if}&x\geq\tau_{0}\\ \end{array}\right.$

and

$P_{1}^{\prime}(x)=\left\{\begin{array}[]{lcc}P_{1}(x)&\text{if}&x<\tau_{1}\\ \\ P_{1}(\tau_{1}^{-})+1\{\tau_{1}\in A^{1}\}P_{1}(\tau_{1})&\text{if}&x\geq\tau_{1}.\\ \end{array}\right.$

Here, $\tau_{0}=\inf\{x:1-P_{c(0)}(x)=0\}$ , $\tau_{1}=\inf\{x:1-P_{c(1)}(x)=0\}$ , $A^{0}=\{x\in\mathbb{R}|P_{c(0)}\{x\}>0\}$ and $A^{1}=\{x\in\mathbb{R}|P_{c(1)}\{x\}>0\}$ .

Proof.

The proof consists of repeatedly applying the strong law of large numbers for two-sample Kaplan-Meier $U$ statistics [Stute and Wang, 1993], together with the convergence results for $U$ statistics of degree two under random censoring [Bose and Sen, 1999].

By [Stute and Wang, 1993] we know that

[TABLE]

where $h$ is a given kernel of degree two such that

[TABLE]

Note that, since by hypothesis $P_{c(j)}$ $(j=0,1)$ is a continuous distribution function, $A^{0}$ and $A^{1}$ are empty sets, and therefore $P^{\prime}_{0}(x)=P_{0}(x)$ $\forall x\in[0,\tau_{0}]$ and $P^{\prime}_{1}(x)=P_{1}(x)$ $\forall x\in[0,\tau_{1}]$ .

Applying the previous result with $h(x,y)=1$ , along with the properties of convergence in probability, we have:

[TABLE]

Using Theorem $1$ of [Bose and Sen, 1999], we also have

[TABLE]

and

[TABLE]


Finally, taking $h(x,y)=||x-y||^{\alpha}$ or $h(x,y)=K(x,y)$ and applying the properties of convergence in probability of the sum of two random variables, the desired result is obtained.

∎

Theorem 3.

Let $X_{ji}=\min(T_{ji},C_{j,i})\overset{iid}{\sim}P_{c(j)}$ and $\delta_{ji}=1\{X_{ji}=T_{ji}\}$ $(j=0,1;i=1,\dots,n_{j})$ with $P_{c(j)}$ $(j=0,1)$ , under the conditions of independence assumed in Section $2$ on the variables $T_{ji}\overset{iid}{\sim}P_{j},C_{j,i}\overset{iid}{\sim}Q_{j}$ $(j=0,1;i=1,\dots,n_{j})$ . Suppose also that $\tau_{0}=\tau_{1}$ or that the supports of the distribution functions $P_{0}$ and $P_{1}$ are contained in the intervals $[0,\tau_{0}]$ and $[0,\tau_{1}]$ respectively. Then each of the statistics $T_{\hat{\epsilon}_{\alpha}}$ and $T_{\hat{\gamma}_{K}^{2}}$ determines a test of the hypothesis of equal distributions that is consistent against all fixed alternatives with continuous random variables.

Proof.

Without loss of generality, we assume that $P_{0}$ and $P_{1}$ have the same support (otherwise it is enough to extend the probability measure with the smaller support to the larger one). If $\tau_{0}=\tau_{1}$ we can apply Theorem $1$ , which guarantees:

[TABLE]

with equality to zero if and only if $P_{0}(t)=P_{1}(t)$ $\forall t\in[0,\tau_{1}]$ .

Suppose that $\exists t\in[0,\tau_{1}]$ such that $P_{0}(t)\neq P_{1}(t)$ . Then we have strict inequality in $(36,37)$ , so that $\lim_{n_{0}\to\infty,n_{1}\to\infty}P(\hat{\epsilon}_{\alpha}(P_{0},P_{1})=c_{\epsilon_{\alpha}}>0)=1$ and $\lim_{n_{0}\to\infty,n_{1}\to\infty}P(\hat{\gamma}_{K}(P_{0},P_{1})=c_{K}>0)=1$ . By the theory of degenerate $U$ -statistics, under the null hypothesis there exist constants $c_{\alpha_{1}}$ and $c_{\alpha_{2}}$ satisfying

[TABLE]

Under the alternative hypothesis

[TABLE]

since $n\hat{\epsilon}_{\alpha}(P_{0},P_{1})\to\infty$ and $n\hat{\gamma}_{K}(P_{0},P_{1})\to\infty$ with probability one as $n\to\infty$ .

In the case $\tau_{0}\neq\tau_{1}$ , the supports of the distribution functions $P_{0}$ and $P_{1}$ are contained in the intervals $[0,\tau_{0}]$ and $[0,\tau_{1}]$ ; in this situation the normalization constants are equal to $1$ , and the previous argument remains valid.

∎

6 Simulation study

The simulation study is divided into two phases. In the first, the performance of the proposed tests under the null hypothesis is compared with that of the logrank family of tests, for different censoring rates and sample sizes. In particular, the tests used are the energy distance (with $\alpha\in\{0.4,0.8,1,1.2,1.6\}$ ), the Gaussian kernel ( $\sigma=1$ ), the Laplacian kernel ( $\sigma=1$ ), the rational quadratic kernel ( $c=\beta=1$ and $c=\beta=2$ ), logrank, the Gehan generalized Wilcoxon test, Tarone-Ware, Peto-Peto, and Fleming $\&$ Harrington (with $\rho=\gamma=1$ ). For this purpose, parametric distributions such as the normal, exponential or lognormal are used. In the second phase, the same tests are compared when the null hypothesis is not true, in different scenarios: proportional hazards, cure, multimodality, and delayed effect. We use different censoring mechanisms in each case and vary the sample size $n$ ( $n\in\{20,50,100\}$ ).

All the tests are run in the statistical software R. For the logrank family of tests the coin package [Hothorn et al., 2008] is used, while the new tests have been implemented in C++ and integrated in R with the Rcpp [Eddelbuettel et al., 2011] and RcppArmadillo libraries. In both cases the tests are calibrated by the permutation method, performing $1000$ repetitions for our tests.

6.1 Null hypothesis

We simulate $500$ times two samples for which the null hypothesis holds. The censoring rates are $10$ and $30$ percent, and the sample sizes are $20$ and $50$ individuals. Since under the null hypothesis p-value $\sim\text{Uniform}(0,1)$ , the mean of the p-values obtained should be close to $0.5$ and their standard deviation close to $\sqrt{1/12}=0.2886751$ . Likewise, approximately $5$ percent of the p-values should be less than $0.05$ . Table 3 shows the mean and standard deviation of the p-values for each test and case study considered, while Table 4 shows the proportion of p-values that are less than or equal to $0.05$ in the same cases.
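These reference values, and the amount of Monte Carlo noise to expect with $500$ replicates, can be checked with a short sketch. The p-values below are drawn directly from a uniform distribution as a stand-in for the p-values of a correctly calibrated test; the seed and variable names are arbitrary.

```python
import math
import random

rng = random.Random(1)
n_rep = 500

# Stand-in for the p-values of an exactly calibrated test under the null
pvals = [rng.random() for _ in range(n_rep)]

mean = sum(pvals) / n_rep
sd = math.sqrt(sum((p - mean) ** 2 for p in pvals) / (n_rep - 1))
reject = sum(p <= 0.05 for p in pvals) / n_rep

theoretical_sd = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)
# Monte Carlo standard error of the rejection proportion with 500 replicates
se_reject = math.sqrt(0.05 * 0.95 / n_rep)

print(mean, sd, theoretical_sd)
print(reject, se_reject)
```

With $500$ replicates the standard error of the empirical rejection rate is close to $0.01$ , so observed proportions between roughly $0.03$ and $0.07$ are compatible with a nominal level of $0.05$ .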


The results of the proposed tests under the null hypothesis are consistent with, and similar to, those of the logrank test family. Some discrepancy with the theoretical values is to be expected when the comparison is based on $500$ repetitions of $14$ different tests. In addition, the Kaplan-Meier estimator used in our tests and in some tests of the logrank family has a certain bias (which depends on the censoring rate), and this produces small deviations from what is theoretically expected under the null hypothesis.

6.2 Alternative hypothesis

As before, we simulate $500$ repetitions of two samples, but this time the null hypothesis does not hold. The cases studied are the following: the hazards of the two populations are proportional (the logrank test is the most powerful test in this context); a cure occurs in one population; the density function of one population has several modes as a consequence of a multimodal treatment; and there is a delayed effect in one population. The sample size is $20$ , $50$ or $100$ individuals in each group and the censoring mechanism changes between experiments. A significance level of $\alpha=0.05$ is used.

In each figure we show four graphs for each subcase: the first shows the power of the energy distance tests; the second, that of the kernel methods; the third, that of the logrank test together with the other methods of its family; and the last, the logrank test together with the average power of the energy distance tests, of the kernel methods, and of the logrank family.

6.2.1 Proportional hazard ratio in two populations

We simulate $500$ times, with sample sizes of $20$ , $50$ and $100$ individuals in each group, in the following $10$ cases of study: $X\sim Exp(1)$ versus $Y\sim Exp(\theta)$ (with $\theta\in\{1,1.1,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2\}$ ).

We plot the results as a function of the parameter $\theta$ for each sample size $n$ in different figures: Figure 1 shows the results for $n=20$ , Figure 2 for $n=50$ , and Figure 3 for $n=100$ . As the three figures show, the logrank test is usually the most powerful, as expected in a situation where this test is theoretically optimal. However, the average power of the energy distance tests is not far behind. We can also see that the choice of the parameters of both the energy distance and the kernel methods affects the power in this case study, which gives great flexibility to the family of tests.

6.2.2 Cure

We simulate data with the following predefined hazard function for each population on $[0,100]$ :

[TABLE]

and

[TABLE]

The censoring times are $C_{ij}\overset{iid}{\sim}Uniform(0,10)$ $(j=0,1;i=1,\dots,n_{j})$ . Figure 4 shows the survival functions obtained from the Kaplan-Meier estimator with $5000$ subjects in each group. Figure 5 collects the results of the power study, where it can be seen that in this case the most powerful tests are those based on the energy distance and kernel methods. Interestingly, there is hardly any variability among the tests within these two families, whereas the tests of the logrank family differ considerably from one another.

6.2.3 Multimodality

We again simulate data with a predefined hazard function for each population on $[0,100]$ :

[TABLE]

and

[TABLE]

The censoring times are $C_{ij}\overset{iid}{\sim}Uniform(0,10)$ $(j=0,1;i=1,\dots,n_{j})$ . Figure 6 shows the Kaplan-Meier estimator for each group. In this case study, the logrank family of tests has more power (Figure 7) than the chosen tests based on the energy distance and kernel methods. However, the power achieved varies considerably within this family, and some of its tests, such as Fleming $\&$ Harrington, have less power than the proposed tests.

6.2.4 Delayed effect

We consider the following hazard functions for each population:

[TABLE]

and

[TABLE]

The censoring variables are $C_{ij}\overset{iid}{\sim}Uniform(0,15)$ $(j=0,1;i=1,\dots,n_{j})$ . The Kaplan-Meier estimator is shown in Figure 8.

In this last simulation, the kernel methods are by far the most powerful (Figure 9). The power achieved by the logrank family and by the energy distance is similar. However, the power of the logrank test is very low, hardly higher than its rejection rate under the null hypothesis. This also happens with the energy distance for $\alpha=1.2$ and $\alpha=1.6$ , which shows that an appropriate parameter selection is necessary for the correct use of these tests.

7 Final remarks

In this article, new statistics for testing the equality of survival distributions with censored data are proposed. The tests are consistent against all alternatives and, with finite samples, perform well in situations of great clinical interest, such as new oncological treatments whose pharmacological strategies introduce a delayed effect [Melero et al., 2014, Xu et al., 2017], where they greatly exceed the performance of the classic tests if the correct parameters are selected. In the other situations analyzed, the performance is higher in the cure scenario, very close to the optimum when the hazard ratio is constant, and slightly worse in the case of simulated multimodal treatments. In general, the performance is better than that of the classic tests; however, certain issues, such as the choice of optimal parameters or kernels in each situation, are still unresolved (this also happens in the uncensored case [Szekely and Rizzo, 2017]). In addition, in survival analysis, when estimating the mean [Datta, 2005], it is common to apply the Efron correction [Efron, 1967], which treats the maximum observed time in each group as uncensored ( $\delta_{0(n_{0}:n_{0})}=\delta_{1(n_{1}:n_{1})}=1$ ), or to resort to other imputation techniques for censored observations, both for the estimation of the mean [Datta, 2005] and for the global estimation of the weights of the Kaplan-Meier estimator, such as presmoothing [Cao and Jácome, 2004]. In any case, this may increase the power of the tests, but also the bias.

The extension of the proposed tests to $k$ samples is analogous to the uncensored case, for which there is a varied literature, such as DISCO analysis [Rizzo et al., 2010], an extension of the ANOVA test to testing equality of distributions in an uncensored context, or, more recently, the kernel method proposed in [Balogoun et al., 2018].

An R package called energysurv, with the proposed methods implemented in C++, will soon be available on my GitHub at https://github.com/mmatabuena, so that the scientific community can use the new tests as a valuable alternative to classical survival tests.

Graphics

8 Acknowledgements

This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaría (accreditation 2016-2019, ED431G/08) and the European Regional Development Fund (ERDF).

Bibliography


[Aronszajn, 1950] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.
[Balogoun et al., 2018] Balogoun, A. S. K., Nkiet, G. M., and Ogouyandjou, C. (2018). Kernel based method for the $k$-sample problem. arXiv preprint arXiv:1812.00100.
[Bathke et al., 2009] Bathke, A., Kim, M.-O., and Zhou, M. (2009). Combined multiple testing by censored empirical likelihood. Journal of Statistical Planning and Inference, 139(3):814–827.
[Bose and Sen, 1999] Bose, A. and Sen, A. (1999). The strong law of large numbers for Kaplan–Meier U-statistics. Journal of Theoretical Probability, 12(1):181–200.
[Bose and Sen, 2002] Bose, A. and Sen, A. (2002). Asymptotic distribution of the Kaplan–Meier U-statistics. Journal of Multivariate Analysis, 83(1):84–123.
[Buyske et al., 2000] Buyske, S., Fagerstrom, R., and Ying, Z. (2000). A class of weighted log-rank tests for survival data when the event is rare. Journal of the American Statistical Association, 95(449):249–258.
[Cai, 1998] Cai, Z. (1998). Asymptotic properties of Kaplan–Meier estimator for censored dependent data. Statistics & Probability Letters, 37(4):381–389.
[Cao and Jácome, 2004] Cao, R. and Jácome, M. (2004). Presmoothed kernel density estimator for censored data. Nonparametric Statistics, 16(1-2):289–309.
