Efficient computation of higher order cumulant tensors

Krzysztof Domino; Piotr Gawron; Łukasz Pawela

arXiv:1701.05420·cs.NA·November 10, 2020

TL;DR

This paper presents a new tensor-based algorithm for efficiently computing higher order cumulants of multidimensional data, significantly reducing computational complexity and memory usage compared to previous methods.

Contribution

It introduces a novel, super-symmetry exploiting algorithm for arbitrary order cumulant tensors, improving efficiency over existing approaches.

Findings

01

Reduces computational complexity by a factor of approximately $d!$ compared to the naive algorithm

02

Decreases memory requirements for cumulant calculation

03

Applicable to high-dimensional, higher-order cumulant computation

Abstract

In this paper, we introduce a novel algorithm for calculating arbitrary order cumulants of multidimensional data. Since the $d$th order cumulant can be presented in the form of a $d$-dimensional tensor, the algorithm is presented using tensor operations. The algorithm provided in the paper takes advantage of super-symmetry of cumulant and moment tensors. We show that the proposed algorithm considerably reduces the computational complexity and the computational memory requirement of cumulant calculation as compared with existing algorithms. For the sizes of interest, the reduction is of the order of $d!$ compared to the naive algorithm.

Tables

Table 1: Symbols used in the paper.

Symbol	Description/explanation
$𝐢 = (i_{1}, \dots, i_{d})$	$d$ element multi-index
$\| 𝐢 \| = d$	size of multi-index (number of elements)
$π (𝐢)$	permutation of multi-index
$1 : d$	set of integers ${1, 2, \dots, d}$
$𝐗 \in ℝ^{t \times n}$	matrix of $t$ realisations of $n$ dimensional random variable
$X_{i} = {[x_{1, i}, \dots, x_{t, i}]}^{⊺}$	vector of $t$ realisations of the $i$th marginal random variable
$E (X_{i_{1}}, \dots, X_{i_{d}}) = \frac{1}{t} \sum_{l = 1}^{t} \prod_{k = 1}^{d} x_{l, i_{k}}$	expectation value operator
$𝒜 \in ℝ^{[n, d]}$	super-symmetric $d$ mode tensor of size $n \times \dots \times n$ , with elements $a_{𝐢}$
$𝒜 \in ℝ^{n_{1} \times \dots \times n_{d}}$	$d$ mode tensor of sizes $n_{1} \times \dots \times n_{d}$ , with elements $a_{𝐢}$
$\tilde{𝐗} \in ℝ^{t \times n}$	matrix of $t$ realisations of $n$ dimensional centered random variable
$𝒞_{d} (𝐗) \in ℝ^{[n, d]}$	the $d$th cumulant tensor of $𝐗$ with elements $c_{𝐢}$
${(𝒞_{d})}_{𝐣} \in ℝ^{[b, d]}$ , ${(ℳ_{d})}_{𝐣} \in ℝ^{[b, d]}$	block of the $d$th cumulant or moment tensor indexed by $𝐣$ in the block structure
$ℳ_{d} (\tilde{𝐗}) \in ℝ^{[n, d]}$	the $d$th central moment tensor of $𝐗$ with elements $m_{𝐢}$
$M_{d} (X) \in ℝ$	the $d$th moment of one dimensional $X \in ℝ^{t}$

Equations

\tilde{\phi}:\mathbb{R}^{n}\to\mathbb{R},\quad K:\mathbb{R}^{n}\to\mathbb{R},\qquad \tilde{\phi}(\tau)=\exp\left(\tau^{\intercal}\mu+\frac{1}{2}\tau^{\intercal}\Sigma\tau\right),\quad K(\tau)=\log\left(\tilde{\phi}(\tau)\right)=\tau^{\intercal}\mu+\frac{1}{2}\tau^{\intercal}\Sigma\tau.

\mathbf{X}=\left[\begin{array}{ccc}x_{1,1}&\dots&x_{1,n}\\ \vdots&\ddots&\vdots\\ x_{t,1}&\dots&x_{t,n}\end{array}\right].

\mathbf{X}=[X_{1},\ldots,X_{i},\ldots,X_{n}],

X_{i}=[x_{1,i},\ldots,x_{j,i},\ldots,x_{t,i}]^{\intercal}.

\forall_{\pi}\ a_{\mathbf{i}}=a_{\pi(\mathbf{i})}.

m_{\mathbf{i}}=E(X_{i_{1}},\ldots,X_{i_{d}})=\frac{1}{t}\sum_{l=1}^{t}\prod_{k=1}^{d}x_{l,i_{k}},

\tilde{\mathbf{X}}=[\tilde{X}_{1},\ldots,\tilde{X}_{i},\ldots,\tilde{X}_{n}],\quad\text{with}\quad\tilde{X}_{i}=X_{i}-E(X_{i}).

K(\tau)=\log\left(\frac{1}{t}\sum_{j=1}^{t}\exp\left([x_{j,1},\ldots,x_{j,n}]\cdot\tau\right)\right),

c_{i_{1},\ldots,i_{d}}=\frac{\partial^{d}}{\partial\tau_{i_{1}}\ldots\partial\tau_{i_{d}}}K(\tau)\bigg|_{\tau=0},

\mathcal{C}_{1}(\mathbf{X})=[E(X_{1}),\ldots,E(X_{n})].

\mathcal{C}_{2}(\mathbf{X})=\left[\begin{array}{ccc}E\left(\tilde{X}_{1}\tilde{X}_{1}\right)&\dots&E\left(\tilde{X}_{1}\tilde{X}_{n}\right)\\ \vdots&\ddots&\vdots\\ E\left(\tilde{X}_{n}\tilde{X}_{1}\right)&\dots&E\left(\tilde{X}_{n}\tilde{X}_{n}\right)\end{array}\right].

c_{\mathbf{i}}(\mathbf{X})=E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}).

c_{\mathbf{i}}(\mathbf{X})=E(X_{i_{1}}X_{i_{2}}X_{i_{3}}X_{i_{4}})\underbrace{-E(X_{i_{1}})E(X_{i_{2}}X_{i_{3}}X_{i_{4}})-E(X_{i_{2}})E(X_{i_{1}}X_{i_{3}}X_{i_{4}})-\ldots}_{\times 4}\ \underbrace{-E(X_{i_{1}}X_{i_{2}})E(X_{i_{3}}X_{i_{4}})-\ldots}_{\times 3}+2\underbrace{\left(E(X_{i_{1}})E(X_{i_{2}})E(X_{i_{3}}X_{i_{4}})+\ldots\right)}_{\times 6}-6E(X_{i_{1}})E(X_{i_{2}})E(X_{i_{3}})E(X_{i_{4}}).

c_{\mathbf{i}}(\mathbf{X})=E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}})E(\tilde{X}_{i_{3}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{3}})E(\tilde{X}_{i_{2}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{4}})E(\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}).

\mathcal{C}_{2}=\left[\begin{array}{cccc}(\mathcal{C}_{2})_{11}&(\mathcal{C}_{2})_{12}&\cdots&(\mathcal{C}_{2})_{1\bar{n}}\\ \text{NULL}&(\mathcal{C}_{2})_{22}&\cdots&(\mathcal{C}_{2})_{2\bar{n}}\\ \vdots&\vdots&\ddots&\vdots\\ \text{NULL}&\text{NULL}&\cdots&(\mathcal{C}_{2})_{\bar{n}\bar{n}}\end{array}\right],

b_{l}=n-b(\bar{n}-1).

b_{j_{p}}=\begin{cases}b,&\text{if }j_{p}<\bar{n},\\ b_{l},&\text{if }j_{p}=\bar{n}.\end{cases}

\mathcal{M}_{d}(\mathbf{X})=\frac{1}{p}\sum_{s=1}^{p}\mathcal{M}_{d}(\mathbf{X}_{s}).

m_{\mathbf{i}}(\mathbf{X}_{s})=\frac{p}{t}\sum_{l=(s-1)\frac{t}{p}+1}^{s\frac{t}{p}}\prod_{k=1}^{|\mathbf{i}|}x_{l,i_{k}}.

m_{\mathbf{i}}(\mathbf{X})=\frac{1}{t}\sum_{s=1}^{p}\sum_{l=(s-1)\frac{t}{p}+1}^{s\frac{t}{p}}\prod_{k=1}^{|\mathbf{i}|}x_{l,i_{k}}=\frac{1}{p}\sum_{s=1}^{p}m_{\mathbf{i}}(\mathbf{X}_{s}).

\bigcup_{r=1}^{\sigma}\mathbf{k}_{r}=\mathbf{k}\ \wedge\ \forall_{r\neq r^{\prime}}\ \mathbf{k}_{r}\cap\mathbf{k}_{r^{\prime}}=\emptyset.

P_{\sigma}(\mathbf{k})\sim P^{\prime}_{\sigma}(\mathbf{k})\Leftrightarrow\left(\exists_{\pi^{\prime}}\ \forall_{r\in 1:\sigma}\ \exists_{\pi_{r}}:(\mathbf{k}_{1},\ldots,\mathbf{k}_{\sigma})=\pi^{\prime}\left(\pi_{1}(\mathbf{k}^{\prime}_{1}),\ldots,\pi_{\sigma}(\mathbf{k}^{\prime}_{\sigma})\right)\right).

\#\{[P_{\sigma}(\mathbf{k})]\}=S(d,\sigma)=\frac{1}{\sigma!}\sum_{j=0}^{\sigma}(-1)^{\sigma-j}\binom{\sigma}{j}j^{d}.

a_{(\mathbf{i},\mathbf{i}^{\prime})}=c_{\mathbf{i}}c_{\mathbf{i}^{\prime}},

a_{\mathbf{i}}=\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\prod_{\mathbf{k}_{r}\in\zeta}c_{\mathbf{i}_{\mathbf{k}_{r}}}.

\mathcal{A}_{d}=\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|}.

\mathcal{A}_{4}=\sum_{\zeta\in\{[P_{2}(1:4)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|},

a_{i_{1},i_{2},i_{3},i_{4}}=c_{i_{1},i_{2}}c_{i_{3},i_{4}}+c_{i_{1},i_{3}}c_{i_{2},i_{4}}+c_{i_{1},i_{4}}c_{i_{2},i_{3}}+c_{i_{1}}c_{i_{2},i_{3},i_{4}}+c_{i_{2}}c_{i_{1},i_{3},i_{4}}+c_{i_{3}}c_{i_{1},i_{2},i_{4}}+c_{i_{4}}c_{i_{1},i_{2},i_{3}},

\mathbb{R}^{[n,d]}\ni\mathcal{M}_{d}(\mathbf{X})=\sum_{\sigma=1}^{d}\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|}(\mathbf{X}).

m_{\mathbf{i}}(\mathbf{X})=\sum_{\sigma=1}^{d}\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\prod_{\mathbf{k}_{r}\in\zeta}c_{\mathbf{i}_{\mathbf{k}_{r}}}(\mathbf{X}).

Full text

Efficient computation of higher order cumulant tensors

Submitted to the editors 07.03.2018.

Funding: The research was partially financed by the National Science Centre, Poland, project number 2014/15/B/ST6/05204.

Krzysztof Domino, Piotr Gawron, Łukasz Pawela

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland ({kdomino, gawron, lpawela}@iitis.pl)

Keywords: high order cumulants, non-normally distributed data, numerical algorithms

March 7, 2018

Abstract

In this paper, we introduce a novel algorithm for calculating arbitrary order cumulants of multidimensional data. Since the $d$th order cumulant can be presented in the form of a $d$-dimensional tensor, the algorithm is presented using tensor operations. The algorithm provided in the paper takes advantage of super-symmetry of cumulant and moment tensors. We show that the proposed algorithm considerably reduces the computational complexity and the computational memory requirement of cumulant calculation as compared with existing algorithms. For the sizes of interest, the reduction is of the order of $d!$ compared to the naïve algorithm.

AMS subject classifications: 65Y05, 15A69, 65C60

1 Introduction

1.1 Motivation

Cumulants of the order of $d>2$ have recently started to play an important role in the analysis of non-normally distributed multivariate data. Some potential applications of higher-order cumulants include signal filtering problems where the normality assumption is not required (see [25, 32] and references therein). Another application is finding the direction of received signals [45, 41, 10, 33] and signal auto-correlation analysis [37]. Higher-order cumulants are used in hyper-spectral image analysis [26], financial data analysis [2, 29] and neuroimage analysis [9, 5]. Outside the realm of signal analysis, higher order cumulants can be applied to quantum noise investigation purposes [24], as well as to other types of non-normally distributed data, such as weather data [15, 19, 43], various medical data [46], cosmological data [50] or data generated for machine learning purposes [22].

In the examples mentioned above, only cumulants of order $d\leq 4$ were used, owing to the growing computational complexity and the large estimation errors of high order statistics. Both the computational complexity and the memory use grow as $n^{d}$, where $d$ is the order of the cumulant and $n$ is the number of marginal variables.

Nevertheless, cumulants of order $d=6$ of multivariate data were successfully used in high-resolution direction-finding methods of multi-source signals (the q-MUSIC algorithm) [12, 11, 13, 34], despite the higher variance of the statistic’s estimation. In such an algorithm, the number of signal sources that can be detected is proportional to the cumulant’s order [44]. Cumulants of order $d>4$ also play an important role in financial data analyses, as they enable measurement of the risk related to portfolios composed of many assets [48, 38]. This is particularly important during an economic crisis, since higher order cumulants make it possible to sample larger fluctuations of prices [42]. In [48], cumulants of order 2–6 of multi-asset portfolios were used as a measure of risk seeking vs. risk aversion. In [38], it was shown that, during an economic crisis, cumulants of order $d>4$ are important to analyse variations of assets and prices of portfolios. Further arguments for the utility of cumulants of order $d>4$ can be found in [28, 1] and [18], where cumulant tensors of order 2–6 were used to analyse financial portfolios during an economic crisis. Finally, let us consider the QCD (Quantum Chromodynamics) phase structure research area. In [23], the authors demonstrated the relevance of cumulants of order $5$ and $6$ of net baryon number fluctuations for the analysis of freeze-out and critical conditions in heavy ion collisions. Standard errors of those cumulant estimations were discussed in [36].

In our study, we introduce an efficient method to calculate higher-order cumulants. This method takes advantage of the recursive relation between cumulants and moments as well as their super-symmetric structure. These features enable us to reduce the computational complexity of the naïve algorithm and make the problem tractable. In order to reduce complexity, we use the idea introduced in [49] to decrease the storage and computational requirements by a factor of $O(d!)$ .

This allows us to handle large data sets and overcome a major problem in numerical handling of high order moments and cumulants. Consider that the estimation error of the one-dimensional $d$th central moment is bounded from above by $\sqrt{\frac{M_{2d}}{t}}$, where $M_{2d}$ is the $(2d)$th central moment and $t$ is the number of data samples. This is discussed further in Appendix A. Consequently, the accurate estimation of statistics of order $d>4$ requires correspondingly large data sets. In practice, our approach allows us to handle cumulants up to the tenth order.

1.2 Normally and non-normally distributed data

Let us consider the $n$ -dimensional normally distributed random variable $\mathbf{X}\sim\ \mathcal{N}(\mu,\Sigma)$ where $\Sigma$ is a positive-definite covariance matrix and $\mathbf{\mu}$ is a mean value vector. In this case, the characteristic function $\tilde{\phi}(\tau)$ and cumulant generating function $K(\tau)$ [30, 35] are

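$$\tilde{\phi}:\mathbb{R}^{n}\to\mathbb{R},\quad K:\mathbb{R}^{n}\to\mathbb{R},\qquad \tilde{\phi}(\tau)=\exp\left(\tau^{\intercal}\mu+\frac{1}{2}\tau^{\intercal}\Sigma\tau\right),\quad K(\tau)=\log\left(\tilde{\phi}(\tau)\right)=\tau^{\intercal}\mu+\frac{1}{2}\tau^{\intercal}\Sigma\tau.\tag{1}$$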

It is easy to see that $K(\tau)$ is quadratic in $\tau$, and therefore its third and higher derivatives with respect to $\tau$ are zero. As we will see in the next section, this implies that cumulants of order greater than two are equal to zero.

If data is characterised by a frequency distribution other than the multivariate normal distribution, the characteristic function may be expanded in more terms than quadratic, and cumulants of the order higher than two may have non-zero elements. This is why they are helpful in distinguishing between normally and non-normally distributed data or between data from different non-normal distributions.

1.3 Basic definitions

Let us start with a random process generating discrete $n$ dimensional values. A sequence of $t$ samples of an $n$ dimensional random variable is represented in the form of matrix $\mathbf{X}\in\mathbb{R}^{t\times n}$ such that

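$$\mathbf{X}=\left[\begin{array}{ccc}x_{1,1}&\dots&x_{1,n}\\ \vdots&\ddots&\vdots\\ x_{t,1}&\dots&x_{t,n}\end{array}\right].\tag{2}$$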

This matrix can be represented as a sequence of vectors of realisations of $n$ marginal variables $X_{i}$

$$\mathbf{X}=[X_{1},\ldots,X_{i},\ldots,X_{n}],\tag{3}$$

where

$$X_{i}=[x_{1,i},\ldots,x_{j,i},\ldots,x_{t,i}]^{\intercal}.\tag{4}$$

In order to study moments and cumulants of $\mathbf{X}$, we need the notion of super-symmetric tensors. Let us first denote the set $\{1,2,\ldots,d\}$ as $1:d$, and a permutation of tuple $\mathbf{i}=(i_{1},\ldots,i_{d})$ as $\pi(\mathbf{i})$.

Definition 1.1.

Let $\mathcal{A}\in\mathbb{R}^{\overbrace{n\times\cdots\times n}^{d}}$ be a tensor with elements $a_{\mathbf{i}}$ indexed by multi-index $\mathbf{i}=(i_{1},\ldots,i_{d})$ . Tensor $\mathcal{A}$ is super-symmetric iff it is invariant under any permutation $\pi$ of the multi-index, i.e.

$$\forall_{\pi}\ a_{\mathbf{i}}=a_{\pi(\mathbf{i})}.\tag{5}$$

Henceforth we will write $\mathcal{A}\in\mathbb{R}^{[n,d]}$ for super-symmetric tensor $\mathcal{A}$ . A list of all notations used in this paper is provided in Table 1.

Definition 1.2.

Let $\mathbf{X}\in\mathbb{R}^{t\times n}$ be as in Eq. (2). We define the $d$ th moment as tensor $\mathcal{M}_{d}(\mathbf{X})\in\mathbb{R}^{[n,d]}$ . Its elements are indexed by multi-index $\textbf{i}=(i_{1},\ldots,i_{d})$ and equal

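$$m_{\mathbf{i}}=E(X_{i_{1}},\ldots,X_{i_{d}})=\frac{1}{t}\sum_{l=1}^{t}\prod_{k=1}^{d}x_{l,i_{k}},\tag{6}$$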

where $E$ is the expectation value operator and $X_{i_{k}}$ is the vector of realisations of the $i_{k}$th marginal variable.
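As an illustration of Def. 1.2, a single element of the moment tensor can be estimated directly from the data matrix. The sketch below is a plain-Python illustration, not the authors' implementation; the function name `moment_element` and the 0-based multi-index convention are assumptions made here.

```python
from math import prod

def moment_element(X, idx):
    """Sample estimate of E(X_{i_1}, ..., X_{i_d}) for a 0-based multi-index idx.

    X is a list of t rows (realisations), each row holding n values.
    """
    t = len(X)
    # Average, over the t realisations, of the product of the selected columns.
    return sum(prod(row[i] for i in idx) for row in X) / t

# Toy data: t = 3 realisations of an n = 2 dimensional variable.
X = [[1.0, 2.0],
     [3.0, 0.0],
     [5.0, 4.0]]

m_001 = moment_element(X, (0, 0, 1))
m_100 = moment_element(X, (1, 0, 0))  # a permutation of the same multi-index
```

Because the product inside the sum is commutative, any permutation of `idx` gives the same value, which is the super-symmetry of Def. 1.1 in miniature.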

Definition 1.3.

Let $\mathbf{X}\in\mathbb{R}^{t\times n}$ be as in Eq. (2). We define centered variable $\tilde{\mathbf{X}}\in\mathbb{R}^{t\times n}$ as

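$$\tilde{\mathbf{X}}=[\tilde{X}_{1},\ldots,\tilde{X}_{i},\ldots,\tilde{X}_{n}],\quad\text{with}\quad\tilde{X}_{i}=X_{i}-E(X_{i}).\tag{7}$$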

The first two cumulants respectively correspond to the mean vector and the symmetric covariance matrix of $\mathbf{X}$ . Given the following $K(\tau)$ estimator:

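$$K(\tau)=\log\left(\frac{1}{t}\sum_{j=1}^{t}\exp\left([x_{j,1},\ldots,x_{j,n}]\cdot\tau\right)\right),\tag{8}$$

we first introduce the definition of cumulants of arbitrary order and later explicitly state definitions for cumulants of order one to four [30, 35].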

Definition 1.4.

Let $(i_{1},\ldots,i_{d})$ be a multi-index with elements $i_{k}\in 1:n$, and $K(\tau)$ the cumulant generating function of a given distribution. The $d$th cumulant element is defined by [30, 35]

$$c_{i_{1},\ldots,i_{d}}=\frac{\partial^{d}}{\partial\tau_{i_{1}}\ldots\partial\tau_{i_{d}}}K(\tau)\bigg|_{\tau=0},\tag{9}$$

where we drop the imaginary unit in the definition for clarity of presentation.

Definition 1.5.

We define the first cumulant $\mathcal{C}_{1}\in\mathbb{R}^{[n,1]}$ as

$$\mathcal{C}_{1}(\mathbf{X})=[E(X_{1}),\ldots,E(X_{n})].\tag{10}$$

Definition 1.6.

We define the second cumulant $\mathcal{C}_{2}\in\mathbb{R}^{[n,2]}$ as

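$$\mathcal{C}_{2}(\mathbf{X})=\left[\begin{array}{ccc}E\left(\tilde{X}_{1}\tilde{X}_{1}\right)&\dots&E\left(\tilde{X}_{1}\tilde{X}_{n}\right)\\ \vdots&\ddots&\vdots\\ E\left(\tilde{X}_{n}\tilde{X}_{1}\right)&\dots&E\left(\tilde{X}_{n}\tilde{X}_{n}\right)\end{array}\right].\tag{11}$$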

Definition 1.7.

We define the third cumulant as a three-mode tensor $\mathcal{C}_{3}\in\mathbb{R}^{[n,3]}$ with elements

$$c_{\mathbf{i}}(\mathbf{X})=E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}).\tag{12}$$

Cumulants of order greater than three can be computed from moments [40, 39]; however, the relation is complex and requires a special notation, which is introduced in Subsection 3.1. To show how complicated the formulas might become, we state here the partial formula for the fourth cumulant.

Definition 1.8.

We define the fourth cumulant as a four-mode tensor $\mathcal{C}_{4}\in\mathbb{R}^{[n,4]}$ with elements

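$$c_{\mathbf{i}}(\mathbf{X})=E(X_{i_{1}}X_{i_{2}}X_{i_{3}}X_{i_{4}})\underbrace{-E(X_{i_{1}})E(X_{i_{2}}X_{i_{3}}X_{i_{4}})-E(X_{i_{2}})E(X_{i_{1}}X_{i_{3}}X_{i_{4}})-\ldots}_{\times 4}\ \underbrace{-E(X_{i_{1}}X_{i_{2}})E(X_{i_{3}}X_{i_{4}})-\ldots}_{\times 3}+2\underbrace{\left(E(X_{i_{1}})E(X_{i_{2}})E(X_{i_{3}}X_{i_{4}})+\ldots\right)}_{\times 6}-6E(X_{i_{1}})E(X_{i_{2}})E(X_{i_{3}})E(X_{i_{4}}).\tag{13}$$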

Switching to the centered variable $\tilde{\mathbf{X}}$ and using the facts that $E(\tilde{X}_{i})=0$ and that $c_{\mathbf{i}}(\mathbf{X})=c_{\mathbf{i}}(\tilde{\mathbf{X}})$ for $|\mathbf{i}|\geq 2$ (cumulants of order greater than one are mean-shift invariant [39]), we can write Eq. (13) in the following manner:

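$$c_{\mathbf{i}}(\mathbf{X})=E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{2}})E(\tilde{X}_{i_{3}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{3}})E(\tilde{X}_{i_{2}}\tilde{X}_{i_{4}})-E(\tilde{X}_{i_{1}}\tilde{X}_{i_{4}})E(\tilde{X}_{i_{2}}\tilde{X}_{i_{3}}).\tag{14}$$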
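The centered form of the fourth cumulant (a fourth central moment minus the three pairings of second central moments) translates almost literally into code. The following is a minimal element-wise sketch in plain Python, assuming 0-based column indices; the helper names `center`, `E` and `cumulant4_element` are hypothetical and introduced only for illustration.

```python
from math import prod

def center(X):
    """Subtract the column means, giving the centered data."""
    t, n = len(X), len(X[0])
    means = [sum(row[i] for row in X) / t for i in range(n)]
    return [[row[i] - means[i] for i in range(n)] for row in X]

def E(Xc, idx):
    """Sample expectation of the product of the columns listed in idx."""
    return sum(prod(row[i] for i in idx) for row in Xc) / len(Xc)

def cumulant4_element(X, i1, i2, i3, i4):
    """Fourth cumulant element from centered data: one fourth-moment term
    minus the three ways of pairing the four indices."""
    Xc = center(X)
    return (E(Xc, (i1, i2, i3, i4))
            - E(Xc, (i1, i2)) * E(Xc, (i3, i4))
            - E(Xc, (i1, i3)) * E(Xc, (i2, i4))
            - E(Xc, (i1, i4)) * E(Xc, (i2, i3)))

# One-dimensional two-point data: central moments M_2 = M_4 = 1, so c_4 = 1 - 3 = -2.
c4 = cumulant4_element([[-1.0], [1.0]], 0, 0, 0, 0)
# Shifting the data by a constant leaves the result unchanged.
c4_shifted = cumulant4_element([[9.0], [11.0]], 0, 0, 0, 0)
```

This element-wise version recomputes the centering on every call and ignores symmetry, so it is only meant to make the formula concrete, not to be efficient.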

Remark 1.9.

Each cumulant tensor $\mathcal{C}_{d}$ as well as each moment tensor $\mathcal{M}_{d}$ is super-symmetric [4].

As the formula for a cumulant of an arbitrary order is very complex, our core result is the numerical handling of a cumulant, which is discussed in depth in Section 3. To compute the cumulant tensor of order $d$ we use central moment tensors of order $2,3,d-2$ and $d$ and take advantage of the super-symmetry of cumulant and moment tensors. Importantly, we do not need to determine the $(d-1)$th moment tensor.

2 Moment tensor calculation

To provide a simpler example, we start with algorithms for calculation of the moment tensor. Next, in Section 3, those algorithms will be utilised to recursively calculate the cumulants.

2.1 Storage of super-symmetric tensors in block structures

In this section, we are going to follow the idea introduced by Schatz et al. [49] concerning the use of blocks to store symmetric matrices and super-symmetric tensors in an efficient way. To make the demonstration more accessible, we will first focus on the matrix case. Let us suppose we have symmetric matrix $\mathcal{C}_{2}\in\mathbb{R}^{[n,2]}$ . We can store the matrix in blocks and store only upper triangular blocks,

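$$\mathcal{C}_{2}=\left[\begin{array}{cccc}(\mathcal{C}_{2})_{11}&(\mathcal{C}_{2})_{12}&\cdots&(\mathcal{C}_{2})_{1\bar{n}}\\ \text{NULL}&(\mathcal{C}_{2})_{22}&\cdots&(\mathcal{C}_{2})_{2\bar{n}}\\ \vdots&\vdots&\ddots&\vdots\\ \text{NULL}&\text{NULL}&\cdots&(\mathcal{C}_{2})_{\bar{n}\bar{n}}\end{array}\right],\tag{15}$$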

where NULL represents an empty block, and $\bar{n}=\lceil\frac{n}{b}\rceil$ . Entries below the diagonal do not need to be stored and calculated as they are redundant. Each block $({\mathcal{C}_{2}})_{j_{1},j_{2}}:j_{1}\leq j_{2}\wedge j_{2}<\bar{n}$ is of size $b\times b$ . Blocks $({\mathcal{C}_{2}})_{j_{1},\bar{n}}:j_{1}<\bar{n}$ are of size $b\times b_{l}$ , and block $({\mathcal{C}_{2}})_{\bar{n},\bar{n}}$ is of size $b_{l}\times b_{l}$ , where:

$$b_{l}=n-b(\bar{n}-1).\tag{16}$$

This representation significantly reduces the overall storage footprint while still providing opportunities to achieve high computational performance.

This representation can easily be extended for purposes of super-symmetric tensors. Let us assume that $\mathcal{C}_{d}\in\mathbb{R}^{[n,d]}$ is a super-symmetric tensor. All data can be stored in blocks $({\mathcal{C}_{d}})_{j_{1},\ldots,j_{d}}\in\mathbb{R}^{b_{j_{1}}\times\cdots\times b_{j_{d}}}$ . If indices $j_{1},\ldots,j_{d}$ are not sorted in an increasing order, such blocks are redundant and consequently replaced by NULL. Similarly to the matrix case we have

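$$b_{j_{p}}=\begin{cases}b,&\text{if }j_{p}<\bar{n},\\ b_{l},&\text{if }j_{p}=\bar{n}.\end{cases}\tag{17}$$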

In the subsequent sections we present algorithms for moment and cumulant tensor calculation and storage. For simplicity, we assume that $b|n$ and $\bar{n}=\frac{n}{b}$ . The generalization is straightforward and, at this point, would only obscure the main idea.

Henceforth each block is a hypercube of size $b^{d}$, and there are $\binom{\bar{n}+d-1}{d}$ such unique blocks [49], one for each non-decreasing block index $j_{1}\leq\ldots\leq j_{d}$. This storage scheme, proposed in [49], requires the storage of $b^{d}\binom{\bar{n}+d-1}{d}$ elements.
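To make the storage saving concrete, the unique blocks can be enumerated as non-decreasing block multi-indices. The sketch below assumes $b|n$ as in the text; the names `unique_block_indices` and `storage_ratio` are hypothetical. Counting sorted $d$-tuples drawn from $\bar{n}$ values gives $\binom{\bar{n}+d-1}{d}$ blocks, which the code checks on a small case.

```python
from itertools import combinations_with_replacement
from math import comb, factorial

def unique_block_indices(n_bar, d):
    """All non-decreasing block multi-indices (j_1 <= ... <= j_d), 1-based."""
    return list(combinations_with_replacement(range(1, n_bar + 1), d))

def storage_ratio(n_bar, d):
    """Number of blocks of the full tensor divided by the number of stored blocks."""
    return n_bar ** d / comb(n_bar + d - 1, d)

# Matrix case from the text: n_bar = 3 block rows, blocks of size b = 2.
n_bar, d, b = 3, 2, 2
blocks = unique_block_indices(n_bar, d)   # upper-triangular blocks only
stored = len(blocks) * b ** d             # elements kept in the block structure
full = (n_bar * b) ** d                   # elements of the full n x n matrix

# For many blocks per mode the saving approaches, but stays below, d!.
ratio = storage_ratio(200, 4)
```

The ratio stays strictly below $d!$ because diagonal blocks still hold redundant elements; it approaches $d!$ only as $\bar{n}$ grows relative to $d$.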

2.2 The algorithm

In this and following sections, we present the moment and cumulant calculation algorithms that use the block structure. To compute the $d$ th moment tensor we use Def. 1.2. Algorithm 1 computes a single block of the tensor, while Algorithm 2 computes the whole tensor in the block structure form.

Based on [49] and the discussion in the previous subsection, we can conclude that reduction of redundant blocks reduces the storage and computational requirements of the $d$ th moment tensor by a factor of $d!$ for $d\ll n$ compared to the naïve algorithm. The detailed analysis of the computational requirements will be presented in Section 5.
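Algorithms 1 and 2 are not reproduced in this text, so the following is only a hedged sketch of the idea they describe: compute one block from Def. 1.2, then loop over the non-redundant (sorted) block indices. It uses plain Python dictionaries instead of dense arrays, assumes $b|n$, and all names (`moment_block`, `moment_blocks`, `get_element`) are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import prod

def moment_block(X, j, b):
    """Block j (1-based, sorted block multi-index) as {local index: value}."""
    t = len(X)
    block = {}
    for local in product(range(b), repeat=len(j)):
        # Global 0-based column per mode: block offset plus position in the block.
        cols = [(jp - 1) * b + ip for jp, ip in zip(j, local)]
        block[local] = sum(prod(row[c] for c in cols) for row in X) / t
    return block

def moment_blocks(X, d, b):
    """Compute only the blocks whose block indices are non-decreasing."""
    n = len(X[0])
    assert n % b == 0, "this sketch assumes that b divides n"
    n_bar = n // b
    return {j: moment_block(X, j, b)
            for j in combinations_with_replacement(range(1, n_bar + 1), d)}

def get_element(blocks, idx, b):
    """Read element idx (0-based global multi-index) by sorting it first,
    which is valid because the moment tensor is super-symmetric."""
    idx = tuple(sorted(idx))
    j = tuple(i // b + 1 for i in idx)
    local = tuple(i % b for i in idx)
    return blocks[j][local]

X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 0.0, 1.0, 3.0],
     [0.0, 1.0, 2.0, 2.0]]
blocks = moment_blocks(X, d=3, b=2)
value = get_element(blocks, (3, 0, 2), b=2)
```

A practical implementation would store each block as a dense array and vectorise the inner sum; the dictionary form is chosen here only to keep the indexing explicit.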

2.3 Parallel computation of moment tensor

For large $t$, it is desirable to speed up the moment tensor calculation further. This can be achieved via a simple parallel scheme. Let us suppose, for the sake of simplicity, that we have $p$ processes available and that $p|t$. Starting with data $\textbf{X}\in\mathbb{R}^{t\times n}$, we can split them into $p$ non-overlapping subsets $\textbf{X}_{s}\in\mathbb{R}^{\frac{t}{p}\times n}$. In the first step, for each subset, we compute in parallel the moment tensor $\mathcal{M}_{d}(\mathbf{X}_{s})$ using Algorithm 2. In the second step, we perform the following reduction

$$\mathcal{M}_{d}(\mathbf{X})=\frac{1}{p}\sum_{s=1}^{p}\mathcal{M}_{d}(\mathbf{X}_{s}).\tag{18}$$

The elements of the tensor under the sum on the RHS are

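$$m_{\mathbf{i}}(\mathbf{X}_{s})=\frac{p}{t}\sum_{l=(s-1)\frac{t}{p}+1}^{s\frac{t}{p}}\prod_{k=1}^{|\mathbf{i}|}x_{l,i_{k}}.\tag{19}$$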

The element of the moment tensor of $\mathbf{X}$ is

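$$m_{\mathbf{i}}(\mathbf{X})=\frac{1}{t}\sum_{s=1}^{p}\sum_{l=(s-1)\frac{t}{p}+1}^{s\frac{t}{p}}\prod_{k=1}^{|\mathbf{i}|}x_{l,i_{k}}=\frac{1}{p}\sum_{s=1}^{p}m_{\mathbf{i}}(\mathbf{X}_{s}).\tag{20}$$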

These steps are summarised in Algorithm 3.
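The split-and-reduce scheme can be checked on a single tensor element. The sketch below runs the per-subset step sequentially; in a real setting that step could be mapped over worker processes. The names `split_rows` and `reduced_moment_element` are hypothetical, and $p|t$ is assumed as in the text.

```python
from math import prod

def moment_element(X, idx):
    """Sample moment element for a 0-based multi-index idx."""
    return sum(prod(row[i] for i in idx) for row in X) / len(X)

def split_rows(X, p):
    """Split the t realisations into p non-overlapping subsets of t/p rows."""
    t = len(X)
    assert t % p == 0, "this sketch assumes that p divides t"
    size = t // p
    return [X[s * size:(s + 1) * size] for s in range(p)]

def reduced_moment_element(X, idx, p):
    """Average of the per-subset moments, i.e. the reduction step."""
    partial = [moment_element(chunk, idx) for chunk in split_rows(X, p)]
    return sum(partial) / p

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
whole = moment_element(X, (0, 1, 1))
reduced = reduced_moment_element(X, (0, 1, 1), p=2)
```

The equality holds because every subset has the same number of rows, so the average of subset means equals the overall mean.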

3 Calculation of cumulant tensors

At this point, we can present our main result, i.e. an algorithm for calculating cumulants of arbitrary order of multi-dimensional data.

3.1 Index partitions and permutations

In this section, we present a recursive formula that can be used to calculate the $d$ th cumulant of $\mathbf{X}$ . We begin with some definitions, mainly concerning combinatorics, before discussing the general formula.

Definition 3.1.

Let $\mathbf{k}=(k_{1},\ldots,k_{d}):k_{i}=i$ , and $\sigma\in 1:d$ . Partition $P_{\sigma}(\mathbf{k})$ of tuple $\mathbf{k}$ is the division of $\mathbf{k}$ into $\sigma$ non-crossing sub-tuples: $P_{\sigma}(\mathbf{k})=(\mathbf{k}_{1},\ldots,\mathbf{k}_{\sigma})$ ,

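$$\bigcup_{r=1}^{\sigma}\mathbf{k}_{r}=\mathbf{k}\ \wedge\ \forall_{r\neq r^{\prime}}\ \mathbf{k}_{r}\cap\mathbf{k}_{r^{\prime}}=\emptyset.\tag{21}$$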

In what follows, we will denote the permutations of a tuple of tuples $(\mathbf{i}_{1},\ldots,\mathbf{i}_{\sigma})$ , as $\pi^{\prime}(\mathbf{i}_{1},\ldots,\mathbf{i}_{\sigma})$ .

Definition 3.2.

$[P_{\sigma}(\mathbf{k})]$ is the representative of the equivalence class of partitions. Let $P_{\sigma}(\mathbf{k})=(\mathbf{k}_{1},\ldots,\mathbf{k}_{\sigma})$ and $P^{\prime}_{\sigma}(\mathbf{k})=(\mathbf{k}^{\prime}_{1},\ldots,\mathbf{k}^{\prime}_{\sigma})$ be partitions of $\mathbf{k}$. Let us introduce the following equivalence relation:

$$P_{\sigma}(\mathbf{k})\sim P^{\prime}_{\sigma}(\mathbf{k})\Leftrightarrow\left(\exists_{\pi^{\prime}}\ \forall_{r\in 1:\sigma}\ \exists_{\pi_{r}}:(\mathbf{k}_{1},\ldots,\mathbf{k}_{\sigma})=\pi^{\prime}\left(\pi_{1}(\mathbf{k}^{\prime}_{1}),\ldots,\pi_{\sigma}(\mathbf{k}^{\prime}_{\sigma})\right)\right).\tag{22}$$

This relation defines the equivalence class. Henceforth we will take only one representative of each equivalence class and denote it as $[P_{\sigma}(\mathbf{k})]$. The representative will be such that all $\mathbf{k}_{r}$ are sorted in an increasing order. We will denote the set of all such equivalence classes as $\{[P_{\sigma}(\mathbf{k})]\}$.

Remark 3.3.

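The number of partitions of set $\mathbf{k}$ of size $d$ into $\sigma$ parts is given by the Stirling number of the second kind [27],

$$\#\{[P_{\sigma}(\mathbf{k})]\}=S(d,\sigma)=\frac{1}{\sigma!}\sum_{j=0}^{\sigma}(-1)^{\sigma-j}\binom{\sigma}{j}j^{d}.\tag{23}$$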
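The representatives of Def. 3.2 and the count of Remark 3.3 can be cross-checked with a short enumeration. The sketch below is a simple recursive enumerator written for illustration (it is not Knuth's algorithm 7.2.1.5H); the names `set_partitions` and `stirling2` are assumptions. Each partition is produced once, with every sub-tuple sorted in increasing order.

```python
from math import comb, factorial

def set_partitions(d, sigma):
    """Yield partitions of (1, ..., d) into exactly sigma non-empty sub-tuples.

    Elements are placed in increasing order, so each sub-tuple is sorted and
    each equivalence class appears exactly once.
    """
    def extend(k, blocks):
        if k > d:
            if len(blocks) == sigma:
                yield tuple(tuple(b) for b in blocks)
            return
        for b in blocks:            # put k into an existing sub-tuple
            b.append(k)
            yield from extend(k + 1, blocks)
            b.pop()
        if len(blocks) < sigma:     # or open a new sub-tuple with k
            blocks.append([k])
            yield from extend(k + 1, blocks)
            blocks.pop()
    yield from extend(1, [])

def stirling2(d, sigma):
    """Stirling number of the second kind from the alternating-sum formula."""
    total = sum((-1) ** (sigma - j) * comb(sigma, j) * j ** d
                for j in range(sigma + 1))
    return total // factorial(sigma)

parts_4_2 = list(set_partitions(4, 2))
count_matches = len(parts_4_2) == stirling2(4, 2)
```

For $d=4$ and $\sigma=2$ the enumeration contains three partitions of type $2+2$ and four of type $1+3$.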

Definition 3.4.

Consider tensors $\mathcal{C}_{d_{1}}\in\mathbb{R}^{[n,d_{1}]}$ , $\mathcal{C}_{d_{2}}\in\mathbb{R}^{[n,d_{2}]}$ indexed by $\mathbf{i}$ and $\mathbf{i^{\prime}}$ respectively. Their outer product $\mathcal{C}_{d_{1}}\otimes\mathcal{C}_{d_{2}}=\mathcal{A}_{d_{1}+d_{2}}\in\mathbb{R}^{\overbrace{n\times\ldots\times n}^{d_{1}+d_{2}}}$ is defined as

$$a_{(\mathbf{i},\mathbf{i}^{\prime})}=c_{\mathbf{i}}c_{\mathbf{i}^{\prime}},\tag{24}$$

where $(\mathbf{i},\mathbf{i^{\prime}})$ denotes multi-index $(i_{1},\ldots,i_{d_{1}},i^{\prime}_{1},\ldots,i^{\prime}_{d_{2}})$.

As an example consider the outer product of symmetric matrix $\mathcal{C}_{2}$ with itself, $\mathcal{A}_{4}=\mathcal{C}_{2}\otimes\mathcal{C}_{2}$, which is only partially symmetric: $a_{i_{1},i_{2},i_{3},i_{4}}=a_{i_{2},i_{1},i_{3},i_{4}}=a_{i_{1},i_{2},i_{4},i_{3}}=a_{i_{3},i_{4},i_{1},i_{2}}$, but in general $a_{i_{1},i_{2},i_{3},i_{4}}\neq a_{i_{1},i_{3},i_{2},i_{4}}$ and $a_{i_{1},i_{2},i_{3},i_{4}}\neq a_{i_{1},i_{4},i_{3},i_{2}}$. To obtain a super-symmetric outcome of the outer product of super-symmetric tensors, we need to apply the following symmetrisation procedure.

Definition 3.5.

The sum of outer products of super-symmetric tensors. Let $\mathcal{A}_{d}\in\mathbb{R}^{[n,d]}$ be a tensor indexed by $\mathbf{i}=(i_{1},\ldots,i_{d})$ . Let $\mathbf{k}_{r}$ be a sub-tuple of its modes according to Def. 3.2, and let $\mathbf{i}_{\mathbf{k}_{r}}=\left(i_{(\mathbf{k}_{r})_{1}},\ldots,i_{(\mathbf{k}_{r})_{|\mathbf{k}_{r}|}}\right)$ . For the given $\sigma$ , we define the sum of outer products of $\mathcal{C}_{d_{r}}\in\mathbb{R}^{[n,d_{r}]}$ where $r\in 1:\sigma$ and $\sum_{r=1}^{\sigma}d_{r}=d$ , using the elementwise notation, as

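$$a_{\mathbf{i}}=\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\prod_{\mathbf{k}_{r}\in\zeta}c_{\mathbf{i}_{\mathbf{k}_{r}}}.\tag{25}$$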

We will use the following abbreviation using tensor notation

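$$\mathcal{A}_{d}=\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|}.\tag{26}$$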

Consider $\mathcal{A}_{d}$ as in Eq. (26) where $\mathcal{C}_{d_{r}}\in\mathbb{R}^{[n,d_{r}]}$ are super-symmetric and $\mathcal{C}_{d_{r}}=\mathcal{C}_{d_{r^{\prime}}}$ iff $d_{r}=d_{r^{\prime}}$ and $\mathbf{i}$ is a multi-index of $\mathcal{A}_{d}$. The sum over all representatives of equivalence classes $\{[P_{\sigma}(1:d)]\}$ fully symmetrises the outer product, and therefore $\mathcal{A}_{d}$ is super-symmetric. In other words, due to the super-symmetry, any permutation of multi-index $\mathbf{i}$ of $\mathcal{A}_{d}$ that leads only to a permutation of indices inside some $\mathcal{C}_{d_{r}}$ refers to the same value of $\mathcal{A}_{d}$. Any permutation of $\mathbf{i}$ that leads only to the switch between $\mathcal{C}_{d_{r}}$ and $\mathcal{C}_{d_{r^{\prime}}}$ inside an outer product in Eq. (26) also refers to the same value of $\mathcal{A}_{d}$. Any other permutation of $\mathbf{i}$ that cannot be represented as above switches between equivalence classes, and so it only reorders the terms of the sum in Eq. (26) and again refers to the same value of $\mathcal{A}_{d}$.

Example 3.6.

Consider $\mathcal{C}_{1}\in\mathbb{R}^{[n,1]},\mathcal{C}_{2}\in\mathbb{R}^{[n,2]},\mathcal{C}_{3}\in\mathbb{R}^{[n,3]}$ , and $\mathcal{A}_{4}\in\mathbb{R}^{[n,4]}$ such that

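$$\mathcal{A}_{4}=\sum_{\zeta\in\{[P_{2}(1:4)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|},\tag{27}$$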

then

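$$a_{i_{1},i_{2},i_{3},i_{4}}=c_{i_{1},i_{2}}c_{i_{3},i_{4}}+c_{i_{1},i_{3}}c_{i_{2},i_{4}}+c_{i_{1},i_{4}}c_{i_{2},i_{3}}+c_{i_{1}}c_{i_{2},i_{3},i_{4}}+c_{i_{2}}c_{i_{1},i_{3},i_{4}}+c_{i_{3}}c_{i_{1},i_{2},i_{4}}+c_{i_{4}}c_{i_{1},i_{2},i_{3}},\tag{28}$$

Such $\mathcal{A}_{4}$ is super-symmetric, since there is no permutation of $(i_{1},i_{2},i_{3},i_{4})$ that changes its elements, i.e. $a_{i_{1},i_{2},i_{3},i_{4}}=a_{i_{2},i_{1},i_{3},i_{4}}=a_{i_{3},i_{2},i_{1},i_{4}}=a_{i_{3},i_{4},i_{1},i_{2}}=\ldots$.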
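The symmetrisation in Example 3.6 can be verified numerically. The sketch below evaluates the element-wise sum over partitions for $\sigma=2$, $d=4$ and checks that the result does not change under any permutation of the multi-index. As stand-ins for super-symmetric input tensors it uses sample moment tensors of a toy data matrix; that choice, the 0-based indices and all function names are assumptions made for illustration.

```python
from itertools import permutations
from math import prod

def set_partitions(d, sigma):
    """Partitions of (1, ..., d) into sigma sorted sub-tuples, each listed once."""
    def extend(k, blocks):
        if k > d:
            if len(blocks) == sigma:
                yield tuple(tuple(b) for b in blocks)
            return
        for b in blocks:
            b.append(k)
            yield from extend(k + 1, blocks)
            b.pop()
        if len(blocks) < sigma:
            blocks.append([k])
            yield from extend(k + 1, blocks)
            blocks.pop()
    yield from extend(1, [])

def sym_outer_sum_element(tensor, idx, sigma):
    """Sum over partitions zeta of the product of tensor elements c_{i_{k_r}}.

    tensor(sub) must return the element of the super-symmetric tensor whose
    order equals len(sub).
    """
    total = 0.0
    for zeta in set_partitions(len(idx), sigma):
        total += prod(tensor(tuple(idx[m - 1] for m in k)) for k in zeta)
    return total

# Stand-in super-symmetric tensors: sample moments of a small data matrix.
X = [[1.0, 2.0, 0.5], [3.0, 0.0, 1.5], [5.0, 4.0, 2.0], [2.0, 2.0, 1.0]]

def moment(sub):
    return sum(prod(row[i] for i in sub) for row in X) / len(X)

idx = (0, 1, 1, 2)
reference = sym_outer_sum_element(moment, idx, sigma=2)
is_super_symmetric = all(
    abs(sym_outer_sum_element(moment, p, sigma=2) - reference) < 1e-9
    for p in permutations(idx)
)
```

The comparison uses a small tolerance because permuting the multi-index changes the order of floating-point additions.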

3.2 Cumulant calculation formula

The following recursive relation can be used to relate moments and cumulants of $\mathbf{X}$ :

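$$\mathbb{R}^{[n,d]}\ni\mathcal{M}_{d}(\mathbf{X})=\sum_{\sigma=1}^{d}\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\bigotimes_{\mathbf{k}_{r}\in\zeta}\mathcal{C}_{|\mathbf{k}_{r}|}(\mathbf{X}).\tag{29}$$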

This can be written in an elementwise manner as in [4]

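$$m_{\mathbf{i}}(\mathbf{X})=\sum_{\sigma=1}^{d}\sum_{\zeta\in\{[P_{\sigma}(1:d)]\}}\prod_{\mathbf{k}_{r}\in\zeta}c_{\mathbf{i}_{\mathbf{k}_{r}}}(\mathbf{X}).\tag{30}$$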

For the sake of completeness, we present an alternative proof of Eq. (30) in Appendix B.
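For $d=3$ the element-wise moment-cumulant relation has only five terms (one partition for $\sigma=1$, three for $\sigma=2$, one for $\sigma=3$), so it can be written out and tested directly. The sketch below assumes the sample estimators used in this paper (mean, and second and third central moments as the first three cumulants); the function names are hypothetical.

```python
from math import prod

def E(X, idx):
    """Sample expectation of the product of the columns listed in idx."""
    return sum(prod(row[i] for i in idx) for row in X) / len(X)

def center(X):
    t, n = len(X), len(X[0])
    means = [sum(row[i] for row in X) / t for i in range(n)]
    return [[row[i] - means[i] for i in range(n)] for row in X]

def moment3_from_cumulants(X, i1, i2, i3):
    """Third (non-central) moment element rebuilt from cumulants of order 1..3."""
    Xc = center(X)
    c1 = lambda i: E(X, (i,))            # first cumulant: mean
    c2 = lambda i, j: E(Xc, (i, j))      # second cumulant: covariance
    c3 = E(Xc, (i1, i2, i3))             # third cumulant: third central moment
    return (c3                                              # sigma = 1
            + c1(i1) * c2(i2, i3) + c1(i2) * c2(i1, i3)     # sigma = 2
            + c1(i3) * c2(i1, i2)
            + c1(i1) * c1(i2) * c1(i3))                     # sigma = 3

X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [2.0, 2.0]]
rebuilt = moment3_from_cumulants(X, 0, 1, 1)
direct = E(X, (0, 1, 1))
```

The two values agree up to rounding because the identity holds exactly for the sample estimators, not only in expectation.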

In order to compute $\mathcal{C}_{d}(\mathbf{X})$ , let us consider the case where $\sigma=1$ separately. By definition, $[P_{\sigma=1}(1:d)]=(1,\ldots,d)$ , so:

[TABLE]

The $d$ th cumulant tensor can be calculated given the $d$ th moment tensor and cumulant tensors of the order of $r\in 1:(d-1)$

[TABLE]

To simplify Eq. (32), let us observe that cumulants of the order of two or higher for a non-centered variable and a centered variable are equal. The first order cumulant for a centered variable is zero. Hereafter, we introduce partitions into sub-tuples of size larger than one.

Definition 3.7.

Let $\mathbf{k}=(1,\ldots,d)$ , and $\sigma\in 1:d$ . The at least two element partition $P_{\sigma}^{(2)}(\mathbf{k})$ of tuple $\mathbf{k}$ is the division of $\mathbf{k}$ into $\sigma$ sub-tuples: $P_{\sigma}^{(2)}(\mathbf{k})=(\mathbf{k}_{1},\ldots,\mathbf{k}_{\sigma})$ , such that

[TABLE]

The definition of the representative of equivalence class $[P^{(2)}_{\sigma}(\mathbf{k})]$ and the set of such representatives $\{[P^{(2)}_{\sigma}(\mathbf{k})]\}$ are analogous to Def. 3.2. Consequently, we can derive the final formula

[TABLE]

Let us determine the $\sigma_{\max}$ limit. If $d$ is even, it can be divided into at most $\sigma_{\max}=\frac{d}{2}$ parts of size two; if $d$ is odd, it can be divided into at most $\sigma_{\max}=\frac{d-1}{2}$ parts: $\frac{d-1}{2}-1$ parts of size two and one part of size three. Hence we can conclude that $\sigma_{\max}=\lfloor\frac{d}{2}\rfloor$ .
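The bound $\sigma_{\max}=\lfloor\frac{d}{2}\rfloor$ can be confirmed by enumerating the at-least-two-element partitions of Def. 3.7 for small $d$. The sketch below filters a simple recursive enumerator, which is an illustrative substitute and not necessarily how an efficient implementation would generate them; `partitions_min2` is a hypothetical name.

```python
def partitions_min2(d, sigma):
    """Partitions of (1, ..., d) into sigma sorted sub-tuples of size >= 2."""
    def extend(k, blocks):
        if k > d:
            if len(blocks) == sigma and all(len(b) >= 2 for b in blocks):
                yield tuple(tuple(b) for b in blocks)
            return
        for b in blocks:            # put k into an existing sub-tuple
            b.append(k)
            yield from extend(k + 1, blocks)
            b.pop()
        if len(blocks) < sigma:     # or open a new sub-tuple with k
            blocks.append([k])
            yield from extend(k + 1, blocks)
            blocks.pop()
    yield from extend(1, [])

def sigma_max(d):
    """Largest sigma for which at least one such partition exists."""
    return max(s for s in range(1, d + 1) if any(True for _ in partitions_min2(d, s)))

counts_d6 = {s: len(list(partitions_min2(6, s))) for s in (2, 3)}
bound_holds = all(sigma_max(d) == d // 2 for d in range(2, 8))
```

For $d=6$ this gives 25 partitions into two parts (15 of type $2+4$ and 10 of type $3+3$) and 15 partitions into three pairs.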

As a simple example, consider the cumulants of the order of three and four. Since $\forall_{\sigma\geq 2}\ \{[P^{(2)}_{\sigma}(1:3)]\}=\emptyset\ \wedge\ \{[P^{(2)}_{\sigma}(1:2)]\}=\emptyset$ , we have $\mathcal{C}_{2}(\mathbf{X})=\mathcal{M}_{2}(\mathbf{\tilde{X}})$ and $\mathcal{C}_{3}(\mathbf{X})=\mathcal{M}_{3}(\mathbf{\tilde{X}})$ , i.e. the second cumulant matrix and the third cumulant tensor are simply the second and the third central moments. Formulas for cumulant tensors of the order greater than three are more complicated. For example, consider the $4$ th cumulant tensor

[TABLE]

Using the elementwise notation, where $\mathbf{i}=(i_{1},i_{2},i_{3},i_{4})$ , we have

[TABLE]
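The elementwise fourth-order case can be sketched in a few lines of Python. The sketch assumes the standard form of the relation, in which the fourth central moment is reduced by the three products of second cumulants that correspond to the three pairings of $(i_{1},i_{2},i_{3},i_{4})$. The helper names (`center`, `moment`, `cumulant4`) are hypothetical, indices are zero-based, and the data are stored as a list of rows.

```python
def center(X):
    """Subtract the column means from a t x n data set given as a list of rows."""
    t, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / t for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in X]


def moment(Xc, idx):
    """Moment element: the mean over realisations of the product of columns idx."""
    total = 0.0
    for row in Xc:
        prod = 1.0
        for i in idx:
            prod *= row[i]
        total += prod
    return total / len(Xc)


def cumulant4(X, i1, i2, i3, i4):
    """Fourth cumulant element from central moments (assumed standard form)."""
    Xc = center(X)

    def c2(a, b):
        # The second cumulant equals the second central moment.
        return moment(Xc, (a, b))

    return (moment(Xc, (i1, i2, i3, i4))
            - c2(i1, i2) * c2(i3, i4)
            - c2(i1, i3) * c2(i2, i4)
            - c2(i1, i4) * c2(i2, i3))


if __name__ == "__main__":
    # A symmetric two-point variable has fourth cumulant 1 - 3 = -2.
    print(cumulant4([[-1.0], [1.0]], 0, 0, 0, 0))
```

Permuting the four indices leaves the result unchanged up to rounding, which is the super-symmetry that the block structure exploits.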

3.3 Algorithms to compute cumulant tensors

Let us suppose that $(\mathcal{B}_{\mathbf{i}})_{\mathbf{j}}$ is the $\mathbf{i}$ th element of the $\mathbf{j}$ th block of the super-symmetric tensor of the order of $|\mathbf{i}|=|\mathbf{j}|=d$ . Similarly, $(\mathcal{C}_{\mathbf{i}_{\mathbf{k}}})_{\mathbf{j}_{\mathbf{k}}}$ is the $\mathbf{i}_{\mathbf{k}}$ th element of the $\mathbf{j}_{\mathbf{k}}$ th block of the $|\mathbf{i}_{\mathbf{k}}|=|\mathbf{j}_{\mathbf{k}}|$ th cumulant tensor according to Def. 3.5; for brevity, we now omit the index $r$ in $\mathbf{k}_{r}$ . With reference to Def. 3.5 and Def. 3.2, $\mathbf{k}$ is always sorted, and from the properties of the block structure $\mathbf{j}$ is also sorted; hence $\mathbf{j}_{\mathbf{k}}$ is sorted as well. To determine $\{[P^{(2)}_{\sigma}(1:d)]\}$ we use a modified version of Knuth’s algorithm $7.2.1.5H$ [31]. Now we have all the components to introduce Algorithm 4, which computes a super-symmetric sum of outer products of lower order cumulants. Algorithm 4 computes the inner sum of Eq. (34) and takes advantage of the super-symmetry of tensors by using the block structure.

Finally, Algorithm 5 computes the $d$ th cumulant tensor. It uses Eq. (34) to calculate the cumulants and importantly takes advantage of the super-symmetry of tensors, because it refers to Algorithm 2 (moment tensor calculation) and Algorithm 4 that both use the block structure.
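The recursion behind the cumulant calculation can be illustrated elementwise. The Python sketch below assumes the relation in its standard form: an element of the $d$ th cumulant equals the corresponding central moment minus the sum, over all partitions of the index positions into at least two sub-tuples of size at least two, of products of lower order cumulant elements. It deliberately ignores the block structure and super-symmetry, so it is not Algorithm 4 or Algorithm 5, and all function names are hypothetical.

```python
from itertools import combinations


def partitions_no_singletons(positions):
    """Yield all partitions of `positions` into blocks of size >= 2."""
    positions = tuple(positions)
    if not positions:
        yield ()
        return
    first, rest = positions[0], positions[1:]
    for size in range(1, len(rest) + 1):
        for companions in combinations(rest, size):
            remaining = tuple(p for p in rest if p not in companions)
            for tail in partitions_no_singletons(remaining):
                yield ((first,) + companions,) + tail


def moment(Xc, idx):
    """Moment element of (already centred) data for the multi-index idx."""
    total = 0.0
    for row in Xc:
        prod = 1.0
        for i in idx:
            prod *= row[i]
        total += prod
    return total / len(Xc)


def cumulant_element(Xc, idx):
    """Cumulant element at multi-index idx, for centred data Xc (list of rows)."""
    value = moment(Xc, idx)
    for partition in partitions_no_singletons(range(len(idx))):
        if len(partition) < 2:
            continue  # the single-block partition is the moment term itself
        prod = 1.0
        for block in partition:
            prod *= cumulant_element(Xc, tuple(idx[p] for p in block))
        value -= prod
    return value


if __name__ == "__main__":
    Xc = [[-1.0], [1.0]]  # already centred symmetric two-point data
    print(cumulant_element(Xc, (0, 0, 0, 0)))
    print(cumulant_element(Xc, (0,) * 6))
```

For the symmetric two-point data the sketch returns $-2$ and $16$ for the orders four and six, in agreement with the known cumulants of that distribution. Its cost per element grows quickly with $d$, which is why a practical implementation computes whole blocks and reuses lower order tensors.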

4 Implementation

All algorithms presented in this paper are implemented in the Julia programming language [8, 7]. Julia is a high level language in which multi-dimensional arrays are first class types [6]. For purposes of the algorithms, two modules were created. In the first one, SymmetricTensors.jl [21], the block structure of super-symmetric tensors was implemented. In the second module, Cumulants.jl [20], we used the block structure to compute and store moment and cumulant tensors. The implementation of cumulants calculation uses multiprocessing primitives built into the Julia programming language: remote references and remote calls. A remote reference is an object that allows any process to reference an object stored in a specific process. A remote call allows a process to request a function call on certain arguments on another process.
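The split-and-combine pattern behind such a multiprocess computation can be sketched as follows. This is only an analogy written in Python with the standard `concurrent.futures` module, not the Julia implementation: a submitted task plays the role of a remote call and the returned `Future` plays the role of a remote reference. The function names and the choice to split the data by realisations are assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows, idx):
    """Sum over the given realisations of the product of the columns in idx."""
    total = 0.0
    for row in rows:
        prod = 1.0
        for i in idx:
            prod *= row[i]
        total += prod
    return total


def moment_parallel(X, idx, workers=2):
    """Moment element computed from partial sums over chunks of realisations."""
    t = len(X)
    chunk = -(-t // workers)  # ceiling division
    chunks = [X[s:s + chunk] for s in range(0, t, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() returns a Future, a handle to a result computed elsewhere.
        futures = [pool.submit(partial_sum, c, idx) for c in chunks]
        return sum(f.result() for f in futures) / t


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(moment_parallel(X, (0, 1), workers=2))
```

In CPython, threads do not accelerate this pure-Python arithmetic; the sketch is meant to show the structure of the computation only.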

5 Performance analysis

This section is dedicated to the performance analysis of the core elements of our algorithms. These are Eq. (6) which calculates the moment tensor and Eq. (34) which calculates the cumulant tensor. First, we discuss theoretical analysis and then focus on the performance of our implementation. In the final subsection, we show how our implementation compares to the current state of the art.

5.1 Theoretical analysis

We start by discussing the performance of the moment tensor. With reference to Section 2, let us recall that storage of the moment tensor requires storage of $b^{d}\binom{\bar{n}+d-1}{d}$ floating-point numbers. We can approximate $b^{d}\binom{\bar{n}+d-1}{d}\approx\frac{n^{d}}{d!}$ for $d\ll n$ [49]. Since we usually calculate cumulants of the order of $\leq 10$ and we deal with high dimensional data, we need approximately $\frac{1}{d!}$ of the computer storage space, compared with the naïve storage scheme.
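The storage estimate can be checked numerically. The Python sketch below counts the unique elements of a super-symmetric tensor as the number of sorted multi-indices $i_{1}\leq\ldots\leq i_{d}$, and counts the stored blocks in the same way over block indices; it assumes that $b$ divides $n$ and that $\bar{n}=\frac{n}{b}$. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb, factorial


def unique_elements(n, d):
    """Number of sorted multi-indices i_1 <= ... <= i_d with entries in 1:n."""
    return comb(n + d - 1, d)


def block_storage(n, d, b):
    """Numbers stored when blocks of size b^d are kept for sorted block indices."""
    if n % b:
        raise ValueError("b must divide n")
    nbar = n // b
    return b ** d * comb(nbar + d - 1, d)


if __name__ == "__main__":
    # Brute-force check of the counting argument on a small case.
    assert unique_elements(4, 3) == len(
        list(combinations_with_replacement(range(4), 3)))
    n, d, b = 100, 4, 2
    ratio = n ** d / block_storage(n, d, b)
    print(ratio, factorial(d))  # the ratio approaches d! as n grows
```

For $n=100$, $d=4$ and $b=2$ the ratio of naïve to block storage is already above $21$, close to the limiting value $4!=24$.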

As for the cumulants, one should primarily note that the number of elements of the inner sum in Eq. (34) in the second line equals the number of set partitions of $\mathbf{k}=(1,\ldots,d)$ into exactly $\sigma$ parts, such that no part is of size one, and can be represented as:

[TABLE]

We call it a modification of the Stirling number of the second kind $S(d,\sigma)$ , and compute it as follows

[TABLE]

where we count the number of ways to divide $d$ elements into subsets of length $d_{1}\geq 2,\ldots,d_{r}\geq 2,\ldots,d_{\sigma}\geq 2$ such that $\sum_{r=1}^{\sigma}d_{r}=d$ , and so $d_{\sigma}=d-\sum_{r=1}^{\sigma-1}d_{r}$ . Factor $\sigma!$ in the denominator counts the number of subset permutations. Some examples of $S^{\prime}(d,\sigma)$ are $S^{\prime}(4,2)=3$ , $S^{\prime}(5,2)=10$ , $S^{\prime}(6,2)=25$ and $S^{\prime}(6,3)=15$ .
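The counting argument can be turned directly into a short Python sketch: sum the multinomial coefficients $\frac{d!}{d_{1}!\cdots d_{\sigma}!}$ over all ordered choices of subset sizes $d_{r}\geq 2$ that add up to $d$, and divide by $\sigma!$ to discard the ordering of subsets. This follows the verbal description above; the function names are hypothetical.

```python
from math import factorial


def compositions_min2(d, sigma):
    """Yield ordered tuples (d_1, ..., d_sigma) with every d_r >= 2 and sum d."""
    if sigma == 0:
        if d == 0:
            yield ()
        return
    # Leave at least 2 elements for each of the remaining sigma - 1 subsets.
    for first in range(2, d - 2 * (sigma - 1) + 1):
        for tail in compositions_min2(d - first, sigma - 1):
            yield (first,) + tail


def s_prime(d, sigma):
    """Number of partitions of d elements into sigma subsets of size >= 2."""
    total = 0
    for sizes in compositions_min2(d, sigma):
        ways = factorial(d)
        for size in sizes:
            ways //= factorial(size)  # multinomial coefficient, exact division
        total += ways
    return total // factorial(sigma)  # subsets are unordered


if __name__ == "__main__":
    print(s_prime(4, 2), s_prime(5, 2), s_prime(6, 2), s_prime(6, 3))
```

The sketch reproduces the four example values quoted above.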

The following sum

[TABLE]

is the number of all partitions of a set of size $d$ into subsets, such that there is no subset of size one, and therefore

[TABLE]

Here $B(d)$ is a Bell number [14], the number of all partitions of a set of size $d$ including subsets of size one and $B(d)-F(d)$ is the number of partitions of a set of size $d$ into subsets such that at least one subset is of size one. Relation Eq. (40) is derived from the fact that there is a bijective relation between partitions of $d$ element set into subsets such that at least one subset is of size one and partitions of $d+1$ element set into subsets such that there is no subset of size one.
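The bijection argument suggests the relation $F(d+1)=B(d)-F(d)$; assuming this is the content of Eq. (40), it can be verified numerically. The Python sketch below computes $F(d)$ independently, by choosing the subset that contains one fixed element, and computes $B(d)$ with the Bell triangle. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def no_singleton_partitions(d):
    """F(d): number of partitions of a d element set with no subset of size one."""
    if d == 0:
        return 1
    # A fixed element shares its subset with k >= 1 of the other d - 1 elements.
    return sum(comb(d - 1, k) * no_singleton_partitions(d - 1 - k)
               for k in range(1, d))


def bell(d):
    """B(d): number of all partitions of a d element set (Bell triangle)."""
    row = [1]
    for _ in range(d):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]


if __name__ == "__main__":
    for d in range(1, 9):
        lhs = no_singleton_partitions(d + 1)
        rhs = bell(d) - no_singleton_partitions(d)
        print(d, lhs, rhs)
```

For example, $F(6)=41$, which agrees with $B(5)-F(5)=52-11$ and with the sum $1+25+15$ of the partition counts for $\sigma=1,2,3$.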

To compute each element of the inner sum in Eq. (34), we need $\sigma-1$ multiplications, and consequently to compute each element of the outer sum in Eq. (34), we need $(\sigma-1)S^{\prime}(d,\sigma)$ multiplications. Finally, the number of multiplications required to compute the second term of the RHS of Eq. (34) is

[TABLE]

for each tensor element. Let us note that $N(4)=3$ , $N(5)=10$ , $N(6)=55$ . The plot of $N(d)$ and the upper bound $U(d)$ are shown in Fig. 1. As Fig. 1 shows, the proposed upper bound is a very good approximation of the number of multiplications.
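The quoted values of $N(d)$ can be reproduced with a few lines of Python, assuming $N(d)=\sum_{\sigma=2}^{\lfloor d/2\rfloor}(\sigma-1)S^{\prime}(d,\sigma)$ as described in the preceding paragraph. Here $S^{\prime}(d,\sigma)$ is evaluated with the standard recurrence for partitions without singletons, $S^{\prime}(d,\sigma)=\sigma S^{\prime}(d-1,\sigma)+(d-1)S^{\prime}(d-2,\sigma-1)$, which is not taken from the paper; the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def s_prime(d, sigma):
    """Partitions of d elements into sigma subsets, each of size >= 2."""
    if d == 0 and sigma == 0:
        return 1
    if d <= 0 or sigma <= 0:
        return 0
    # The last element either joins one of sigma existing subsets, or forms
    # a new subset of size two with one of the other d - 1 elements.
    return sigma * s_prime(d - 1, sigma) + (d - 1) * s_prime(d - 2, sigma - 1)


def multiplications(d):
    """N(d): multiplications per tensor element for the sums over partitions."""
    return sum((sigma - 1) * s_prime(d, sigma)
               for sigma in range(2, d // 2 + 1))


if __name__ == "__main__":
    for d in range(4, 11):
        print(d, multiplications(d))
```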

From Eq. (6), Eq. (34) and Eq. (41) we can conclude that to compute each element of the $d$ th cumulant we need $(d-1)t$ multiplications for the central moment and $N(d)$ multiplications for the sums in Eq. (34). However, in order to estimate the cumulant accurately, we need large data sets, i.e. for $d=4$ we use $t\sim 10^{5}$ and for $d>4$ the data size must be even larger. Bearing in mind that computing cumulants of the order of $d>10$ is impractical, the foregoing gives $(d-1)t\gg N(d)$ . Hence the dominant computational cost is that of the moment tensor, and approximately $(d-1)t$ multiplications are needed to compute each element of the $d$ th cumulant. To compute the whole $d$ th cumulant tensor we need approximately $\frac{(d-1)tn^{d}}{d!}$ multiplications, where the factor $d!$ is a result of taking advantage of super-symmetry. The added cost due to blocking is negligible, see [49].

It is now possible to compare the complexity of our algorithm with that of the naïve algorithm for chosen cumulants. For the $4$ th cumulant, the naïve algorithm would use Eq. (14) directly and would not take advantage of the super-symmetry of tensors. Therefore, it requires $9t$ multiplications to compute a single cumulant tensor element and $9tn^{4}$ multiplications to compute the whole cumulant tensor. Our algorithm, in this case, decreases the computational complexity by the factor of $3\cdot 4!=72$ .
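The factor of $72$ follows from dividing the two operation counts, as the short Python sketch below shows. It uses the approximate counts stated in the text, $9tn^{4}$ for the naïve algorithm and $\frac{(d-1)tn^{d}}{d!}$ for the proposed one; the function names and the sample values of $t$ and $n$ are hypothetical.

```python
from fractions import Fraction
from math import factorial


def proposed_cost(d, t, n):
    """Approximate multiplications of the proposed algorithm, (d-1) t n^d / d!."""
    return Fraction((d - 1) * t * n ** d, factorial(d))


def naive_cost_order4(t, n):
    """Approximate multiplications of the naive fourth cumulant, 9 t n^4."""
    return 9 * t * n ** 4


if __name__ == "__main__":
    t, n = 10 ** 5, 50
    # The ratio does not depend on t or n.
    print(naive_cost_order4(t, n) / proposed_cost(4, t, n))
```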

Analogously, the naïve formula for the $5$ th cumulant

[TABLE]

requires approximately $34tn^{5}$ multiplications to compute the whole cumulant tensor. Our algorithm, in this case, decreases the computational complexity by the factor of $\frac{34}{4}5!=1020$ . For higher $d$ , the difference is even greater due to the $d!$ factor caused by the application of the block structure and the fact that the number of terms in naïve formulas grows with $d$ much faster than $F(d)$ from Eq. (40).

5.2 Implementation performance

In this section, we analyse the performance of our implementation. All computations were performed in the Prometheus computing cluster. This cluster provides shared user access with multiple user tasks running on each node. Each node is an HP XL730f Gen9 computing system with dual Intel Xeon E5-2680v3 processors providing 12 physical cores and 24 computing cores with hyper-threading. The node has 128 GB of memory.

5.2.1 The optimal size of blocks

The number of coefficients required to store a super-symmetric tensor of order $d$ and $n$ dimensions is equal to $\binom{d+n-1}{d}$ . The storage of the tensor disregarding the super-symmetry requires $n^{d}$ coefficients. The block structure introduced in [49] uses more than the minimal amount of memory but allows for easier further processing of super-symmetric tensors.

If we store the super-symmetric tensor in the block structure, the block size parameter $b$ appears. In our implementation, in order to store a super-symmetric tensor in the block structure we need, assuming $b|n$ , an array of $(\frac{n}{b})^{d}$ pointers to blocks and an array of the same size of flags that indicate whether a pointer points to a valid block. Recall that diagonal blocks contain redundant information. Therefore, on the one hand, the smaller the value of $b$ , the fewer redundant elements on the diagonals of the block structure. On the other hand, the larger the value of $b$ , the smaller the number of blocks, the smaller the blocks’ operation overhead, and the smaller the number of pointers pointing to empty blocks. For a detailed discussion of memory usage see [49]. The influence of the parameter $b$ on the computational time of cumulants for some parameters is presented in Fig. 2. We obtain the shortest computation time for $b=2$ in almost all test cases, and this value will be set as default and used in all efficiency tests. Note that for $b=1$ we lose all the memory savings.
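The trade-off in the choice of $b$ can be made concrete with a small Python sketch. It assumes that $b$ divides $n$, that one pointer and one flag are kept for each of the $(\frac{n}{b})^{d}$ block positions, and that a full block of $b^{d}$ numbers is stored only for sorted block indices. The function name `block_layout` and the returned field names are hypothetical.

```python
from math import comb


def block_layout(n, d, b):
    """Pointer count, valid block count and stored numbers for block size b."""
    if n % b:
        raise ValueError("b must divide n")
    nbar = n // b
    blocks = comb(nbar + d - 1, d)  # sorted block multi-indices
    return {"pointers": nbar ** d, "blocks": blocks, "stored": b ** d * blocks}


if __name__ == "__main__":
    n, d = 12, 3
    for b in (1, 2, 3, 4, 6, 12):
        print(b, block_layout(n, d, b))
```

For $n=12$ and $d=3$, the choice $b=1$ stores only $364$ numbers but needs $12^{3}=1728$ pointers, whereas $b=12$ needs a single pointer but stores all $1728$ numbers; intermediate values such as $b=2$ balance the two costs.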

5.2.2 Comparison with naïve algorithms

The computational speedup of cumulant calculation for the illustrative data is presented in Fig. 3. The computational speedup is even higher than the theoretical value of $72$ , which is probably due to large operational memory requirements and some computational overhead while splitting data into terms of Eq. (36) used by the naïve approach.

As for the moment calculation, let us recall from Section 2.1 that we expect a speedup on the level of $d!$ . As shown in Fig. 4, this is the case for a high number of marginal variables $n$ , as we approach a speedup equal to $24$ for the fourth moment. This is because there is some redundancy in the computation of diagonal blocks, which decreases as $n$ rises for a given $b$ .

5.2.3 Multiprocessing performance

In this section we analyse the multiprocessing performance of moment tensor calculations, since according to Subsection 5.1, the moment tensor calculation takes the majority of the cumulant calculation time. Fig. 5 shows the speedup of multiprocess moment calculation compared to single process calculation. As shown in the figure, at first we obtain linear scaling of the speedup with the number of processes. Next, we reach the saturation point. This is expected, as there are some parts of this calculation that cannot be done in parallel. Adding more processes leads to a drop in the speedup. This is due to the fact that adding more processes results in more overall overhead, yet we do not benefit from splitting the data further.

5.3 Comparison with the state of the art

The state of the art in terms of the cumulant calculation simplification is referred to as umbral calculus [47], which is a formal system consisting of certain operations on objects called umbrae, mimicking addition and multiplication of independent real-valued random variables. Using umbrae notation one can determine symbolic formulas to calculate elements of cumulant tensors. See [17] where cumulants, also called $k$ -statistics, were derived using purely combinatorial operations. However, symbolic computations are less universal and sometimes problematic, while translating them into algorithms and code is not entirely straightforward.

We present a more general approach by implementing an algorithm that takes multivariate data in the form of a matrix and computes its cumulant tensors. The current state of the art is a package written in the R programming language [16]. This algorithm uses the recursion relation to compute the $d$ th cumulant from moments of the order of $1,\ldots,d$ [40, 39, 3], see Eq. (43).

[TABLE]

where $|\zeta|$ is the number of parts in the given partition $\zeta$ . The algorithm computes each element of cumulant tensors, without taking advantage of their super-symmetry. For comparison, our formula, i.e. Eq. (34), is simpler, as it lacks the factor $\left(|\zeta|-1\right)!(-1)^{|\zeta|-1}$ and the inner sum has fewer elements, since we have introduced $P^{(2)}_{\sigma}$ instead of $P_{\sigma}$ . Further application of Eq. (34) enables us to compute the $d$ th cumulant tensor without determining the $(d-1)$ th moment tensor. This fact can be advantageous in high-resolution direction-finding methods for multi-source signals (the q-MUSIC algorithm) [12] where one needs a cumulant of the order of $6$ but not a cumulant and a moment of the order of $5$ . Furthermore, the major benefit of our algorithm is the utilisation of the super-symmetry of cumulant tensors. By introducing blocks, the computational complexity can be reduced by a factor of $d!$ in the same manner as the storage requirement is reduced.
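For comparison, the classical relation with the factor $\left(|\zeta|-1\right)!(-1)^{|\zeta|-1}$ can be sketched elementwise in Python. The sketch assumes the well-known form in which a cumulant element is a signed sum, over all partitions $\zeta$ of the index positions, of products of raw moments; it is written for this illustration and is not the code of the R package [16]. All function names are hypothetical.

```python
from math import factorial


def set_partitions(elements):
    """Yield all partitions of a list, including blocks of size one."""
    elements = list(elements)
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn, or into a new block.
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part


def raw_moment(X, idx):
    """Raw (non-centred) moment element for the multi-index idx."""
    total = 0.0
    for row in X:
        prod = 1.0
        for i in idx:
            prod *= row[i]
        total += prod
    return total / len(X)


def cumulant_from_raw_moments(X, idx):
    """Cumulant element as a signed sum over all partitions of the positions."""
    value = 0.0
    for zeta in set_partitions(range(len(idx))):
        k = len(zeta)
        term = (-1) ** (k - 1) * factorial(k - 1)
        for block in zeta:
            term *= raw_moment(X, [idx[p] for p in block])
        value += term
    return value


if __name__ == "__main__":
    X = [[0.0], [2.0]]  # non-centred two-point data
    print(cumulant_from_raw_moments(X, (0, 0, 0, 0)))
```

For $d=4$ the sum runs over all $B(4)=15$ partitions and uses non-centred data, whereas the formula based on $P^{(2)}_{\sigma}$ needs only the central moment and three products.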

In [16] two algorithms were implemented: one, four_cumulants_direct, uses a direct formula for cumulants of orders $1$ to $4$ , which we call the specialized algorithm; the other, cumulants_upto_p, can compute cumulants of arbitrary order using Eq. (43), which we call the general algorithm. Both of these algorithms were implemented in the R programming language. The specialized algorithm outperforms the general one in terms of speed. For comparison’s sake, we re-implemented the general algorithm from [16] in Julia, maintaining high similarity between both implementations.

To perform the efficiency comparison, we compare the computational time of our algorithm with the aforementioned algorithms. The obtained results are summarised in Fig. 6 which contains:

• The comparison of our algorithm and the general algorithm [16] re-implemented in Julia, see Fig. 6(a).

• The comparison between our algorithm and the general algorithm implemented in [16], see Fig. 6(b). Our algorithm is faster by two orders of magnitude owing to the $d!$ acceleration factor, which results from the utilisation of super-symmetry through application of the block structure. It turns out that our implementation achieves in practice even higher acceleration.

• The comparison of our algorithm vs. the specialized algorithm implemented in [16], see Fig. 6(c).

6 Conclusions

This paper provides a discussion on both the method and the algorithm for calculation of arbitrary order moment and cumulant tensors given multidimensional data. To this end, we introduce the recurrence relation between the $d$ th cumulant tensor and the $d$ th central moment tensor as well as cumulant tensors of the order of $2,\ldots,d-2$ . For purposes of efficient computation and storage of super-symmetric tensors, we use blocks to store and calculate only the pyramidal part of cumulant and moment tensors. Our algorithm is significantly faster than the existing algorithms. The theoretical speedup is given by the factor of $d!$ , which makes the algorithm applicable in the analysis of large data sets. Another important aspect is that large data sets are required to approximate accurately high order statistics on account of their large approximation error. If the estimation error challenge is successfully tackled, high order multidimensional statistics such as high order moments or cumulants will be an important tool to analyse non-normally distributed data, where the mean vector and the covariance matrix contain little information about the data. There are many applications of such statistics, particularly involving signal analysis, financial data analysis, hyper-spectral data analysis or particle physics.

Appendix A The estimation error of high order statistics

Let $M_{d}$ be an estimator of the $d$ th moment of one-dimensional centered random variable $V$ , and let us have available $t$ realisations of $V$ . As we consider large $t$ , the bias of such an estimator can be neglected as being much smaller than a standard error. Hence, we can use the following estimator

[TABLE]

where we just sum $t$ independent random variables $V_{l}$ raised to the power of $d$ . The variance of $M_{d}$ can be represented as:

[TABLE]

Since $V_{1},\ldots,V_{t}$ are independent and equal in distribution to $V$ ,

[TABLE]

hence

[TABLE]

and obviously this limit is relevant if $M_{2d}$ exists. In the multivariate case $V_{1},\ldots,V_{t}$ are only independent in groups. The number of groups can be estimated using the number of marginal variables $n$ , but still $n\ll t$ . Consequently, a similar limitation can be expected, but replacing $M_{2d}$ with a product of moments of lower orders.
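The growth of the estimation error with the order $d$ can be illustrated numerically. The Python sketch below assumes the standard result for the sample mean of $t$ independent copies of $V^{d}$, namely that its variance equals $\frac{E(V^{2d})-E(V^{d})^{2}}{t}$, and evaluates it for a standard normal $V$, whose even moments are double factorials. The function names are hypothetical, and the normal distribution is chosen only as an example.

```python
from math import sqrt


def normal_moment(k):
    """k-th moment of a standard normal variable: (k-1)!! for even k, else 0."""
    if k % 2:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result


def standard_error(d, t):
    """Standard error of the d-th moment estimator from t independent samples."""
    variance = (normal_moment(2 * d) - normal_moment(d) ** 2) / t
    return sqrt(variance)


if __name__ == "__main__":
    t = 10 ** 5
    for d in (2, 4, 6):
        relative = standard_error(d, t) / normal_moment(d)
        print(d, standard_error(d, t), relative)
```

With $t=10^{5}$ the standard error of the fourth moment estimator is about $0.031$, roughly $1\%$ of the true value $3$, and the relative error roughly doubles for the sixth moment. This is why larger data sets are needed as the order grows, and why the estimate is only meaningful if the moment of order $2d$ exists.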

Appendix B The recurrence formula for cumulant calculations

We recall the cumulant generating function

[TABLE]

which is related to the moment generating (characteristic) function $\tilde{\phi}(\tau)$ by $K(\tau)=\log(\tilde{\phi}(\tau))$ . For simplicity, we use the following notation: $\partial_{i}=\frac{\partial}{\partial\tau_{i}}$ , $\partial_{\mathbf{i}}=\partial_{i_{1},\ldots,i_{d}}=\frac{\partial^{d}}{\partial\tau_{i_{1}}\cdots\partial\tau_{i_{d}}}$ , and drop $\mathbf{X}$ in notation $c(\mathbf{X})\rightarrow c$ . The elements of the moment and cumulant tensor at multi-index $\mathbf{i}$ are

[TABLE]

We have the following proposition.

Proposition B.1.

For each $\mathbf{i}$ the following holds:

[TABLE]

Proof B.2.

For $|\mathbf{i}|=1$ the results follow from direct inspection. Next, for $|\mathbf{i}|=2$ we get:

[TABLE]

Now assume that Eq. (50) holds for $|\mathbf{i}|=d$ . Differentiating its LHS, we have

[TABLE]

further using Eq. (50) we obtain $\partial_{i_{d+1}}\tilde{\phi}(\tau)=\tilde{\phi}(\tau)c_{i_{d+1}}(\tau)$ , therefore

[TABLE]

where $\mathbf{i}^{\prime}=(\mathbf{i},i_{d+1})$ . After differentiating Eq. (49), we have

[TABLE]

and analogously

[TABLE]

Differentiating the RHS of Eq. (50),

[TABLE]

comparing Eq. (56) with Eq. (53), we have

[TABLE]

Finally, we obtain

[TABLE]

If we observe that $\tilde{\phi}(\tau)\big{|}_{\tau=0}=1$ and $m_{\mathbf{i}}=\partial_{\mathbf{i}}\tilde{\phi}(\tau)\big{|}_{\tau=0}$ and $c_{\mathbf{i}}(\tau)\big{|}_{\tau=0}=c_{\mathbf{i}}$ , then Eq. (50) at $\tau=0$ will give Eq. (30).

Acknowledgements

The authors would like to thank Adam Glos for revising the manuscript and Zbigniew Puchała for the discussion about error estimation and set partitions. This research was supported in part by PL-Grid Infrastructure

Bibliography

[1] G. S. Amin and H. M. Kat, Hedge Fund Performance 1990-2000: Do the “Money Machines” Really Add Value?, Journal of financial and quantitative analysis, 38 (2003), pp. 251–274.
[2] J. C. Arismendi and H. Kimura, Monte Carlo Approximate Tensor Moment Simulations, Available at SSRN 2491639, (2014).
[3] N. Balakrishnan, N. L. Johnson, and S. Kotz, A note on relationships between moments, central moments and cumulants from multivariate distributions, Statistics & probability letters, 39 (1998), pp. 49–54.
[4] O. E. Barndorff-Nielsen and D. R. Cox, Asymptotic techniques for use in statistics, Chapman & Hall, 1989.
[5] H. Becker, L. Albera, P. Comon, M. Haardt, G. Birot, F. Wendling, M. Gavaret, C.-G. Bénar, and I. Merlet, EEG extended source localization: tensor-based vs. conventional methods, NeuroImage, 96 (2014), pp. 143–157.
[6] J. Bezanson, J. Chen, S. Karpinski, V. Shah, and A. Edelman, Array operators using multiple dispatch: A design methodology for array implementations in dynamic languages, in Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ACM, 2014, p. 56.
[7] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Review, 59 (2017), pp. 65–98.
[8] J. Bezanson, S. Karpinski, V. B. Shah, and A. Edelman, Julia: A fast dynamic language for technical computing, arXiv:1209.5145, (2012).


Submitted to the editors 07.03.2018.
