On the Bhattacharya-Mesner rank of third order hypermatrices

Edinah K. Gnang; Yuval Filmus

arXiv:1706.06090·math.CO·May 21, 2019


TL;DR

This paper introduces the Bhattacharya-Mesner rank for third order hypermatrices, providing bounds, conditions for inverses, and extending classical linear algebra theorems to hypermatrices.

Contribution

It defines a new hypermatrix rank, establishes bounds, and generalizes key linear algebra concepts like invertibility and the rank-nullity theorem.

Findings

1. Defined Bhattacharya-Mesner rank for hypermatrices
2. Derived bounds for tensor rank using this new rank
3. Extended matrix inverse and rank-nullity theorem to hypermatrices

Abstract

We introduce the Bhattacharya-Mesner rank of third order hypermatrices as a relaxation of the tensor rank and devise from it some bounds for the tensor rank. We use the Bhattacharya-Mesner rank to extend to third order hypermatrices the connection relating the rank to a notion of linear dependence. We also derive explicit necessary and sufficient conditions for the existence of third order hypermatrix inverse pairs. Finally, we use inverse pairs to extend to third order hypermatrices the formulation and proof of the matrix rank-nullity theorem.



Full text


Edinah K. Gnang (Department of Applied Mathematics and Statistics, Johns Hopkins University; email: [email protected]) and Yuval Filmus (Computer Science Department, Technion - Israel Institute of Technology)


1 Introduction

Hypermatrices are multidimensional analogs of matrices. A hypermatrix is a finite multiset whose elements (called entries) are indexed by the distinct elements of some fixed Cartesian product of the form

$$\{0,\dots,n_{0}-1\}\times\{0,\dots,n_{1}-1\}\times\cdots\times\{0,\dots,n_{m-1}-1\}.$$

Such a hypermatrix is of order $m$ and of size $n_{0}\times n_{1}\times\cdots\times n_{m-1}$. A hypermatrix is cubic, of side length $n$, if $n_{0}=n_{1}=\cdots=n_{m-1}=n$. In particular, matrices correspond to second order hypermatrices. Hypermatrix algebras arise from generalizations of classical matrix notions and algorithms [MB94, GKZ94, Ker08, GER11, MB90, GNANG2017238]. The distinction between hypermatrices and tensors closely mirrors the distinction between matrices and abstract linear transformations. Recall that an abstract linear transformation defined over a finite dimensional vector space is identified with a matrix orbit, not just a single matrix. The orbit corresponds to the different choices of bases for the domain and range of the linear transformation. For instance, let $\mathbf{M}\in\mathbb{K}^{m\times n}$ denote the matrix associated with some abstract linear transformation relative to the standard bases of $\mathbb{K}^{m\times 1}$ and $\mathbb{K}^{1\times n}$. Then the tensorial orbit associated with the corresponding abstract linear transformation is defined to be the set of matrices given by

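$$\left\{\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\,:\,\mathbf{A}\in\text{GL}_{m}(\mathbb{K}),\ \mathbf{B}\in\text{GL}_{n}(\mathbb{K})\right\}.$$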

A matrix property common to all the members of the same tensorial orbit is called a tensorial invariant. Classical matrix attributes well known to be tensorial invariants when $\mathbb{K}=\mathbb{C}$ are:

•

The rank defined to be

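$$\min_{\substack{\mathbf{A}\in\text{GL}_{m}(\mathbb{K})\\ \mathbf{B}\in\text{GL}_{n}(\mathbb{K})}}\left\|\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\right\|_{\ell_{0}},$$

where $\left\|\cdot\right\|_{\ell_{0}}$ counts the non-zero entries of a matrix.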

•

The nullity, defined to be the dimension of the nullspaces of $\mathbf{M}$ and well known to be given by

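$$\min(m,n)-\min_{\substack{\mathbf{A}\in\text{GL}_{m}(\mathbb{K})\\ \mathbf{B}\in\text{GL}_{n}(\mathbb{K})}}\left\|\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\right\|_{\ell_{0}}.$$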

•

Singular values, defined to be the moduli of the numbers in the multiset formed by the diagonal entries of any diagonal matrix in the tensorial sub-orbit obtained by restricting the left and right actions to the unitary subgroups of the general linear groups, described as follows:

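$$\left\{\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\,:\,\mathbf{A}\in\text{U}_{m}(\mathbb{K}),\ \mathbf{B}\in\text{U}_{n}(\mathbb{K})\right\}.$$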

•

The eigenvalues of square diagonalizable matrices, defined to be the multiset formed by the diagonal entries of any diagonal matrix in the tensorial sub-orbit obtained by restricting to the action by conjugation of the general linear group, described as follows:

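$$\left\{\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\,:\,\mathbf{A},\mathbf{B}\in\text{GL}_{n}(\mathbb{K})\text{ and }\mathbf{A}\cdot\mathbf{B}=\mathbf{I}_{n}\right\}.$$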
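The invariance of the rank along the tensorial orbit $\left\{\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{B}\right\}$ is easy to spot-check. The sketch below is a minimal illustration that assumes exact rational arithmetic (Python's `fractions.Fraction`) as a stand-in for $\mathbb{C}$; the helpers `rank` and `matmul` and the sample matrices are hypothetical choices of ours, not part of the accompanying Sage package.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rank(M):
    """Rank via Gauss-Jordan elimination with exact fractions."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


M = [[1, 2, 3],
     [2, 4, 6]]        # rank 1: the second row is twice the first
A = [[1, 1],
     [0, 1]]           # in GL_2 (determinant 1)
B = [[2, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]        # in GL_3 (determinant 1)
print(rank(M), rank(matmul(matmul(A, M), B)))  # 1 1
```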

Classically, hypermatrices such as third order hypermatrices from $\mathbb{K}^{m\times n\times p}$ also arise from tensorial orbits induced by the action of various appropriate subgroups of the general linear group on canonical embeddings of the vector spaces $\mathbb{K}^{m\times 1\times 1}$, $\mathbb{K}^{1\times n\times 1}$ and $\mathbb{K}^{1\times 1\times p}$ respectively. Incidentally, the classical tensor rank and singular values are defined by analogy with their matrix counterparts. Unfortunately, the tensorial invariant approach to defining the eigenvalues does not extend to odd order hypermatrices, because the action by conjugation cannot be expressed using a triplet of matrix elements from the general linear group. Fortunately, the Bhattacharya-Mesner algebra suggests new tensorial hypermatrix orbits based upon a higher order analog of the general linear group. This hypermatrix analog of the general linear group allows us to extend to hypermatrices the notion of a spectral decomposition [GER11, GNANG2017238]. We argue in the present note that the Bhattacharya-Mesner (BM for short) rank also preserves the link relating the rank to the nullity of third order hypermatrices, when these notions are appropriately defined. Our discussion therefore focuses on the Bhattacharya-Mesner algebra first proposed in [MB90, MB94], and on the general Bhattacharya-Mesner algebra first proposed in [GER11]. The general Bhattacharya-Mesner product encompasses as special cases many other hypermatrix products and decompositions discussed in the literature, including the usual matrix product, the Segre outer product, the contraction product, the higher order singular value decomposition, the multilinear matrix multiplication [Lim13], and the slice rank factorization [BCC+16].
Note that the BM algebra of third order hypermatrices in particular is also motivated by the study of linear systems of equations defined over a skew field (such as the skew field of quaternions) whose unknowns are multiplied both on the left and on the right by known coefficients from the skew field.

This article is accompanied by an extensive and actively maintained Sage [S+15] symbolic hypermatrix algebra package which implements various features of the general Bhattacharya-Mesner algebra. The package is available at https://github.com/gnang/HypermatrixAlgebraPackage.

Acknowledgement: We would like to thank Andrei Gabrielov for providing guidance while preparing this manuscript. We are grateful to Dan Naiman for pointing out an upper bound on the diagonal dependence. The first author was supported by the National Science Foundation grant DMS–1161629, and is grateful for the hospitality of the Institute for Advanced Study and the Department of Mathematics at Purdue University.

2 Overview of the Bhattacharya-Mesner algebra of third order hypermatrices

The Bhattacharya-Mesner product, first proposed in [MB90, MB94], generalizes the classical matrix product. The Bhattacharya-Mesner product is best introduced by emphasizing its similarities with the matrix product. Recall that for conformable matrices with entries from some skew field $\mathbb{K}$

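$$\mathbf{A}^{(0)}\in\mathbb{K}^{n_{0}\times\ell}\ \text{ and }\ \mathbf{A}^{(1)}\in\mathbb{K}^{\ell\times n_{1}},$$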

we denote their product by $\mbox{Prod}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)}\right)\in\mathbb{K}^{n_{0}\times n_{1}}$ with entries given by

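$$\mbox{Prod}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)}\right)[i_{0},i_{1}]=\sum_{0\leq j<\ell}\mathbf{A}^{(0)}[i_{0},j]\,\mathbf{A}^{(1)}[j,i_{1}].$$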

Whereas the matrix product is a binary operation, the BM product of third order hypermatrices is a ternary operation. The product of third order conformable hypermatrices

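$$\mathbf{A}^{(0)}\in\mathbb{K}^{n_{0}\times\ell\times n_{2}},\ \mathbf{A}^{(1)}\in\mathbb{K}^{n_{0}\times n_{1}\times\ell}\ \text{ and }\ \mathbf{A}^{(2)}\in\mathbb{K}^{\ell\times n_{1}\times n_{2}},$$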

denoted $\mbox{Prod}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)},\mathbf{A}^{(2)}\right)\in\mathbb{K}^{n_{0}\times n_{1}\times n_{2}}$, is specified entry-wise by

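$$\mbox{Prod}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)},\mathbf{A}^{(2)}\right)[i_{0},i_{1},i_{2}]=\sum_{0\leq j<\ell}\mathbf{A}^{(0)}[i_{0},j,i_{2}]\,\mathbf{A}^{(1)}[i_{0},i_{1},j]\,\mathbf{A}^{(2)}[j,i_{1},i_{2}].$$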
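A direct transcription of the entry-wise definition may help fix the index conventions. In this sketch (pure Python; `bm_prod` is a hypothetical helper of ours), a hypermatrix is a nested list indexed as `A[i0][i1][i2]`. As a sanity check, it embeds the matrix product: with $n_{2}=1$ and an all-ones middle factor, the BM product reduces to $\mbox{Prod}\left(\mathbf{A}^{(0)}[:,:,0],\mathbf{A}^{(2)}[:,:,0]\right)$.

```python
def bm_prod(A0, A1, A2):
    """BM product of A0 (n0 x l x n2), A1 (n0 x n1 x l) and A2 (l x n1 x n2)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)]
             for i1 in range(n1)]
            for i0 in range(n0)]


def matmul(X, Y):
    """Ordinary matrix product, for comparison."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


X = [[1, 2, 3],
     [4, 5, 6]]                                            # 2 x 3
Y = [[1, 0],
     [0, 1],
     [2, -1]]                                              # 3 x 2
A0 = [[[X[i][j]] for j in range(3)] for i in range(2)]     # 2 x 3 x 1
A1 = [[[1, 1, 1] for _ in range(2)] for _ in range(2)]     # 2 x 2 x 3, all ones
A2 = [[[Y[j][i1]] for i1 in range(2)] for j in range(3)]   # 3 x 2 x 1
P = bm_prod(A0, A1, A2)
print([[P[i0][i1][0] for i1 in range(2)] for i0 in range(2)])  # [[7, -1], [16, -1]]
print(matmul(X, Y))                                            # [[7, -1], [16, -1]]
```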

In hindsight, the BM product of third order hypermatrices occurs naturally in descriptions of general linear systems of equations over skew fields. A general linear system of equations over a skew field $\mathbb{K}$ is one in which the unknowns are multiplied both on the left and on the right by coefficients from $\mathbb{K}$. General linear systems of equations are therefore of the form

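$$\left\{\sum_{0\leq j<\ell}\mathbf{A}[0,j,i_{2}]\,\mathbf{x}[0,0,j]\,\mathbf{B}[j,0,i_{2}]=\mathbf{c}[0,0,i_{2}]\right\}_{0\leq i_{2}<n_{2}}.\tag{1}$$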

The Bhattacharya-Mesner product succinctly expresses such systems in terms of a left-coefficient hypermatrix $\mathbf{A}\in\mathbb{K}^{1\times{\color[rgb]{1,0,0}\ell}\times n_{2}}$ , an unknown vector $\mathbf{x}\in\mathbb{K}^{1\times 1\times{\color[rgb]{1,0,0}\ell}}$ , a right-coefficient hypermatrix $\mathbf{B}\in\mathbb{K}^{{\color[rgb]{1,0,0}\ell}\times 1\times n_{2}}$ and a right hand side vector $\mathbf{c}\in\mathbb{K}^{1\times 1\times n_{2}}$ by the equation

$$\mbox{Prod}(\mathbf{A},\mathbf{x},\mathbf{B})=\mathbf{c}.$$

Note that the left-hand side of a general linear system of equations defined over a skew field, as described in Eq. (1), cannot in general be expressed as a single left action (or right action) of some coefficient matrix on a vector of unknowns. Special instances of general linear systems of equations can be solved using quasi-determinants [GR91, GRW05, GR97], but in general the solutions of such systems are not expressible as non-commutative rational functions of the entries of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{c}$.
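The two-sided structure of the general linear system above can be illustrated over the quaternions, the skew field mentioned in the introduction. The sketch below uses a minimal quaternion class `Q` written for this example (it is not a library type) and hypothetical coefficients with $\ell=2$ and $n_{2}=1$. It checks that $\mbox{Prod}(\mathbf{A},\mathbf{x},\mathbf{B})$ reproduces the two-sided sum, and that naively moving each right coefficient to the left of its unknown changes the value. This only illustrates non-commutativity; it is not a proof of the claim above.

```python
class Q:
    """Minimal quaternion a + b*i + c*j + d*k (illustrative, not a library type)."""

    def __init__(self, a, b=0, c=0, d=0):
        self.v = (a, b, c, d)

    def __add__(self, o):
        return Q(*(x + y for x, y in zip(self.v, o.v)))

    def __mul__(self, o):
        # Hamilton product
        a1, b1, c1, d1 = self.v
        a2, b2, c2, d2 = o.v
        return Q(a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
                 a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
                 a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
                 a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

    def __eq__(self, o):
        return self.v == o.v

    def __repr__(self):
        return "Q%s" % (self.v,)


def bm_prod(A0, A1, A2):
    """BM product; the factor order A0 * A1 * A2 is preserved in every term."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    out = [[[None] * n2 for _ in range(n1)] for _ in range(n0)]
    for i0 in range(n0):
        for i1 in range(n1):
            for i2 in range(n2):
                acc = Q(0)
                for j in range(l):
                    acc = acc + A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2]
                out[i0][i1][i2] = acc
    return out


a = [Q(0, 1), Q(0, 0, 1)]         # left coefficients:  i, j
b = [Q(0, 0, 1), Q(0, 0, 0, 1)]   # right coefficients: j, k
x = [Q(1, 1), Q(0, 0, 0, 1)]      # values for the unknowns: 1 + i, k

A = [[[a[0]], [a[1]]]]            # 1 x 2 x 1: A[0][j][0] = a[j]
xv = [[[x[0], x[1]]]]             # 1 x 1 x 2: xv[0][0][j] = x[j]
B = [[[b[0]]], [[b[1]]]]          # 2 x 1 x 1: B[j][0][0] = b[j]

c = bm_prod(A, xv, B)[0][0][0]
two_sided = a[0] * x[0] * b[0] + a[1] * x[1] * b[1]
left_only = a[0] * b[0] * x[0] + a[1] * b[1] * x[1]
print(c == two_sided, c == left_only)  # True False
print(c)                               # Q(0, 0, -2, 1)
```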

We further recall a variant of the Bhattacharya-Mesner product, called the general Bhattacharya-Mesner product, which is of particular interest to our discussion because it simplifies subsequent notation. The general product of third order hypermatrices is defined for any conformable triple

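$$\mathbf{A}^{(0)}\in\mathbb{K}^{n_{0}\times\ell\times n_{2}},\ \mathbf{A}^{(1)}\in\mathbb{K}^{n_{0}\times n_{1}\times\ell}\ \text{ and }\ \mathbf{A}^{(2)}\in\mathbb{K}^{\ell\times n_{1}\times n_{2}},$$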

taken with an additional cubic background hypermatrix $\mathbf{B}$ of side length $\ell$ (the contracted dimension). The general Bhattacharya-Mesner product of hypermatrices $\mathbf{A}^{(0)}$, $\mathbf{A}^{(1)}$ and $\mathbf{A}^{(2)}$ taken with the background hypermatrix $\mathbf{B}$, denoted $\mbox{Prod}_{\mathbf{B}}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)},\mathbf{A}^{(2)}\right)\in\mathbb{K}^{n_{0}\times n_{1}\times n_{2}}$, is specified entry-wise by

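$$\mbox{Prod}_{\mathbf{B}}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)},\mathbf{A}^{(2)}\right)[i_{0},i_{1},i_{2}]=\sum_{0\leq j_{0},j_{1},j_{2}<\ell}\mathbf{A}^{(0)}[i_{0},j_{0},i_{2}]\,\mathbf{A}^{(1)}[i_{0},i_{1},j_{1}]\,\mathbf{A}^{(2)}[j_{2},i_{1},i_{2}]\,\mathbf{B}[j_{0},j_{1},j_{2}].$$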

The original Bhattacharya-Mesner product is recovered from the general product by setting the cubic background hypermatrix $\mathbf{B}$ to be equal to the Kronecker delta hypermatrix denoted $\boldsymbol{\Delta}$ , whose entries are specified by

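$$\boldsymbol{\Delta}[i_{0},i_{1},i_{2}]=\begin{cases}1 & \text{if } 0\leq i_{0}=i_{1}=i_{2}<n,\\ 0 & \text{otherwise}.\end{cases}$$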
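The general product, and the fact that the Kronecker delta background recovers the original product, can be checked with a brute-force sketch. The helpers `bm_prod`, `bm_prod_general` and `kronecker_delta`, and the integer test data, are hypothetical illustrations; the triple loop over $(j_{0},j_{1},j_{2})$ costs $O(\ell^{3})$ per entry and is meant only for small examples.

```python
def bm_prod(A0, A1, A2):
    """Original BM product (single contracted index j)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def bm_prod_general(A0, A1, A2, B):
    """General BM product with a cubic background B of side length l."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    r = range(l)
    return [[[sum(A0[i0][j0][i2] * A1[i0][i1][j1] * A2[j2][i1][i2] * B[j0][j1][j2]
                  for j0 in r for j1 in r for j2 in r)
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def kronecker_delta(n):
    """Delta[i0][i1][i2] = 1 if i0 == i1 == i2, else 0."""
    return [[[1 if i0 == i1 == i2 else 0 for i2 in range(n)] for i1 in range(n)]
            for i0 in range(n)]


# Hypothetical data with n0 = 2, n1 = 3, n2 = 2 and contracted dimension l = 2.
A0 = [[[i0 + 2 * j - i2 for i2 in range(2)] for j in range(2)] for i0 in range(2)]
A1 = [[[i0 * i1 + j + 1 for j in range(2)] for i1 in range(3)] for i0 in range(2)]
A2 = [[[j - i1 + 3 * i2 for i2 in range(2)] for i1 in range(3)] for j in range(2)]
print(bm_prod_general(A0, A1, A2, kronecker_delta(2)) == bm_prod(A0, A1, A2))  # True
```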

Finally, we recall for the reader’s convenience the definition of the hypermatrix transpose operation. For $\mathbf{A}\in\mathbb{C}^{n_{0}\times n_{1}\times n_{2}}$, the corresponding transpose, denoted $\mathbf{A}^{\top}\in\mathbb{C}^{n_{1}\times n_{2}\times n_{0}}$, is specified entry-wise by

$$\mathbf{A}^{\top}[i_{0},i_{1},i_{2}]=\mathbf{A}[i_{2},i_{0},i_{1}].$$

The transpose operation therefore performs a cyclic permutation of the indices of each entry. For notational convenience we adopt the convention

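$$\mathbf{A}^{\top^{2}}:=\left(\mathbf{A}^{\top}\right)^{\top},\qquad\mathbf{A}^{\top^{3}}:=\left(\mathbf{A}^{\top^{2}}\right)^{\top}=\mathbf{A}.$$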

By this convention,

$$\mathbf{A}^{\top^{i}}=\mathbf{A}^{\top^{j}}\quad\text{if } i\equiv j \pmod 3.$$
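The cyclic transpose and the period-3 convention can be checked on a small example. This is a minimal sketch with hypothetical helper names `transpose` and `shape`, using nested lists indexed as `A[i0][i1][i2]`.

```python
def transpose(A):
    """Cyclic transpose: T[i0][i1][i2] = A[i2][i0][i1], mapping n0 x n1 x n2 to n1 x n2 x n0."""
    n0, n1, n2 = len(A), len(A[0]), len(A[0][0])
    return [[[A[i2][i0][i1] for i2 in range(n0)] for i1 in range(n2)] for i0 in range(n1)]


def shape(A):
    return (len(A), len(A[0]), len(A[0][0]))


A = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]  # 2 x 3 x 4
T = transpose(A)
print(shape(T))                         # (3, 4, 2)
print(T[1][2][0] == A[0][1][2])         # True
print(transpose(transpose(T)) == A)     # True: the transpose has order 3
```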

3 The Bhattacharya-Mesner rank

The Bhattacharya-Mesner *outer product* corresponds to the special instance of the product in which the contracted dimension equals $1$. Furthermore, the general Bhattacharya-Mesner product provides a convenient alternative way of expressing BM outer products. Recall that for an arbitrary conformable triple

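$$\mathbf{A}^{(0)}\in\mathbb{K}^{n_{0}\times\ell\times n_{2}},\ \mathbf{A}^{(1)}\in\mathbb{K}^{n_{0}\times n_{1}\times\ell}\ \text{ and }\ \mathbf{A}^{(2)}\in\mathbb{K}^{\ell\times n_{1}\times n_{2}},$$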

an outer product corresponds to a product of selected oriented matrix slices

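$$\mbox{Prod}\left(\mathbf{A}^{(0)}[:,t,:],\,\mathbf{A}^{(1)}[:,:,t],\,\mathbf{A}^{(2)}[t,:,:]\right).\tag{4}$$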

Recall that in the colon notation, $\mathbf{A}^{(0)}\left[:,t,:\right]$ refers to a hypermatrix slice of size $n_{0}\times 1\times n_{2}$ , where the second index is fixed to $t$ while all other indices are allowed to vary within their ranges. The BM product and the usual matrix product have in common the fact that every product is a sum of outer products. Hypermatrix outer products are conveniently expressible as general BM products. The corresponding background hypermatrices are denoted $\left\{\boldsymbol{\Delta}^{(t)}\right\}_{0\leq t<n}$ and specified entry-wise by

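$$\boldsymbol{\Delta}^{(t)}[i_{0},i_{1},i_{2}]=\begin{cases}1 & \text{if } 0\leq t=i_{0}=i_{1}=i_{2}<n,\\ 0 & \text{otherwise}.\end{cases}$$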
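Two facts used here can be confirmed by brute force: a BM product with contracted dimension $\ell$ is the sum of the $\ell$ outer products of oriented slices $\mbox{Prod}\left(\mathbf{A}^{(0)}[:,t,:],\mathbf{A}^{(1)}[:,:,t],\mathbf{A}^{(2)}[t,:,:]\right)$, and each such outer product equals the general product with background $\boldsymbol{\Delta}^{(t)}$. The sketch below uses hypothetical helper names and integer test data; it restates the entry-wise definitions in Python rather than calling any package.

```python
def bm_prod(A0, A1, A2):
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def bm_prod_general(A0, A1, A2, B):
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    r = range(l)
    return [[[sum(A0[i0][j0][i2] * A1[i0][i1][j1] * A2[j2][i1][i2] * B[j0][j1][j2]
                  for j0 in r for j1 in r for j2 in r)
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def delta_t(n, t):
    """Background Delta^(t): a single 1 at position (t, t, t)."""
    return [[[1 if i0 == i1 == i2 == t else 0 for i2 in range(n)] for i1 in range(n)]
            for i0 in range(n)]


def outer_from_slices(A0, A1, A2, t):
    """Prod(A0[:, t, :], A1[:, :, t], A2[t, :, :]): a product with contracted dimension 1."""
    S0 = [[A0[i0][t]] for i0 in range(len(A0))]                                    # n0 x 1 x n2
    S1 = [[[A1[i0][i1][t]] for i1 in range(len(A1[0]))] for i0 in range(len(A1))]  # n0 x n1 x 1
    S2 = [A2[t]]                                                                  # 1 x n1 x n2
    return bm_prod(S0, S1, S2)


def add(X, Y):
    return [[[a + b for a, b in zip(u, v)] for u, v in zip(m1, m2)] for m1, m2 in zip(X, Y)]


# Hypothetical data with n0 = n1 = n2 = 2 and l = 3.
A0 = [[[i0 - j + 2 * i2 for i2 in range(2)] for j in range(3)] for i0 in range(2)]
A1 = [[[i0 + i1 * j + 1 for j in range(3)] for i1 in range(2)] for i0 in range(2)]
A2 = [[[j * i2 - i1 + 1 for i2 in range(2)] for i1 in range(2)] for j in range(3)]

outers = [outer_from_slices(A0, A1, A2, t) for t in range(3)]
total = outers[0]
for O in outers[1:]:
    total = add(total, O)
print(total == bm_prod(A0, A1, A2))  # True: the product is a sum of l outer products
print(all(outers[t] == bm_prod_general(A0, A1, A2, delta_t(3, t)) for t in range(3)))  # True
```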

The outer product in Eq. (4) is thus equal to $\mbox{Prod}_{\boldsymbol{\Delta}^{(t)}}\left(\mathbf{A}^{(0)},\mathbf{A}^{(1)},\mathbf{A}^{(2)}\right)$. The Bhattacharya-Mesner outer product suggests a natural generalization of the notion of rank which differs from the classical tensor rank. To emphasize the similarities with the matrix rank, recall that a matrix $\mathbf{A}\in\mathbb{K}^{n_{0}\times n_{1}}$ has rank $r$ (over $\mathbb{K}$) if there exists a conformable matrix pair

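$$\mathbf{X}^{(0)}\in\mathbb{K}^{n_{0}\times\ell}\ \text{ and }\ \mathbf{X}^{(1)}\in\mathbb{K}^{\ell\times n_{1}},$$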

such that

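$$\mathbf{A}=\sum_{0\leq t<r}\mbox{Prod}_{\boldsymbol{\Delta}^{(t)}}\left(\mathbf{X}^{(0)},\mathbf{X}^{(1)}\right),$$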

and crucially, for all conformable matrix pairs

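$$\mathbf{Y}^{(0)}\in\mathbb{K}^{n_{0}\times\ell}\ \text{ and }\ \mathbf{Y}^{(1)}\in\mathbb{K}^{\ell\times n_{1}},$$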

we have

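$$\mathbf{A}\neq\sum_{0\leq t<r-1}\mbox{Prod}_{\boldsymbol{\Delta}^{(t)}}\left(\mathbf{Y}^{(0)},\mathbf{Y}^{(1)}\right),$$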

where

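$$\boldsymbol{\Delta}^{(t)}[i_{0},i_{1}]=\begin{cases}1 & \text{if } 0\leq t=i_{0}=i_{1}<n,\\ 0 & \text{otherwise}.\end{cases}$$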

In other words, the matrix rank is the minimum number of outer products which add up to $\mathbf{A}$. This approach to defining the rank extends verbatim to hypermatrices, and the resulting notion is called the Bhattacharya-Mesner rank of a hypermatrix. Throughout our discussion, unless otherwise specified, the rank refers to the Bhattacharya-Mesner rank. A hypermatrix $\mathbf{A}\in\mathbb{K}^{n_{0}\times n_{1}\times n_{2}}$ has rank $r$ (over $\mathbb{K}$) if there exists a conformable triple

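$$\mathbf{X}^{(0)}\in\mathbb{K}^{n_{0}\times\ell\times n_{2}},\ \mathbf{X}^{(1)}\in\mathbb{K}^{n_{0}\times n_{1}\times\ell}\ \text{ and }\ \mathbf{X}^{(2)}\in\mathbb{K}^{\ell\times n_{1}\times n_{2}},$$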

such that

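$$\mathbf{A}=\sum_{0\leq t<r}\mbox{Prod}_{\boldsymbol{\Delta}^{(t)}}\left(\mathbf{X}^{(0)},\mathbf{X}^{(1)},\mathbf{X}^{(2)}\right),\tag{5}$$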

and crucially, for all BM conformable triples

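$$\mathbf{Y}^{(0)}\in\mathbb{K}^{n_{0}\times\ell\times n_{2}},\ \mathbf{Y}^{(1)}\in\mathbb{K}^{n_{0}\times n_{1}\times\ell}\ \text{ and }\ \mathbf{Y}^{(2)}\in\mathbb{K}^{\ell\times n_{1}\times n_{2}},$$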

we have

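$$\mathbf{A}\neq\sum_{0\leq t<r-1}\mbox{Prod}_{\boldsymbol{\Delta}^{(t)}}\left(\mathbf{Y}^{(0)},\mathbf{Y}^{(1)},\mathbf{Y}^{(2)}\right),$$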

where

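$$\boldsymbol{\Delta}^{(t)}[i_{0},i_{1},i_{2}]=\begin{cases}1 & \text{if } 0\leq t=i_{0}=i_{1}=i_{2}<n,\\ 0 & \text{otherwise}.\end{cases}$$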

In other words, the rank of $\mathbf{A}$ is the minimum number of outer products which add up to $\mathbf{A}$. Note that the usual notions of tensor/hypermatrix rank discussed in the literature [Lim13], including the canonical polyadic rank and, more recently, the slice rank [BCC+16], are all special instances of the Bhattacharya-Mesner rank in which additional constraints are imposed on the hypermatrices in the conformable triple $\left(\mathbf{X}^{(0)},\mathbf{X}^{(1)},\mathbf{X}^{(2)}\right)$ in Eq. (5), as illustrated by the following proposition.

Proposition 1 :

[TABLE]

Proof: Recall that the Kronecker product $\left(\mathbf{x}\otimes\mathbf{y}\otimes\mathbf{z}\right)\in\mathbb{K}^{m\times n\times p}$ of vectors $\mathbf{x}\in\mathbb{K}^{m\times 1\times 1}$, $\mathbf{y}\in\mathbb{K}^{1\times n\times 1}$ and $\mathbf{z}\in\mathbb{K}^{1\times 1\times p}$, associated with the canonical decomposition, is specified entry-wise by

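$$\left(\mathbf{x}\otimes\mathbf{y}\otimes\mathbf{z}\right)[i,j,k]=\mathbf{x}[i,0,0]\,\mathbf{y}[0,j,0]\,\mathbf{z}[0,0,k].$$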

We justify the upper bound claim by establishing that the vector outer product above is a constrained Bhattacharya-Mesner outer product. Let $\mathbf{X}\in\mathbb{C}^{m\times 1\times p}$, $\mathbf{Y}\in\mathbb{C}^{m\times n\times 1}$ and $\mathbf{Z}\in\mathbb{C}^{1\times n\times p}$ be such that

[TABLE]

We see that

[TABLE]
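One choice of constrained factors that makes the vector outer product a BM outer product, assumed here purely for illustration, is $\mathbf{X}[i,0,k]=\mathbf{x}[i,0,0]$, $\mathbf{Y}[i,j,0]=\mathbf{y}[0,j,0]$ and $\mathbf{Z}[0,j,k]=\mathbf{z}[0,0,k]$, since then $\mbox{Prod}(\mathbf{X},\mathbf{Y},\mathbf{Z})[i,j,k]=\mathbf{x}[i,0,0]\,\mathbf{y}[0,j,0]\,\mathbf{z}[0,0,k]$. Summing $r$ such products turns a canonical polyadic decomposition with $r$ terms into $r$ BM outer products. The sketch checks one term numerically with hypothetical vectors stored as plain lists.

```python
def bm_prod(A0, A1, A2):
    """BM product of A0 (n0 x l x n2), A1 (n0 x n1 x l) and A2 (l x n1 x n2)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


x, y, z = [1, 2], [3, -1, 4], [5, 6]          # m = 2, n = 3, p = 2
m, n, p = len(x), len(y), len(z)

# x (outer) y (outer) z, entry-wise
kron = [[[x[i] * y[j] * z[k] for k in range(p)] for j in range(n)] for i in range(m)]

# An assumed choice of constrained factors with contracted dimension 1
X = [[[x[i] for k in range(p)]] for i in range(m)]      # m x 1 x p, X[i][0][k] = x[i]
Y = [[[y[j]] for j in range(n)] for i in range(m)]      # m x n x 1, Y[i][j][0] = y[j]
Z = [[[z[k] for k in range(p)] for j in range(n)]]      # 1 x n x p, Z[0][j][k] = z[k]

print(bm_prod(X, Y, Z) == kron)  # True
```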

The gap between the tensor rank and the Bhattacharya-Mesner rank is well illustrated by the fact that the hypermatrix

[TABLE]

has tensor rank $r$ while its Bhattacharya-Mesner rank is $1$, for all $0<r\leq n$. Consequently, the gap between the canonical polyadic rank and the Bhattacharya-Mesner rank can be as large as $n-1$ for an $n\times n\times n$ hypermatrix. A similar argument also establishes the slice rank discussed in [BCC+16] as an upper bound for the Bhattacharya-Mesner rank.
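A natural example with these properties, assumed here for illustration (the display above may use a different one), is the partial diagonal $\mathbf{D}_{r}\in\mathbb{K}^{n\times n\times n}$ with $\mathbf{D}_{r}[i,i,i]=1$ for $0\leq i<r$ and all other entries zero. Its tensor rank is $r$, since its $n\times n^{2}$ flattening has rank $r$ and it is a sum of $r$ vector outer products. The sketch checks that it is nevertheless a single BM outer product, using the hypothetical factors $\mathbf{X}[i,0,k]=1$ if $i=k$, $\mathbf{Y}[i,j,0]=1$ if $i=j<r$, and $\mathbf{Z}[0,j,k]=1$, with all other entries zero.

```python
def bm_prod(A0, A1, A2):
    """BM product of A0 (n0 x l x n2), A1 (n0 x n1 x l) and A2 (l x n1 x n2)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def partial_diagonal(n, r):
    """D[i][j][k] = 1 if i == j == k < r, else 0."""
    return [[[1 if i == j == k < r else 0 for k in range(n)] for j in range(n)]
            for i in range(n)]


def single_outer_factors(n, r):
    """Hypothetical factors with contracted dimension 1 whose BM product is D_r."""
    X = [[[1 if i == k else 0 for k in range(n)]] for i in range(n)]        # n x 1 x n
    Y = [[[1 if i == j < r else 0] for j in range(n)] for i in range(n)]    # n x n x 1
    Z = [[[1] * n for _ in range(n)]]                                        # 1 x n x n
    return X, Y, Z


n = 4
for r in range(1, n + 1):
    # prints "1 True", "2 True", "3 True", "4 True" on separate lines
    print(r, bm_prod(*single_outer_factors(n, r)) == partial_diagonal(n, r))
```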

The following proposition emphasizes the similarity between the Bhattacharya-Mesner rank and the matrix rank.

Proposition 2 :

[TABLE]

The Bhattacharya-Mesner rank of $\mathbf{A}\in\mathbb{C}^{m\times n\times p}$ is upper bounded by $\min(m,n,p)$.

Proof: Let $\mathbf{A}\in\mathbb{K}^{m\times n\times p}$ where $p\leq\min\left\{m,n\right\}$. Let $\mathbf{J}_{0}\in\mathbb{K}^{m\times p\times p}$ and $\mathbf{J}_{1}\in\mathbb{K}^{p\times n\times p}$ with entries specified by

[TABLE]

consequently we have

[TABLE]

which establishes that the rank of $\mathbf{A}$ must be less than or equal to $p$. Recall the transpose identity

[TABLE]

and

[TABLE]

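from which the desired result follows: since the rank is invariant under the transpose, we may cyclically permute the dimensions of $\mathbf{A}$ so that the smallest one comes last, as assumed above.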
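The entries of $\mathbf{J}_{0}$ and $\mathbf{J}_{1}$ are not given above; the sketch assumes one choice that works, $\mathbf{J}_{0}[i,t,k]=\mathbf{J}_{1}[t,j,k]=1$ if $t=k$ and $0$ otherwise, so that $\mbox{Prod}(\mathbf{J}_{0},\mathbf{A},\mathbf{J}_{1})=\mathbf{A}$ expresses $\mathbf{A}$ with contracted dimension $p$. It also checks the cyclic identity $\mbox{Prod}(\mathbf{P},\mathbf{Q},\mathbf{R})^{\top}=\mbox{Prod}(\mathbf{Q}^{\top},\mathbf{R}^{\top},\mathbf{P}^{\top})$, which follows directly from the definitions of the product and the transpose and implies that the rank is invariant under the transpose. Helper names and test data are hypothetical.

```python
def bm_prod(A0, A1, A2):
    """BM product of A0 (n0 x l x n2), A1 (n0 x n1 x l) and A2 (l x n1 x n2)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def transpose(A):
    """Cyclic transpose: T[i0][i1][i2] = A[i2][i0][i1]."""
    n0, n1, n2 = len(A), len(A[0]), len(A[0][0])
    return [[[A[i2][i0][i1] for i2 in range(n0)] for i1 in range(n2)] for i0 in range(n1)]


m, n, p = 3, 4, 2
A = [[[(i + 2 * j + 3 * k) % 5 - 2 for k in range(p)] for j in range(n)] for i in range(m)]
J0 = [[[1 if t == k else 0 for k in range(p)] for t in range(p)] for i in range(m)]  # m x p x p
J1 = [[[1 if t == k else 0 for k in range(p)] for j in range(n)] for t in range(p)]  # p x n x p
print(bm_prod(J0, A, J1) == A)  # True: A is a product with contracted dimension p

# cyclic transpose identity on hypothetical data (contracted dimension 3)
P = [[[i + j - k for k in range(2)] for j in range(3)] for i in range(2)]    # 2 x 3 x 2
Qh = [[[i * j + k for k in range(3)] for j in range(4)] for i in range(2)]   # 2 x 4 x 3
R = [[[i - j * k for k in range(2)] for j in range(4)] for i in range(3)]    # 3 x 4 x 2
lhs = transpose(bm_prod(P, Qh, R))
rhs = bm_prod(transpose(Qh), transpose(R), transpose(P))
print(lhs == rhs)  # True
```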

3.1 Rank inequalities

We introduce here the notion of matrix diagonal independence which is closely related to the rank of third order hypermatrices. Let

[TABLE]

be given such that $\ell<\min(m,n,p)$ and

[TABLE]

The set of hypermatrices

[TABLE]

is determined by solving for the matrix depth slices of $\mathbf{X}$ in the general linear system

[TABLE]

Properties of such general linear constraints motivate our definition and subsequent investigations of matrix left-right linear diagonal dependencies. A given multiset of matrices

[TABLE]

is left-right diagonally independent over the skew field $\mathbb{K}$ if the only diagonal matrices

[TABLE]

for which the equality

[TABLE]

holds are those satisfying

[TABLE]

Note that over $\mathbb{C}$, when each matrix in the set $\left\{\mathbf{M}_{t}\right\}_{0\leq t<p}$ has a unique non-zero entry, left-right diagonal dependence reduces to the usual notion of linear dependence. For such matrices, the tight upper bound on the maximum number of diagonally independent matrices is $m\cdot n$. Otherwise, when every column (respectively every row) of each of the matrices in the set $\left\{\mathbf{M}_{t}\right\}_{0\leq t<p}$ is non-zero, the trivial upper bound on the maximum number of diagonally independent matrices reduces to $\min(m,n)$, since it suffices to consider left (respectively right) diagonal linear combination constraints of the form

[TABLE]
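The remark about matrices with a unique non-zero entry can be made concrete: if $\mathbf{M}$ has its only non-zero entry at $(i,j)$, then $\mbox{diag}(\mathbf{x})\cdot\mathbf{M}\cdot\mbox{diag}(\mathbf{y})=x_{i}y_{j}\,\mathbf{M}$, so left-right diagonal combinations of such matrices are ordinary linear combinations. The sketch below (pure Python, with hypothetical helper names and data) computes $\sum_{t}\mbox{diag}(\mathbf{x}_{t})\cdot\mathbf{M}_{t}\cdot\mbox{diag}(\mathbf{y}_{t})$ entry-wise and exhibits a dependence between $\mathbf{E}$ and $2\mathbf{E}$ through non-zero diagonal matrices.

```python
def lr_combination(xs, Ms, ys):
    """Sum over t of diag(xs[t]) . Ms[t] . diag(ys[t]), computed entry-wise:
    entry (i, j) equals sum_t xs[t][i] * Ms[t][i][j] * ys[t][j]."""
    m, n = len(Ms[0]), len(Ms[0][0])
    return [[sum(xs[t][i] * Ms[t][i][j] * ys[t][j] for t in range(len(Ms))) for j in range(n)]
            for i in range(m)]


E = [[0, 0, 0],
     [0, 0, 5]]                 # unique non-zero entry at (1, 2)
x, y = [7, -2], [3, 1, 4]
# a single term: only x[1] * y[2] = -8 matters, so the result is -8 * E
print(lr_combination([x], [E], [y]))  # [[0, 0, 0], [0, 0, -40]]

# E and 2E are left-right diagonally dependent (2 * E - 1 * 2E = 0)
E2 = [[0, 0, 0],
      [0, 0, 10]]
print(lr_combination([[0, 2], [0, -1]], [E, E2], [[0, 0, 1], [0, 0, 1]]))  # [[0, 0, 0], [0, 0, 0]]
```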

Theorem 3 : Let $\mathbf{H}\in\mathbb{C}^{m\times n\times p}$ be such that

[TABLE]

where $0<\ell<\min(m,n,p)$. Let the depth matrix slices of $\mathbf{H}$ be

[TABLE]

then there exist diagonal matrices $\left\{\mbox{diag}\left(\mathbf{x}_{it}\right)\right\}_{i,t}\subset\mathbb{C}^{m\times m}$ as well as diagonal matrices $\left\{\mbox{diag}\left(\mathbf{y}_{tj}\right)\right\}_{t,j}\subset\mathbb{C}^{n\times n}$ such that

[TABLE]

for some choice of a matrix subset $\left\{\mathbf{M}_{t}\,:\,t\in\left\{0,\cdots,\ell\right\}\right\}\subset\left\{\mbox{Mat}\left(\mathbf{H}\left[:,:,k\right]\right)\,:\,k\in\left\{0,\cdots,p-1\right\}\right\}$.

Proof : By the premise that rank $\left(\mathbf{H}\right)=\ell$ , it follows that there is a conformable triple

[TABLE]

such that

[TABLE]

Consequently,

[TABLE]
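Assuming, consistently with the next paragraph, that the factorization reads $\mathbf{H}=\mbox{Prod}(\mathbf{U},\mathbf{X},\mathbf{W})$ with $\mathbf{U}\in\mathbb{C}^{m\times\ell\times p}$, $\mathbf{X}\in\mathbb{C}^{m\times n\times\ell}$ and $\mathbf{W}\in\mathbb{C}^{\ell\times n\times p}$, each depth slice satisfies $\mathbf{H}[:,:,k]=\sum_{0\leq t<\ell}\mbox{diag}(\mathbf{U}[:,t,k])\cdot\mathbf{X}[:,:,t]\cdot\mbox{diag}(\mathbf{W}[t,:,k])$, a left-right diagonal linear combination of the $\ell$ depth slices of $\mathbf{X}$. The sketch checks this with explicit diagonal matrices and ordinary matrix products on hypothetical integer data.

```python
def bm_prod(A0, A1, A2):
    """BM product of A0 (n0 x l x n2), A1 (n0 x n1 x l) and A2 (l x n1 x n2)."""
    n0, l, n2 = len(A0), len(A0[0]), len(A0[0][0])
    n1 = len(A1[0])
    return [[[sum(A0[i0][j][i2] * A1[i0][i1][j] * A2[j][i1][i2] for j in range(l))
              for i2 in range(n2)] for i1 in range(n1)] for i0 in range(n0)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]


def madd(A, B):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A, B)]


def depth_slice(T, k):
    """The matrix T[:, :, k]."""
    return [[T[i][j][k] for j in range(len(T[0]))] for i in range(len(T))]


m, n, p, l = 3, 2, 4, 2
U = [[[(i + 2 * t + k) % 5 - 2 for k in range(p)] for t in range(l)] for i in range(m)]  # m x l x p
X = [[[i * j - t + 1 for t in range(l)] for j in range(n)] for i in range(m)]            # m x n x l
W = [[[t + j * k - 1 for k in range(p)] for j in range(n)] for t in range(l)]            # l x n x p
H = bm_prod(U, X, W)


def slice_as_combination(k):
    """sum_t diag(U[:, t, k]) . X[:, :, t] . diag(W[t, :, k])"""
    acc = [[0] * n for _ in range(m)]
    for t in range(l):
        left = diag([U[i][t][k] for i in range(m)])
        right = diag([W[t][j][k] for j in range(n)])
        acc = madd(acc, matmul(matmul(left, depth_slice(X, t)), right))
    return acc


print(all(depth_slice(H, k) == slice_as_combination(k) for k in range(p)))  # True
```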

For fixed $\mathbf{U}$ and $\mathbf{W}$, consider the equation in the unknown $\mathbf{X}$, of size $m\times n\times\ell$, given by

[TABLE]

In this form, the equation expresses a general linear system of equations whose coefficient hypermatrices are $\mathbf{A}\in\mathbb{C}^{1\times\ell\times p}$ and $\mathbf{B}\in\mathbb{C}^{\ell\times 1\times p}$, whose unknown vector $\mathbf{x}$ has size $1\times 1\times\ell$, and whose right-hand side vector $\mathbf{c}$ has size $1\times 1\times p$. Using the BM product, we re-express the corresponding general linear system of equations as follows

[TABLE]

The entries of the coefficient hypermatrices $\mathbf{A}$ and $\mathbf{B}$ in Eq. (6) are each diagonal matrices of respective size $m\times m$ and $n\times n$

[TABLE]

The entries of $\mathbf{x}$ are unknown $m\times n$ matrices and the entries of $\mathbf{c}$ correspond to $m\times n$ depth matrix slices of $\mathbf{H}$ specified by

[TABLE]

Since $\mathbf{H}$ has rank exactly $\ell$ , for every $k\in\left\{0,\cdots,p-1\right\}$ there exists $t\in\left\{0,\cdots,\ell-1\right\}$ such that

[TABLE]

We perform on the system

[TABLE]

a minor variant of the Gaussian elimination procedure which avoids division. The procedure is best illustrated by describing the first round of elimination for a generic system. The first sequence of row operations is described by

[TABLE]

Although the very first constraint, denoted R0, is left unchanged by the first sequence of row operations, we rewrite R0 as follows for consistency of the unknown variables:

[TABLE]

Following the first sequence of row operations, the variable $\mathbf{x}\left[0,0,0\right]$ is eliminated via Eq. (7) from all constraints except R0, which yields new constraints of the form

[TABLE]

Equivalently re-expressed as

[TABLE]

Note that each $\mathbf{U}_{k}$ denotes an $m\times n$ matrix of free variables. The free variables arise from the fact that, given an equality of the form $y_{0}+y_{1}=a+b$, where $y_{0}$ and $y_{1}$ are unknowns and $a$ and $b$ are known constants, the solutions are exactly the pairs $y_{0}=a+t$ and $y_{1}=b-t$, where $t$ is a free parameter. The same argument applies to the $\mathbf{U}_{k}$. The procedure is repeated until the constraints are in Row Echelon Form (REF for short). Although the system is non-commutative, the procedure avoids division because conformable diagonal matrices commute.
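Why no division is needed can be seen on a single elimination step. Since the row operations themselves are not reproduced here, the sketch assumes a generic form: if $\mbox{diag}(\mathbf{a}_{0})\,\mathbf{X}_{0}\,\mbox{diag}(\mathbf{b}_{0})$ is the $\mathbf{X}_{0}$-term of R0 and $\mbox{diag}(\mathbf{a}_{k})\,\mathbf{X}_{0}\,\mbox{diag}(\mathbf{b}_{k})$ that of constraint $k$, then multiplying constraint $k$ by $\mbox{diag}(\mathbf{a}_{0})$ on the left and $\mbox{diag}(\mathbf{b}_{0})$ on the right, and R0 by $\mbox{diag}(\mathbf{a}_{k})$ and $\mbox{diag}(\mathbf{b}_{k})$, produces identical $\mathbf{X}_{0}$-terms that cancel upon subtraction. The names and integer data are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]


def sandwich(a, M, b):
    """diag(a) . M . diag(b)"""
    return matmul(matmul(diag(a), M), diag(b))


a0, b0 = [2, -1, 3], [1, 4]       # coefficients of X_0 in R0 (m = 3, n = 2)
ak, bk = [5, 0, -2], [3, -3]      # coefficients of X_0 in constraint k
X0 = [[1, 2],
      [3, 4],
      [5, 6]]                     # arbitrary m x n matrix standing in for the unknown

from_k = sandwich(a0, sandwich(ak, X0, bk), b0)   # X_0-term of constraint k after scaling
from_0 = sandwich(ak, sandwich(a0, X0, b0), bk)   # X_0-term of R0 after scaling
print(from_k == from_0)  # True: the X_0-terms cancel, and only integers were used
```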

Having described the proposed variant of the Gaussian elimination procedure, we discuss properties of the resulting REF. Throughout the procedure, the right-hand sides are always left-right diagonal linear combinations of the previously obtained right-hand side entries. Since there are more constraints than variables (because $\ell<p$), the procedure must result in at least $\left(p-\ell\right)$ constraints in the REF whose left-hand sides are identically zero functions of the entries of $\mathbf{x}$. However, the right-hand sides need not be identically zero functions of the free variables. Using the fact that

[TABLE]

are solutions to the system, it follows that there must exist an assignment to the free variables which annihilates the right-hand sides of these $\left(p-\ell\right)$ constraints.

In the generic case, the right-hand side expresses a left-right diagonal linear combination of the form

[TABLE]

This completes the proof.

Note that the variant of Gaussian elimination described in the proof of Thm. 3 does not fully express the solutions to the general linear system, for two reasons. First, subsequent row operations introduce constraints on the previously chosen free variables, which must in turn be solved in order to fully express the solutions of the system. Second, the diagonal left and right coefficient matrices of the unknowns obtained throughout the procedure need not be invertible. Subsequent results presented in this note attempt to address these limitations.

3.2 Towards hypermatrix rank revealing factorizations

We emphasize the analogy with the matrix case by first recalling the property of Gaussian elimination that is central to the rank-revealing LU factorization.

Theorem 4 : Let $\mathbf{X}\in\mathbb{C}^{m\times\ell}$ and $\mathbf{Y}\in\mathbb{C}^{\ell\times n}$ be matrices such that there exist an index $\tau$ and scalars $\left\{u_{t}\right\}_{0\leq t\neq\tau<\ell}\subset\mathbb{C}$ for which

[TABLE]

then $\mbox{Prod}\left(\mathbf{X},\mathbf{Y}\right)$ has rank at most $\left(\ell-1\right)$ .

*Proof *: Our proof assumes without loss of generality that $\tau=\left(\ell-1\right)$ . The argument is exactly the same for any other choice of $0\leq\tau<\ell$ . The product of $\mathbf{X}\in\mathbb{C}^{m\times\ell}$ and $\mathbf{Y}\in\mathbb{C}^{\ell\times n}$ is a sum of $\ell$ outer products. In particular, the outer product of the last column of $\mathbf{X}$ with the last row of $\mathbf{Y}$ is given by

[TABLE]

By our hypothesis the row space of $\mathbf{Y}$ has dimension at most $\left(\ell-1\right)$ , and more explicitly we have

[TABLE]

Given the assumption, we express $\mbox{Prod}\left(\mathbf{X},\mathbf{Y}\right)$ as a smaller sum of outer products as follows:

[TABLE]

and hence

[TABLE]

Eq. (8) expresses the well-known elementary column linear combination operations

[TABLE]
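The matrix argument of Thm. 4 can be checked numerically. The sketch below (NumPy, random data, $\tau=\ell-1$) imposes the hypothesis in the form we read it, namely that the last row of $\mathbf{Y}$ equals $\sum_{t<\ell-1}u_{t}\mathbf{Y}\left[t,:\right]$, then folds the last column of $\mathbf{X}$ into the others via the elementary column operations $\mathbf{X}\left[:,t\right]\leftarrow\mathbf{X}\left[:,t\right]+u_{t}\mathbf{X}\left[:,\ell-1\right]$, and confirms that the product is unchanged and has rank $\ell-1$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, ell, n = 5, 4, 6

X = rng.standard_normal((m, ell))
Y = rng.standard_normal((ell, n))
u = rng.standard_normal(ell - 1)

# Hypothesis with tau = ell - 1: the last row of Y is a combination of the others.
Y[ell - 1] = u @ Y[: ell - 1]

# Elementary column operations: X[:, t] <- X[:, t] + u_t * X[:, ell - 1].
X_reduced = X[:, : ell - 1] + np.outer(X[:, ell - 1], u)
Y_reduced = Y[: ell - 1]

P = X @ Y
same_product = np.allclose(P, X_reduced @ Y_reduced)  # ell - 1 outer products
rank = np.linalg.matrix_rank(P)
print(same_product, rank)  # True 3
```

With generic data the rank is exactly $\ell-1$; the theorem itself only guarantees the upper bound.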

We now extend the argument above to third order hypermatrices.

Theorem 5 : Let the conformable triple

[TABLE]

be such that for some index $\tau$ the following holds:

[TABLE]

then $\mbox{Prod}\left(\mathbf{X},\mathbf{Y},\mathbf{Z}\right)$ has rank at most $\left(\ell-1\right)$ .

*Proof *: Our proof assumes without loss of generality that $\tau=\left(\ell-1\right)$ . The argument is exactly the same for any choice $0\leq\tau<\ell$ . The product of the conformable triple

[TABLE]

expresses a sum of $\ell$ outer products. In particular, consider the outer product of the last column slice of $\mathbf{X}$ with the last depth slice of $\mathbf{Y}$ and the last row slice of $\mathbf{Z}$ given by

[TABLE]

By analogy with the matrix case, we make an assumption on this outer product

[TABLE]

The explicit assumption is that

[TABLE]

Given our assumption, we express $\mbox{Prod}\left(\mathbf{X},\mathbf{Y},\mathbf{Z}\right)$ as a smaller sum of outer products as follows:

[TABLE]

hence

[TABLE]

This completes the proof.
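A short sketch of the product used in Thm. 5, assuming the convention $\mbox{Prod}\left(\mathbf{A},\mathbf{B},\mathbf{C}\right)\left[i,j,k\right]=\sum_{t}\mathbf{A}\left[i,t,k\right]\mathbf{B}\left[i,j,t\right]\mathbf{C}\left[t,j,k\right]$, which matches the description of each summand as the outer product of a column slice of $\mathbf{X}$, a depth slice of $\mathbf{Y}$ and a row slice of $\mathbf{Z}$. It checks that the product equals the sum of the $\ell$ slice outer products, so its rank is at most $\ell$. The helper names `bm_prod` and `outer_term` are hypothetical.

```python
import numpy as np

def bm_prod(A, B, C):
    """Hypothetical helper: Bhattacharya-Mesner product of a conformable triple.

    A has shape (m, l, p), B has shape (m, n, l), C has shape (l, n, p), and
    Prod[i, j, k] = sum_t A[i, t, k] * B[i, j, t] * C[t, j, k].
    """
    return np.einsum("itk,ijt,tjk->ijk", A, B, C)

def outer_term(A, B, C, t):
    """Outer product of column slice t of A, depth slice t of B and row slice t of C."""
    return A[:, t, :][:, None, :] * B[:, :, t][:, :, None] * C[t, :, :][None, :, :]

rng = np.random.default_rng(2)
m, n, p, ell = 3, 4, 2, 5
X = rng.standard_normal((m, ell, p))
Y = rng.standard_normal((m, n, ell))
Z = rng.standard_normal((ell, n, p))

P = bm_prod(X, Y, Z)
P_sum = sum(outer_term(X, Y, Z, t) for t in range(ell))
print(P.shape, np.allclose(P, P_sum))  # (3, 4, 2) True
```

Removing one index $t$ from the sum, as the proof does after merging the last summand into the others, leaves a product of $\ell-1$ outer products.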

The conditions described in Thm. (4) for the matrix case and in Thm. (5) for the hypermatrix case are sufficient but clearly not necessary. To derive from Thm. (4) a necessary condition for the rank of the matrix $\mathbf{Y}\in\mathbb{C}^{\ell\times n}$ to be less than $\ell$, it suffices to set $\mathbf{X}$ to the identity matrix $\mathbf{I}_{\ell}$. Similarly, to derive from Thm. (5) a necessary condition for the hypermatrix $\mathbf{Y}\in\mathbb{C}^{m\times\ell\times p}$ to have rank less than $\ell$, it suffices to set the hypermatrices $\mathbf{X}$ and $\mathbf{Z}$ to the identity pair $\mathbf{J}_{0}\in\mathbb{C}^{m\times p\times p}$ and $\mathbf{J}_{1}\in\mathbb{C}^{p\times n\times p}$, whose entries are specified by

[TABLE]

Recall that their defining property is that

[TABLE]

where we assume that $n=\min\left(m,n,p\right)$. Hence the condition in Eq. (9) reduces to left and right diagonal dependence of the depth slices.
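The identity pair can be sketched the same way. Since the defining equation is not reproduced here, we assume that $\mathbf{J}_{0}\left[i,t,k\right]=\mathbf{J}_{1}\left[t,j,k\right]=1$ when $t=k$ and $0$ otherwise; under that assumption $\mbox{Prod}\left(\mathbf{J}_{0},\mathbf{B},\mathbf{J}_{1}\right)=\mathbf{B}$, which is why choosing $\mathbf{X}=\mathbf{J}_{0}$ and $\mathbf{Z}=\mathbf{J}_{1}$ turns the condition of Thm. (5) into a condition on the depth slices alone. The function `identity_pair` is hypothetical.

```python
import numpy as np

def bm_prod(A, B, C):
    # Prod[i, j, k] = sum_t A[i, t, k] * B[i, j, t] * C[t, j, k]
    return np.einsum("itk,ijt,tjk->ijk", A, B, C)

def identity_pair(m, n, p):
    """Assumed identity pair: J0[i, t, k] = J1[t, j, k] = 1 if t == k, else 0."""
    delta = np.eye(p)
    J0 = np.broadcast_to(delta[None, :, :], (m, p, p)).copy()
    J1 = np.broadcast_to(delta[:, None, :], (p, n, p)).copy()
    return J0, J1

rng = np.random.default_rng(3)
m, n, p = 4, 3, 3
B = rng.standard_normal((m, n, p))
J0, J1 = identity_pair(m, n, p)
print(np.allclose(bm_prod(J0, B, J1), B))  # True
```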

The rank-revealing factorizations of $\mbox{Prod}\left(\mathbf{X},\mathbf{Y},\mathbf{Z}\right)$ are obtained by repeatedly solving for the entries of $\left\{\mbox{diag}\left(\mathbf{u}_{t}\right),\,\mbox{diag}\left(\mathbf{v}_{t}\right)\right\}_{0\leq t<\ell-1}$ in Eq. (9). At each iteration, the assignment of $\left\{\mbox{diag}\left(\mathbf{u}_{t}\right),\,\mbox{diag}\left(\mathbf{v}_{t}\right)\right\}_{0\leq t<\ell-1}$ reduces the number of outer-product summands by one. The corresponding elementary slice operation is thus prescribed by

[TABLE]

Unfortunately, the constraints in Eq. (9) are non-linear.

Theorem 6 : Let $\mathbf{B}\in\mathbb{C}^{m\times n\times\left(r+1\right)}$ be generic, with $\mathbf{B}\left[:,:,0\right]\in\left(\mathbb{C}-\left\{0\right\}\right)^{m\times n\times 1}$ and $r<\min\left(m,n\right)$; then

[TABLE]

Proof : It suffices to focus on the case

[TABLE]

It follows from Theorem 5 that the outer product reduction criterion for the product of a conformable triple of hypermatrices

[TABLE]

is given by

[TABLE]

where $\mathbf{A}=\mathbf{J}_{0}\in\mathbb{C}^{m\times\left(r+1\right)\times\left(r+1\right)}$ and $\mathbf{C}=\mathbf{J}_{1}\in\mathbb{C}^{\left(r+1\right)\times n\times\left(r+1\right)}$ whose entries are specified by

[TABLE]

we have

[TABLE]

and the equality

[TABLE]

simplifies to the following diagonal dependence relation

[TABLE]

which we re-express in terms of the columns of $\mathbf{U}\in\mathbb{C}^{m\times p}$ and the rows of $\mathbf{V}\in\mathbb{C}^{p\times n}$, respectively, as

[TABLE]

where $\mathbf{X}^{\prime}\in\mathbb{C}^{m\times\left(r+1\right)\times 1}$ and $\mathbf{Y}^{\prime}\in\mathbb{C}^{\left(r+1\right)\times n\times 1}$. Letting $\tau=0$, we have

[TABLE]

We rewrite the constraints in Eq. (10) as

[TABLE]

Hence the constraints in Eq. (10) can be re-expressed in terms of smaller hypermatrices $\mathbf{X}\in\mathbb{C}^{m\times r\times 1}$ and $\mathbf{Y}\in\mathbb{C}^{r\times n\times 1}$ as

[TABLE]

We further cast these constraints as a system of linear equations of the form

[TABLE]

where

[TABLE]

These algebraic relations effectively eliminate the variables $\left\{\mathbf{X}\left[u,0,0\right]\right\}_{0\leq u<m}$ as well as the variables $\left\{\mathbf{Y}\left[0,v,0\right]\right\}_{0\leq v<n}$ from the system. Note that the monomials $\left\{\mathbf{X}\left[u,0,0\right]\mathbf{Y}\left[0,v,0\right]\right\}_{\begin{array}[]{c}0\leq u<m\\ 0\leq v<n\end{array}}$ correspond to entries of an $m\times n$ rank one matrix. The problem is thus reduced to a system of ${m\choose 2}\cdot{n\choose 2}$ constraints in the remaining $\left(m+n\right)\cdot\left(r-1\right)$ variable entries of $\mathbf{X}$ and $\mathbf{Y}$ . These constraints are determinantal constraints of the form

[TABLE]

The system above is a type 2 system, in the sense of [GG18], in the $mn$ variables

[TABLE]

In particular, the exponent matrix of the corresponding REF of the system has $\left(m-1\right)\left(n-1\right)$ pivot variables, $m+n-1$ free variables (accounting for the degrees of freedom of an $m\times n$ rank-one matrix) and ${m\choose 2}{n\choose 2}-\left(m-1\right)\cdot\left(n-1\right)$ zero rows. By the algebraic independence of the pivot rows, it follows that $\mathbf{B}$ stands a chance to have rank $r+1$ if $\left(m-1\right)\cdot\left(n-1\right)$ exceeds the number of remaining variables. The condition is thus expressed by the inequality

[TABLE]
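The pivot and free-variable counts above can be sanity-checked numerically. As a stand-in for the number of algebraically independent pivot rows, the sketch below uses the rank of the Jacobian of all $2\times 2$ minors at a random rank-one point; this is a numerical proxy, not the REF procedure of [GG18]. At a generic rank-one $m\times n$ matrix the ${m\choose 2}{n\choose 2}$ minors have a Jacobian of rank $\left(m-1\right)\left(n-1\right)$, and its kernel has dimension $m+n-1$. The helper `minor_jacobian` is hypothetical.

```python
import itertools
import numpy as np

def minor_jacobian(x):
    """Gradients of all 2x2 minors x[u,v]*x[w,z] - x[u,z]*x[w,v] (u < w, v < z)
    with respect to the m*n entries of x, flattened in row-major order."""
    m, n = x.shape
    rows = []
    for (u, w), (v, z) in itertools.product(itertools.combinations(range(m), 2),
                                            itertools.combinations(range(n), 2)):
        g = np.zeros((m, n))
        g[u, v] += x[w, z]
        g[w, z] += x[u, v]
        g[u, z] -= x[w, v]
        g[w, v] -= x[u, z]
        rows.append(g.ravel())
    return np.array(rows)

rng = np.random.default_rng(4)
m, n = 4, 5
x = np.outer(rng.standard_normal(m), rng.standard_normal(n))  # generic rank-one point
J = minor_jacobian(x)
print(J.shape[0], np.linalg.matrix_rank(J), (m - 1) * (n - 1), m + n - 1)
# 60 12 12 8
```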

Theorem 7 : The Bhattacharya-Mesner rank of a generic hypermatrix $\mathbf{B}\in\mathbb{C}^{n\times n\times n}$ is at most $2$ if $n=2$ and at most $\left(n-1\right)$ if $n>2$. (A generic hypermatrix is one whose entries do not satisfy any non-trivial algebraic relation; in particular, all of its entries are non-zero.)

*Proof *: It follows from Theorem 5 that the outer product reduction criterion for the product of a conformable triple of hypermatrices

[TABLE]

where $p=\min\left(m,n\right)$, is given by

[TABLE]

where $\mathbf{A}=\mathbf{J}_{0}\in\mathbb{C}^{m\times p\times p}$ and $\mathbf{C}=\mathbf{J}_{1}\in\mathbb{C}^{p\times n\times p}$ whose entries are specified by

[TABLE]

we have

[TABLE]

and the equality

[TABLE]

simplifies to the following diagonal dependence relation

[TABLE]

which we re-express in terms of the columns of $\mathbf{U}\in\mathbb{C}^{m\times p}$ and the rows of $\mathbf{V}\in\mathbb{C}^{p\times n}$. The diagonal dependence is therefore expressed by constraints of the form

[TABLE]

Consequently for all $1\leq k<n$ we have

[TABLE]

Here $\circ$ denotes the entry-wise product, also called the Hadamard product, and $\circ^{k}$ denotes entry-wise exponentiation by $k$ of the non-zero entries. The diagonal dependence status of the depth slices of $\mathbf{B}$ does not change if each depth slice of $\mathbf{B}$ is multiplied on the right by its own (possibly distinct) known invertible diagonal matrix. As a result, we can assume without loss of generality that $\det\left(\mathbf{V}\right)\neq 0$ and $n\leq m$, in which case we have

[TABLE]
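A sketch of the invariance claim above, assuming that diagonal dependence of the last depth slice means $\mathbf{B}\left[:,:,s-1\right]=\sum_{t<s-1}\mbox{diag}\left(\mathbf{u}_{t}\right)\cdot\mathbf{B}\left[:,:,t\right]\cdot\mbox{diag}\left(\mathbf{v}_{t}\right)$, in the spirit of Eq. (9) (the symbol $s$ for the number of slices and the diagonals $\mathbf{g}_{k}$ are our notation). If each slice $\mathbf{B}\left[:,:,k\right]$ is replaced by $\mathbf{B}\left[:,:,k\right]\cdot\mbox{diag}\left(\mathbf{g}_{k}\right)$, the relation survives with $\mathbf{v}_{t}$ replaced by $\mathbf{v}_{t}\circ\mathbf{g}_{s-1}\circ\mathbf{g}_{t}^{\circ-1}$, because diagonal matrices commute.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, s = 3, 4, 3  # s depth slices; the last one is the dependent slice

B = rng.standard_normal((m, n, s))
U = rng.standard_normal((m, s - 1))  # column t holds the left coefficients u_t
V = rng.standard_normal((n, s - 1))  # column t holds the right coefficients v_t

# Impose (an assumed form of) left-right diagonal dependence of the last slice.
B[:, :, -1] = sum(np.diag(U[:, t]) @ B[:, :, t] @ np.diag(V[:, t]) for t in range(s - 1))

# Multiply every depth slice on the right by its own invertible diagonal matrix.
g = rng.uniform(1.0, 2.0, size=(n, s))  # column k holds the diagonal g_k
B2 = np.stack([B[:, :, k] @ np.diag(g[:, k]) for k in range(s)], axis=2)

# Updated right coefficients: v_t' = v_t * g_{s-1} / g_t (diagonals commute).
V2 = V * g[:, [-1]] / g[:, : s - 1]
rhs = sum(np.diag(U[:, t]) @ B2[:, :, t] @ np.diag(V2[:, t]) for t in range(s - 1))
print(np.allclose(B2[:, :, -1], rhs))  # True
```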

Let $\mathbf{M}\in\left(\mathbb{C}\left[v_{00},\cdots,v_{n-1\,n-1}\right]\right)^{mp\times mp}$ be the matrix whose entries are given by

[TABLE]

The constraints in Eq. (13) express an eigenvector/eigenvalue problem for the $m^{2}\times m^{2}$ matrix $\mathbf{M}$ , whose entries are polynomials in the entries of $\mathbf{V}$ . The desired eigenvalue is det $\left(\mathbf{V}\right)$ while the entries of $\mathbf{U}$ make up the entries of the eigenvector. The constraints are of the form

[TABLE]

To find the eigenvalues, we solve the characteristic polynomial equation

[TABLE]

For generic $\mathbf{B}$, the polynomial $\det\left\{\det\left(\mathbf{V}\right)\mathbf{I}_{mp}-\mathbf{M}\right\}\in\mathbb{C}\left[v_{00},\cdots,v_{p-1\,p-1}\right]$ is not identically constant, for otherwise the entries of $\mathbf{B}$ would satisfy some non-trivial algebraic relation. Consequently, the polynomial $\det\left\{\det\left(\mathbf{V}\right)\mathbf{I}_{mp}-\mathbf{M}\right\}$ admits roots in $\mathbb{C}$. Having assumed that $\det\left(\mathbf{V}\right)\neq 0$ and that the entries of $\mathbf{B}$ are generic, the constraints reduce to a system of equations of the form

[TABLE]

where

[TABLE]

Each constraint therefore translates into constraints of the form

[TABLE]

where $\text{lex}$ denotes the bijective map from the symmetric group $S_{n}$ to $\left\{0,\cdots,n!-1\right\}$ that reflects the lexicographic ordering of the permutation strings. Note that in the case $n=2$ we have

[TABLE]

for some free variable $c_{1-\text{id}}$

[TABLE]
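One concrete realization of the map $\text{lex}$, assuming each permutation of $S_{n}$ is written as the string $\left(\sigma\left(0\right),\cdots,\sigma\left(n-1\right)\right)$, is the Lehmer-code rank. The sketch below checks it against Python's `itertools.permutations`, which emits the permutations of a sorted input in lexicographic order; under this convention the identity permutation is sent to $0$.

```python
import itertools
import math

def lex(perm):
    """Lexicographic rank of a permutation of (0, ..., n-1), via its Lehmer code."""
    n = len(perm)
    remaining = sorted(perm)
    rank = 0
    for i, x in enumerate(perm):
        pos = remaining.index(x)  # number of unused symbols smaller than x
        rank += pos * math.factorial(n - 1 - i)
        remaining.pop(pos)
    return rank

n = 3
perms = list(itertools.permutations(range(n)))  # lexicographic order
print([lex(p) for p in perms])  # [0, 1, 2, 3, 4, 5]
```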

To satisfy Eq. (12), we may assume without loss of generality that $t_{j}=1=s_{j}$ for all $0\leq j<m$. Hence we have

[TABLE]

Generically, the solutions to the system of equations

[TABLE]

for distinct values of $j$ are distinct. Taking the solution for some fixed index $j$ determines the entries of $\mathbf{V}$, which in turn determine the entries of $\mathbf{M}$ in terms of the unknowns $\left\{c_{j,1-\text{id}}\right\}_{j}$. The requirement that

[TABLE]

imposes additional constraints on the unknowns $c_{j,1-\text{id}}$, which are generically inconsistent when $n=2$. If we also take $m=2$, then to resolve the consistency issue it is necessary and sufficient to require the solution $\mathbf{V}$ to be the same for both indices $j\in\left\{0,1\right\}$. This is of course far from the generic setting, and yields the constraints

[TABLE]

Assuming that

[TABLE]

and

[TABLE]

then the system reduces to

[TABLE]

which we recognize at once as the hyperdeterminant of the $2\times 2\times 2$ hypermatrix introduced in [GNANG2017238]. However, in the generic setting with $n=m>2$, the derivation results in more variables than constraints, since $n!>n$ whenever $n>2$; therefore the constraints generically admit solutions, from which the desired result follows.

3.3 Necessary and sufficient conditions for the existence of inverse pairs

Inverse pairs, introduced in [MB90, MB94], extend to hypermatrices the notion of matrix inverse and enable us to describe a hypermatrix analog of the general linear group as follows

[TABLE]

By properties of the transpose it follows that

[TABLE]

such that

[TABLE]

Note that in Eq. (14) the hypermatrix pair $\left(\mathbf{A},\mathbf{B}\right)$ is the inner-inverse pair of the pair $\left(\mathbf{C},\mathbf{D}\right)$. The simplest illustration of a family of invertible pairs of hypermatrices is associated with the third-order hypermatrix analog of the subgroup of scaling matrices over a skew field $\mathbb{K}$, namely pairs $\mathbf{A}\in\mathbb{K}^{m\times p\times p}$ and $\mathbf{B}\in\mathbb{K}^{p\times n\times p}$ having entries specified by

[TABLE]

Consequently, the corresponding outer-inverse pair $\left(\mathbf{C},\mathbf{D}\right)$ has entries specified by

[TABLE]
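A sketch of the scaling family, under an assumed form of the entries (the defining equations are not reproduced here): $\mathbf{A}\left[i,t,k\right]=a_{ik}$ and $\mathbf{B}\left[t,j,k\right]=b_{jk}$ when $t=k$, and $0$ otherwise, so that $\mbox{Prod}\left(\mathbf{A},\mathbf{X},\mathbf{B}\right)\left[i,j,k\right]=a_{ik}\,\mathbf{X}\left[i,j,k\right]\,b_{jk}$. Inverting the non-zero entries then yields a pair that undoes this action in either order. This illustration uses real data, not a general skew field, and the helper `scaling_pair` is hypothetical.

```python
import numpy as np

def bm_prod(A, B, C):
    # Prod[i, j, k] = sum_t A[i, t, k] * B[i, j, t] * C[t, j, k]
    return np.einsum("itk,ijt,tjk->ijk", A, B, C)

def scaling_pair(a, b):
    """Assumed scaling pair: A[i,t,k] = a[i,k]*[t == k], B[t,j,k] = b[j,k]*[t == k]."""
    p = a.shape[1]
    delta = np.eye(p)
    A = a[:, None, :] * delta[None, :, :]  # shape (m, p, p)
    B = b[None, :, :] * delta[:, None, :]  # shape (p, n, p)
    return A, B

rng = np.random.default_rng(6)
m, n, p = 3, 4, 2
a = rng.uniform(1.0, 2.0, size=(m, p))  # non-zero scaling entries
b = rng.uniform(1.0, 2.0, size=(n, p))
A, B = scaling_pair(a, b)
C, D = scaling_pair(1.0 / a, 1.0 / b)  # invert the non-zero entries

X = rng.standard_normal((m, n, p))
print(np.allclose(bm_prod(A, bm_prod(C, X, D), B), X),
      np.allclose(bm_prod(C, bm_prod(A, X, B), D), X))  # True True
```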

We see that, by analogy with the matrix case, the hypermatrix scaling action forms a group over any skew field $\mathbb{K}$. Invertible scaling hypermatrices therefore correspond to the largest subset of invertible hypermatrix pairs whose inverses are obtained by inverting the non-zero entries of the given pair. Note that, in general, the third-order analog of the general linear group does not form a group. Moreover, written entry-wise, Eq. (14) yields the constraints

[TABLE]

By substituting

[TABLE]

into Eq. (15), we get

[TABLE]

Over skew fields it is not at all clear how to solve such systems in general. However, if we instead work over a commutative algebraically closed field such as $\mathbb{C}$, we may regroup and reorder the factors in the summands as follows

[TABLE]

Consequently, the inverse pair relation (14) asserts that

[TABLE]

This effectively flattens the inverse pair relation into the problem of finding elements of the matrix general linear group $\text{GL}_{m\cdot n\cdot p}\left(\mathbb{C}\right)$. Let $\mathbf{F}=\mathcal{F}\left(\mathbf{A},\mathbf{B}\right)\in\text{GL}_{m\cdot n\cdot p}\left(\mathbb{C}\right)$ be expressed in terms of the entries of $\mathbf{A}$ and $\mathbf{B}$ as follows:

[TABLE]

Note that $\mathbf{F}$ is a direct sum of $m\cdot n$ elements of $\text{GL}_{p}\left(\mathbb{C}\right)$. Given the pair $\mathbf{A}\in\mathbb{C}^{m\times p\times p}$, $\mathbf{B}\in\mathbb{C}^{p\times n\times p}$, we now determine the corresponding inverse pair $\mathbf{C}\in\mathbb{C}^{m\times p\times p}$, $\mathbf{D}\in\mathbb{C}^{p\times n\times p}$. Since we know the entries of $\mathbf{F}$ in terms of the entries of $\mathbf{A}$ and $\mathbf{B}$, it follows that

[TABLE]

Consequently, the entries of $\mathbf{C}$ and $\mathbf{D}$ are determined by solving the following type 2 system, in the sense of [GG18]:

[TABLE]

Therefore the two necessary and sufficient conditions guaranteeing that the hypermatrix pair $\mathbf{A}\in\mathbb{C}^{m\times p\times p}$, $\mathbf{B}\in\mathbb{C}^{p\times n\times p}$ admits an outer-inverse pair $\mathbf{C}\in\mathbb{C}^{m\times p\times p}$, $\mathbf{D}\in\mathbb{C}^{p\times n\times p}$ are that $\det\left(\mathbf{F}\right)\neq 0$ and that $\mathbf{F}^{-1}$ lies in the log-linear column space of the exponent matrix of the system $\mathcal{S}$, as discussed in [GG18].
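The flattening can be made explicit from the product definition alone, assuming the convention $\mbox{Prod}\left(\mathbf{A},\mathbf{X},\mathbf{B}\right)\left[i,j,k\right]=\sum_{t}\mathbf{A}\left[i,t,k\right]\mathbf{X}\left[i,j,t\right]\mathbf{B}\left[t,j,k\right]$: for each $\left(i,j\right)$, the fiber $\mbox{Prod}\left(\mathbf{A},\mathbf{X},\mathbf{B}\right)\left[i,j,:\right]$ is the $p\times p$ matrix with entries $\mathbf{A}\left[i,t,k\right]\mathbf{B}\left[t,j,k\right]$ applied to $\mathbf{X}\left[i,j,:\right]$. The sketch below assembles these $m\cdot n$ blocks into one block-diagonal matrix of size $mnp$ (a natural candidate for $\mathbf{F}$, whose indexing may differ from the paper's $\mathcal{F}$) and checks that its determinant is the product of the block determinants, which is the first condition above. It does not test the second, log-linear condition from [GG18]. The helper `flatten_blocks` is hypothetical.

```python
import numpy as np

def bm_prod(A, B, C):
    # Prod[i, j, k] = sum_t A[i, t, k] * B[i, j, t] * C[t, j, k]
    return np.einsum("itk,ijt,tjk->ijk", A, B, C)

def flatten_blocks(A, B):
    """Blocks F[i, j] (p x p) with F[i, j][k, t] = A[i, t, k] * B[t, j, k],
    so that Prod(A, X, B)[i, j, :] == F[i, j] @ X[i, j, :]."""
    return np.einsum("itk,tjk->ijkt", A, B)

rng = np.random.default_rng(7)
m, n, p = 2, 3, 3
A = rng.standard_normal((m, p, p))
B = rng.standard_normal((p, n, p))
X = rng.standard_normal((m, n, p))

blocks = flatten_blocks(A, B)  # shape (m, n, p, p)

# Direct sum of the m*n blocks, ordered to match the row-major flattening of X.
F = np.zeros((m * n * p, m * n * p))
for idx, blk in enumerate(blocks.reshape(m * n, p, p)):
    F[idx * p:(idx + 1) * p, idx * p:(idx + 1) * p] = blk

print(np.allclose(F @ X.ravel(), bm_prod(A, X, B).ravel()))  # True

# det(F) is the product of the block determinants, so F is invertible
# exactly when every p x p block is.
block_dets = np.linalg.det(blocks)  # shape (m, n)
print(np.isclose(np.linalg.det(F), np.prod(block_dets)))  # True
```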

3.4 Third order hypermatrix rank–nullity theorem

The third order hypermatrix variant of the rank–nullity theorem generalizes the well-known rank–nullity theorem of linear algebra [Str93]. For the sake of completeness, we recall the statement and a proof of the rank–nullity theorem. We subsequently extend the argument to third order hypermatrices.

Theorem 8 : $\mathbf{A}\in\mathbb{C}^{m\times n}$ (where $r\leq n=\min\left(m,n\right)$) has nullity $\left(n-r\right)$ iff $\text{Rank}\left(\mathbf{A}\right)=r$.

Proof of sufficiency : It follows from the hypothesis that there exists an invertible matrix $\mathbf{X}$ of size $n\times n$ such that the last $\left(n-r\right)$ columns of $\mathbf{A}\mathbf{X}$ are zero columns. Hence

[TABLE]

thereby expressing $\mathbf{A}$ as a sum of $r$ outer products.


Proof of necessity : Assuming that $\text{Rank}\left(\mathbf{A}\right)=r$, we exhibit $\left(n-r\right)$ columns of an invertible matrix which are mapped to zero by $\mathbf{A}$:

[TABLE]

for some $\mathbf{U}\in\mathbb{C}^{m\times n}$ and $\mathbf{V}\in\mathbb{C}^{n\times n}$. Let us replace by zero columns the columns of $\mathbf{U}$ indexed by $\left\{0,\cdots,n-1\right\}-S$, and replace by zero rows the rows of $\mathbf{V}$ indexed by $\left\{0,\cdots,n-1\right\}-S$. This substitution leaves the identity (17) unchanged. Note that the rows of $\mathbf{V}$ indexed by $S$ must be linearly independent, for otherwise, by Theorem 4, $\text{Rank}\left(\mathbf{A}\right)$ would be less than $r$. Consequently, we may replace the rows of $\mathbf{V}$ indexed by $\left\{0,\cdots,n-1\right\}-S$ by rows that complete the rows indexed by $S$ to a basis of $\mathbb{C}^{n}$; such a completion can always be chosen among the rows of the identity matrix. So long as the columns of $\mathbf{U}$ indexed by $\left\{0,\cdots,n-1\right\}-S$ remain zero columns, the row substitutions in $\mathbf{V}$ leave the identity (17) unchanged. Consequently $\det\left(\mathbf{V}\right)\neq 0$, and since $\mathbf{A}\cdot\mathbf{V}^{-1}=\mathbf{U}$, the columns of $\mathbf{V}^{-1}$ indexed by $\left\{0,\cdots,n-1\right\}-S$ are mapped by $\mathbf{A}$ to zero columns, thus concluding our proof.
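The necessity argument of Thm. 8 can be traced numerically. In the sketch below (NumPy, random data; the index set $S$ is chosen arbitrarily for illustration), $\mathbf{A}=\mathbf{U}\cdot\mathbf{V}$ uses only the columns of $\mathbf{U}$ and the rows of $\mathbf{V}$ indexed by $S$. The remaining rows of $\mathbf{V}$ are filled greedily with identity rows that complete a basis, after which $\mathbf{A}\cdot\mathbf{V}^{-1}=\mathbf{U}$ has $\left(n-r\right)$ zero columns.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, r = 5, 4, 2
S = [0, 2]                                  # arbitrary index set with |S| = r
rest = [k for k in range(n) if k not in S]  # complement {0, ..., n-1} - S

# A = U . V, where only the columns of U and rows of V indexed by S are non-zero.
U = np.zeros((m, n))
V = np.zeros((n, n))
U[:, S] = rng.standard_normal((m, r))
V[S, :] = rng.standard_normal((r, n))
A = U @ V

# Greedily complete the rows of V indexed by S to a basis using identity rows.
basis = [V[k] for k in S]
chosen = []
for e in np.eye(n):
    if len(basis) == n:
        break
    if np.linalg.matrix_rank(np.vstack(basis + [e])) > len(basis):
        basis.append(e)
        chosen.append(e)
for k, e in zip(rest, chosen):
    V[k] = e

print(np.allclose(U @ V, A))            # True: identity (17) is unchanged
A_Vinv = A @ np.linalg.inv(V)           # equals U
print(np.allclose(A_Vinv[:, rest], 0))  # True: n - r columns mapped to zero
```

The greedy completion always succeeds because the rows of the identity span $\mathbb{C}^{n}$, so any independent set can be extended by some of them.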

The nullity of a hypermatrix $\mathbf{A}\in\mathbb{C}^{m\times n\times p}$ is defined to be the maximum number of distinct pairs of depth slices of an invertible pair which are mapped to zero slices by the action of an invertible pair of hypermatrices on the input hypermatrix $\mathbf{A}$.

Theorem 9 : $\mathbf{A}\in\mathbb{C}^{m\times n\times p}$ (where $r\leq p=\min\left\{m,n,p\right\}$) has nullity $\left(p-r\right)$ iff $\text{Rank}\left(\mathbf{A}\right)=r$.

Proof of sufficiency : It follows from the hypothesis that there exists an invertible pair of hypermatrices $\mathbf{X}_{0}$, $\mathbf{X}_{1}$, respectively of size $m\times p\times p$ and $p\times n\times p$, such that the last $\left(p-r\right)$ depth slices of $\mbox{Prod}\left(\mathbf{X}_{0},\mathbf{A},\mathbf{X}_{1}\right)$ are zero depth slices.

Let $\left(\mathbf{Y}_{0},\mathbf{Y}_{1}\right)$ , respectively of size $m\times p\times p$ and $p\times n\times p$ denote the outer-inverse pair of $\left(\mathbf{X}_{0},\mathbf{X}_{1}\right)$ , hence

[TABLE]

thereby expressing $\mathbf{A}$ as a sum of $r$ outer products.

Proof of necessity : Assuming that $\text{Rank}\left(\mathbf{A}\right)=r$, we exhibit $\left(p-r\right)$ pairs of depth slices of an invertible hypermatrix pair which are mapped to zero by $\mathbf{A}$:

[TABLE]

for some $\mathbf{U}\in\mathbb{C}^{m\times p\times p}$, $\mathbf{V}\in\mathbb{C}^{m\times n\times p}$ and $\mathbf{W}\in\mathbb{C}^{p\times n\times p}$. Let us replace by zero column slices the column slices of $\mathbf{U}$ indexed by $\left\{0,\cdots,p-1\right\}-S$, replace by zero depth slices the depth slices of $\mathbf{V}$ indexed by $\left\{0,\cdots,p-1\right\}-S$, and finally replace by zero row slices the row slices of $\mathbf{W}$ indexed by $\left\{0,\cdots,p-1\right\}-S$. This substitution leaves the identity (18) unchanged. Note that the depth slices of $\mathbf{V}$ indexed by $S$ do not satisfy the assumptions of Theorem 5, for otherwise, by Theorem 5, $\text{Rank}\left(\mathbf{A}\right)$ would be less than $r$. Consequently, we may replace the column slices of $\mathbf{U}$ and the row slices of $\mathbf{W}$ indexed by $\left\{0,\cdots,p-1\right\}-S$ so as to ensure that $\mathbf{U}$, $\mathbf{W}$ form an invertible pair. So long as the depth slices of $\mathbf{V}$ indexed by $\left\{0,\cdots,p-1\right\}-S$ remain zero slices, the substitutions into $\mathbf{U}$ and $\mathbf{W}$ leave the identity (18) unchanged. Hence the depth slices of $\mbox{Prod}\left(\mathbf{U}^{-1},\mathbf{A},\mathbf{W}^{-1}\right)$ indexed by $\left\{0,\cdots,p-1\right\}-S$ are zero depth slices, thus concluding our proof.

Bibliography

[BCC+16] J. Blasiak, T. Church, H. Cohn, J. A. Grochow, E. Naslund, W. F. Sawin, and C. Umans, On cap sets and the group-theoretic approach to matrix multiplication, arXiv e-prints (2016).
[GER11] E. K. Gnang, A. Elgammal, and V. Retakh, A spectral theory for tensors, Annales de la faculte des sciences de Toulouse Mathematiques 20 (2011), no. 4, 801–841.
[GF17a] E. K. Gnang and Y. Filmus, On the Bhattacharya-Mesner rank of third order hypermatrices, arXiv e-prints (2017).
[GF17b] E. K. Gnang and Y. Filmus, On the spectra of hypermatrix direct sum and Kronecker products constructions, Linear Algebra and its Applications 519 (2017), 238–277.
[GG18] E. K. Gnang and J. S. Gnang, Sketch for a Theory of Constructs, arXiv e-prints (2018).
[GKZ94] I. Gelfand, M. Kapranov, and A. Zelevinsky, Discriminants, resultants and multidimensional determinants, Birkhauser, Boston, 1994.
[GR91] I. M. Gelfand and V. S. Retakh, Determinants of matrices over noncommutative rings, Functional Analysis and Its Applications 25 (1991), no. 2, 91–102.
[GR97] I. Gelfand and V. Retakh, Quasideterminants, I, Selecta Mathematica 3 (1997), no. 4, 517–546.