Introducing Hypergraph Signal Processing: Theoretical Foundation and Practical Applications

Songyang Zhang; Zhi Ding; and Shuguang Cui

arXiv:1907.09203·eess.SP·June 5, 2020·IEEE Internet Things J.

TL;DR

This paper introduces hypergraph signal processing (HGSP), a tensor-based framework that generalizes graph signal processing to model high-order relationships, with theoretical foundations and practical applications demonstrating improved performance.

Contribution

The paper develops the theoretical foundation of HGSP, including hypergraph Fourier space, spectrum properties, sampling theory, and filter design, extending GSP to high-order data interactions.

Findings

1) HGSP outperforms traditional methods in experimental tests.

2) Hypergraph Fourier transform captures high-order relationships effectively.

3) The framework enables advanced signal processing in IoT and complex data scenarios.

Abstract

Signal processing over graphs has recently attracted significant attention for dealing with structured data. Normal graphs, however, only model pairwise relationships between nodes and are not effective in representing and capturing some high-order relationships of data samples, which are common in many applications such as the Internet of Things (IoT). In this work, we propose a new framework of hypergraph signal processing (HGSP) based on tensor representation to generalize the traditional graph signal processing (GSP) to tackle high-order interactions. We introduce the core concepts of HGSP and define the hypergraph Fourier space. We then study the spectrum properties of hypergraph Fourier transform and explain its connection to mainstream digital signal processing. We derive the novel hypergraph sampling theory and present the fundamentals of hypergraph filter design based on the…

Tables (2)

Table 1. Compression Ratio of Different Methods

size	$16 \times 16$							$256 \times 256$
image	Radiation	People	load	inyang	stop	error	smile	lenna	mri	ct	AVG
IANH-HGSP	1.52	1.45	1.42	1.47	1.52	1.39	1.40	1.57	1.53	1.41	1.47
( $α, β$ )-GSP	1.37	1.23	1.10	1.26	1.14	1.16	1.28	1.07	1.11	1.07	1.18
4 connected-GSP	1.01	1.02	1.01	1.01	1.04	1.02	1.07	1.04	1.05	1.07	1.03

Table 2. MSE of Filtered Signal

$\gamma$	$10^{-5}$	$10^{-4}$	$10^{-3}$	$10^{-2}$	$10^{-1}$	1	10
Uniform Distribution: U(0, 0.1)
GSP	0.0031	0.0031	0.0031	0.0026	0.0017	0.0895	0.4523
HGSP	0.0031	0.0031	0.0028	0.0012	0.0631	0.1876	0.4083
Wiener	0.0201
Median	0.0142
Normal Distribution: N(0, 0.09)
GSP	0.790	0.790	0.0786	0.0556	0.0604	0.1286	0.4681
HGSP	0.0790	0.0585	0.0305	0.0778	0.1235	0.2374	0.4176
Wiener	0.0368
Median	0.0359
Normal Distribution: N(-0.02, 0.0001)
GSP	5.34e-04	5.36e-04	5.54e-04	7.76e-04	0.0055	0.1113	0.4650
HGSP	4.17e-04	4.72e-04	4.86e-04	6.48e-04	0.0044	0.0868	0.3483
Wiener	0.0230
Median	0.0096

Equations (163)

s^{\prime} = F_{M} s .

F_{M} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \ddots & 0 & 0 \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} .

F_{M} = V_{M}^{-1} \Lambda V_{M} .

\hat{s} = V_{M} s .

t_{ijk} = t_{ikj} = t_{jik} = t_{jki} = t_{kij} = t_{kji}, \quad i, j, k = 1, \dots, I .

w_{i_{1} \dots i_{P} j_{1} \dots j_{Q}} = u_{i_{1} \dots i_{P}} \cdot v_{j_{1} \dots j_{Q}} .

T = a \circ b,

S = a \circ b \circ c = T \circ c .

w_{i_{1} i_{2} \dots i_{n-1} j i_{n+1} \dots i_{P}} = \sum_{i_{n}=1}^{I_{n}} u_{i_{1} \dots i_{P}} v_{j i_{n}},

U \otimes V = \begin{bmatrix} u_{11} V & u_{12} V & \cdots & u_{1J} V \\ u_{21} V & u_{22} V & \cdots & u_{2J} V \\ \vdots & \vdots & \ddots & \vdots \\ u_{I1} V & u_{I2} V & \cdots & u_{IJ} V \end{bmatrix}

U \odot V = [u_{1} \otimes v_{1} \quad u_{2} \otimes v_{2} \quad \dots \quad u_{K} \otimes v_{K}] .

U * V = \begin{bmatrix} u_{11} v_{11} & u_{12} v_{12} & \cdots & u_{1Q} v_{1Q} \\ u_{21} v_{21} & u_{22} v_{22} & \cdots & u_{2Q} v_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ u_{P1} v_{P1} & u_{P2} v_{P2} & \cdots & u_{PQ} v_{PQ} \end{bmatrix} .

T = \sum_{r=1}^{R} a_{r} \circ b_{r} \circ c_{r},

T \approx \sum_{r=1}^{R} \lambda_{r} \cdot a_{r}^{(1)} \circ \dots \circ a_{r}^{(M)},

A = (a_{i_{1} i_{2} \dots i_{M}}), \quad 1 \leq i_{1}, i_{2}, \dots, i_{M} \leq N .

a_{p_{1} \dots p_{M}} = c \left( \sum_{k_{1}, k_{2}, \dots, k_{c} \geq 1, \; \sum_{i=1}^{c} k_{i} = M} \frac{M!}{k_{1}! \, k_{2}! \cdots k_{c}!} \right)^{-1} .

d(v_{i}) = \sum_{j_{1}, j_{2}, \dots, j_{M-1} = 1}^{N} a_{i j_{1} j_{2} \dots j_{M-1}} .

L = D - A \in \mathbb{R}^{\underbrace{N \times N \times \dots \times N}_{M \text{ times}}}

s^{[M-1]} = \underbrace{s \circ \dots \circ s}_{M-1 \text{ times}},

s_{(1)} = F s^{[M-1]},

(s_{(1)})_{i} = \sum_{j_{1}, \dots, j_{M-1} = 1}^{N} f_{i j_{1} \dots j_{M-1}} s_{j_{1}} s_{j_{2}} \dots s_{j_{M-1}} .

s_{7} = f_{732} \times s_{2} s_{3} + f_{723} \times s_{2} s_{3} + f_{756} \times s_{5} s_{6} + f_{765} \times s_{5} s_{6},

F \approx \sum_{r=1}^{R} \lambda_{r} \cdot f_{r}^{(1)} \circ \dots \circ f_{r}^{(M)},

F \approx \sum_{r=1}^{R} \lambda_{r} \cdot \underbrace{f_{r} \circ \dots \circ f_{r}}_{M \text{ times}} .

s_{(1)} = \left( \sum_{r=1}^{N} \lambda_{r} \cdot \underbrace{f_{r} \circ \dots \circ f_{r}}_{M \text{ times}} \right) \left( \underbrace{s \circ \dots \circ s}_{M-1 \text{ times}} \right) = \sum_{r=1}^{N} \lambda_{r} f_{r} \underbrace{\langle f_{r}, s \rangle \cdots \langle f_{r}, s \rangle}_{M-1 \text{ times}} = \underbrace{[f_{1} \; \cdots \; f_{N}] \begin{bmatrix} \lambda_{1} & & \\ & \ddots & \\ & & \lambda_{N} \end{bmatrix}}_{\text{iHGFT and filter in Fourier space}} \underbrace{\begin{bmatrix} (f_{1}^{T} s)^{M-1} \\ \vdots \\ (f_{N}^{T} s)^{M-1} \end{bmatrix}}_{\text{HGFT of the hypergraph signal}},

\hat{s} = \begin{bmatrix} (f_{1}^{T} s)^{M-1} \\ \vdots \\ (f_{N}^{T} s)^{M-1} \end{bmatrix} .

F f_{r}^{[M-1]} = \sum_{i} \lambda_{i} f_{i} (f_{i}^{T} f_{r})^{M-1} = \lambda_{r} f_{r} .

\hat{s} = \begin{bmatrix} (f_{1}^{T} s)^{M-1} \\ \vdots \\ (f_{N}^{T} s)^{M-1} \end{bmatrix},

Full text

Abstract

Signal processing over graphs has recently attracted significant attention for dealing with structured data. Normal graphs, however, only model pairwise relationships between nodes and are not effective in representing and capturing some high-order relationships of data samples, which are common in many applications such as the Internet of Things (IoT). In this work, we propose a new framework of hypergraph signal processing (HGSP) based on tensor representation to generalize the traditional graph signal processing (GSP) to tackle high-order interactions. We introduce the core concepts of HGSP and define the hypergraph Fourier space. We then study the spectrum properties of hypergraph Fourier transform and explain its connection to mainstream digital signal processing. We derive the novel hypergraph sampling theory and present the fundamentals of hypergraph filter design based on the tensor framework. We present HGSP-based methods for several signal processing and data analysis applications. Our experimental results demonstrate significant performance improvement using our HGSP framework over some traditional signal processing solutions.

Index Terms:

Hypergraph, tensor, data analysis, signal processing.

I Introduction

Graph theoretic tools have recently found broad applications in data science owing to their power to model complex relationships in large structured datasets [1]. Big data, such as those representing social network interactions, Internet of Things (IoT) intelligence, biological connections, mobility and traffic patterns, often exhibit complex structures that are challenging to many traditional tools [2]. Thankfully, graphs provide good models for many such datasets as well as the underlying complex relationships. A dataset with $N$ data points can be modeled as a graph of $N$ vertices, whose internal relationships can be captured by edges. For example, subscribing users in a communication or social network can be modeled as nodes while the physical interconnections or social relationships among users are represented as edges [3].

Taking advantage of graph models in characterizing complex data structures, graph signal processing (GSP) has emerged as an exciting and promising new tool for processing large datasets with complex structures. A typical application of GSP is in image processing, where image pixels are modeled as graph signals embedded in nodes while pairwise similarities between pixels are captured by edges [6]. By modeling images using graphs, tasks such as image segmentation can take advantage of graph partition and GSP filters. Another example of GSP applications is in processing data from sensor networks [5]. Based on graph models directly built over network structures, a graph Fourier space can be defined according to the eigenspace of a representing graph matrix, such as the Laplacian or adjacency matrix, to facilitate data processing operations such as denoising [7], filter banks [8] and compression [9].

Despite many demonstrated successes, the GSP defined over normal graphs also exhibits certain limitations. First, normal graphs cannot capture high-dimensional interactions describing multi-lateral relationships among multiple nodes, which are critical for many practical applications. Since each edge in a normal graph only models the pairwise interactions between two nodes, the traditional GSP can only deal with the pairwise relationships defined by such edges. In reality, however, complex relationships may exist among a cluster of nodes, for which the use of pairwise links between every two nodes cannot capture their multi-lateral interactions [16]. In biology, for example, a trait may be attributed to multiple interactive genes [17], as shown in Fig. 1(a), such that a quadrilateral interaction is more informative and powerful here. Another example is the social network with online social communities called folksonomies, where trilateral interactions occur among users, resources, and annotations [15, 40]. Second, a normal graph can only capture a typical single-tier relationship with matrix representation. In complex systems and datasets, however, each node may have several traits such that there exist multiple tiers of interactions between two nodes. In a cyber-physical system, for example, each node usually contains two components, i.e., the physical component and the cyber component, for which there exist two tiers of connections between a pair of nodes. Generally, such multi-tier relationships can be modeled as multi-layer networks, where each layer represents one tier of interactions [11]. However, normal graphs cannot model the inter-layer interactions simply, and the corresponding matrix representations are unable to distinguish different tiers of relationships efficiently since they describe entries for all layers equivalently [10, 14]. Thus, the traditional GSP based on matrix analysis has so far been unable to efficiently handle such complex relationships. Clearly, there is a need for a more general graph model and graph signal processing concept to remedy the aforementioned shortcomings of the traditional GSP.

To find a more general model for complex data structures, we venture into the area of high-dimensional graphs known as hypergraphs. Hypergraph theory is playing an increasingly important role in graph theory and data analysis, especially for analyzing high-dimensional data structures and interactions [18]. A hypergraph consists of nodes and hyperedges connecting more than two nodes [19]. For example, Fig. 2(a) shows a hypergraph with three hyperedges and seven nodes, whereas Fig. 2(b) provides a corresponding dataset modeled by this hypergraph. Indeed, a normal graph is a special case of a hypergraph, where each hyperedge degenerates to a simple edge that involves exactly two nodes.

Hypergraphs have found successes by generalizing normal graphs in many applications, such as clustering [39], classification [22], and prediction [23]. Moreover, a hypergraph is an alternative representation for a multi-layer network, and is useful when dealing with multi-tier relationships [12, 13]. Thus, a hypergraph is a natural extension of a normal graph in modeling signals of high-degree interactions. Presently, however, the literature provides little coverage on hypergraph signal processing (HGSP). The only known work [4] proposed an HGSP framework based on a special type of hypergraph called complexes. In [4], hypergraph signals are associated with each hyperedge, but the framework is limited to cell complexes, which cannot suitably model many real-world datasets and applications. Another shortcoming of the framework in [4] is the lack of detailed analysis and application examples to demonstrate its practicability. In addition, the attempt in [4] to extend some key concepts from the traditional GSP simply fails due to the difference in the basic setups between graph signals and hypergraph signals. In this work, we seek to establish a more general and practical HGSP framework, capable of handling arbitrary hypergraphs and naturally extending the traditional GSP concepts to handle high-dimensional interactions. We will also provide real application examples to validate the effectiveness of the proposed framework.

Compared with the traditional GSP, a generalized HGSP faces several technical challenges. The first problem lies in the mathematical representation of hypergraphs. Developing an algebraic representation of a hypergraph is the foundation of HGSP. Currently there are two major approaches: matrix-based [32] and tensor-based [20]. The matrix-based method makes it hard to implement hypergraph signal shifting, while the tensor-based method is conceptually harder to understand. Another challenge is in defining signal shifting over a hyperedge. In a regular graph, signal shifting is easy to define as propagation along the link direction of a simple edge connecting two nodes. However, each hyperedge in a hypergraph may involve more than two nodes, so modeling signal interactions over a hyperedge requires careful consideration. Other challenges include the definition and interpretation of hypergraph frequency.

To address the aforementioned challenges and generalize the traditional GSP into a more general hypergraph tool to capture high-dimensional interactions, we propose a novel tensor-based HGSP framework in this paper. The main contributions in this work can be summarized as follows. Representing hypergraphs as tensors, we define a specific form of hypergraph signals and hypergraph signal shifting. We then provide an alternative definition of hypergraph Fourier space based on the orthogonal CANDECOMP/PARAFAC (CP) tensor decomposition, together with the corresponding hypergraph Fourier transform. To better interpret the hypergraph Fourier space, we analyze the resulting hypergraph frequency properties, including the concepts of frequency and bandlimited signals. Analogous to the traditional sampling theory, we derive the conditions and properties for perfect signal recovery from samples in HGSP. We also provide the theoretical foundation for the HGSP filter designs. Beyond these, we provide several application examples of the proposed HGSP framework:

1) We introduce a signal compression method based on the new sampling theory to show the effectiveness of HGSP in describing structured signals;

2) We apply HGSP in spectral clustering to show how the HGSP spectrum space acts as a suitable spectrum for hypergraphs;

3) We introduce an HGSP method for binary classification problems to demonstrate the practical application of HGSP in data analysis;

4) We introduce a filtering approach for the denoising problem to further showcase the power of HGSP;

5) Finally, we suggest several potential application areas for HGSP, including the Internet of Things (IoT), social networks, and natural language processing.

We compare the performance of HGSP-based methods with the traditional GSP-based methods and learning algorithms in all the above applications. All the features of HGSP make it an essential tool for IoT applications in the future.

We organize the rest of the paper as follows. Section II first summarizes the preliminaries of the traditional GSP, tensors, and hypergraphs. In Section III, we then introduce the core definitions of HGSP, including the hypergraph signal, the signal shifting and the hypergraph Fourier space, followed by the frequency interpretation and description of existing works in Section IV. We present some useful HGSP-based results such as the sampling theory and filter design in Section V. With the proposed HGSP framework, we provide several potential applications of HGSP and demonstrate its effectiveness in Section VI, before presenting the final conclusions in Section VII.

II Preliminaries

II-A Overview of Graph Signal Processing

GSP is a recent tool used to analyze signals according to the graph models. Here, we briefly review the key relevant concepts of the traditional GSP [2, 1].

A dataset with $N$ data points can be modeled as a normal graph $\mathcal{G}(\mathcal{V,E})$ consisting of a set of $N$ nodes $\mathcal{V}=\{\mathbf{v}_{1},\cdots,\mathbf{v}_{N}\}$ and a set of edges $\mathcal{E}$ . Each node of the graph $\mathcal{G}$ is a data point, whereas the edges describe the pairwise interactions between nodes. A graph signal represents the data associated with a node. For a graph with $N$ nodes, there are $N$ graph signals, which are defined as a signal vector $\mathbf{s}=[s_{1}\quad s_{2}\quad...\quad s_{N}]^{\mathrm{T}}\in\mathbb{R}^{N}.$

Usually, such a graph can be described either by an adjacency matrix $\mathbf{A_{M}}\in\mathbb{R}^{N\times N}$, where each entry indicates a pairwise link (or an edge), or by a Laplacian matrix $\mathbf{L_{M}=D_{M}-A_{M}}$, where $\mathbf{D_{M}}\in\mathbb{R}^{N\times N}$ is the diagonal matrix of degrees. Both the Laplacian matrix and the adjacency matrix can fully represent the graph structure. For convenience, we use a general matrix $\mathbf{F_{M}}\in\mathbb{R}^{N\times N}$ to represent either of them. Note that, since the adjacency matrix applies to both directed and undirected graphs, it is more common in the GSP literature. Thus, the generalized GSP is based on the adjacency matrix [2] and the representing matrix refers to the adjacency matrix in this paper unless specified otherwise.
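As a small illustration of the two matrix representations just described, the following sketch builds an adjacency matrix and the Laplacian $\mathbf{L_{M}=D_{M}-A_{M}}$ for an undirected, unweighted graph. The 4-node edge list and the helper names `adjacency_matrix` and `laplacian_matrix` are assumptions made for this example, not part of the paper.

```python
def adjacency_matrix(num_nodes, edges):
    """Adjacency matrix of an undirected, unweighted graph (0-based node ids)."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def laplacian_matrix(A):
    """L = D - A, with D the diagonal matrix of node degrees."""
    n = len(A)
    degrees = [sum(row) for row in A]
    return [[(degrees[i] if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical 4-node graph, used only for illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
A = adjacency_matrix(4, edges)
L = laplacian_matrix(A)
for row in L:
    print(row)
```

Every row of the Laplacian sums to zero because the degree on the diagonal cancels the off-diagonal links of that node.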

With the graph representation $\mathbf{F_{M}}$ and the signal vector $\mathbf{s}$ , the graph shifting is defined as

$$\mathbf{s^{\prime}}=\mathbf{F_{M}}\mathbf{s}. \tag{1}$$

Here, the matrix $\mathbf{F_{M}}$ could be interpreted as a graph filter whose functionality is to shift the signals along link directions. Taking the cyclic graph shown in Fig. 3 as an example, its adjacency matrix is a shifting matrix

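$$\mathbf{F_{M}}=\begin{bmatrix}0&0&\cdots&0&1\\1&0&\cdots&0&0\\\vdots&\ddots&\ddots&\ddots&\vdots\\0&0&\ddots&0&0\\0&0&\cdots&1&0\end{bmatrix}. \tag{2}$$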

Typically, the shifted signal over the cyclic graph is calculated as $\mathbf{s^{\prime}=F_{M}s}=[s_{N}\quad s_{1}\quad\cdots\quad s_{N-1}]^{\mathrm{T}}$ , which shifts the signal at each node to its next node.
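The cyclic shift can be checked numerically. The sketch below, in plain Python, builds the shifting matrix of a directed cycle and applies it to a signal; the 5-node size and the signal values are illustrative assumptions.

```python
def cyclic_shift_matrix(n):
    """Adjacency matrix of the directed cycle v_1 -> v_2 -> ... -> v_n -> v_1.

    Entry [i][j] is 1 when there is an edge from node j to node i (0-based).
    """
    return [[1 if i == (j + 1) % n else 0 for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


F = cyclic_shift_matrix(5)
s = [10, 20, 30, 40, 50]      # s_1, ..., s_5 (illustrative values)
shifted = mat_vec(F, s)
print(shifted)                # [50, 10, 20, 30, 40], i.e. [s_N, s_1, ..., s_{N-1}]
```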

The graph spectrum space, also called the graph Fourier space, is defined based on the eigenspace of $\mathbf{F_{M}}$ . Assume that the eigen-decomposition of $\mathbf{F_{M}}$ is

$$\mathbf{F_{M}}=\mathbf{V_{M}}^{-1}\mathbf{\Lambda}\mathbf{V_{M}}. \tag{3}$$

The frequency components are defined by the eigenvectors of $\mathbf{F_{M}}$ and the frequencies are defined with respect to eigenvalues. The corresponding graph Fourier transform is defined as

$$\hat{\mathbf{s}}=\mathbf{V_{M}}\mathbf{s}. \tag{4}$$
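For the cyclic graph, this construction reduces to the classical discrete Fourier transform, which the following sketch verifies numerically. It assumes the unitary DFT matrix as $\mathbf{V_{M}}$ and checks the equivalent identity $\mathbf{V_{M}F_{M}}=\mathbf{\Lambda V_{M}}$ rather than inverting anything; the graph size and signal values are illustrative.

```python
import cmath


def cyclic_shift_matrix(n):
    """Shifting matrix of the directed cycle (entry [i][j] = 1 for edge j -> i)."""
    return [[1 if i == (j + 1) % n else 0 for j in range(n)] for i in range(n)]


def dft_matrix(n):
    """Unitary DFT matrix V with V[k][m] = exp(-2*pi*1j*k*m/n) / sqrt(n)."""
    scale = 1 / n ** 0.5
    return [[scale * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def mat_mul(a, b):
    """Plain matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


n = 6
F = cyclic_shift_matrix(n)
V = dft_matrix(n)
eigenvalues = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

# F = V^{-1} Lambda V is equivalent to V F = Lambda V; compare entry by entry.
VF = mat_mul(V, F)
LV = [[eigenvalues[k] * V[k][m] for m in range(n)] for k in range(n)]
max_error = max(abs(VF[k][m] - LV[k][m]) for k in range(n) for m in range(n))

s = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]                                 # illustrative signal
s_hat = [sum(V[k][m] * s[m] for m in range(n)) for k in range(n)]   # GFT: s_hat = V s
print(max_error < 1e-12)
```

The eigenvalues here are the complex roots of unity, so each graph frequency corresponds to one DFT bin.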

With the definition of the graph Fourier space, the traditional signal processing and learning tasks, such as denoising [33] and classification [73], could be solved within the GSP framework. More details about the specific topics of GSP, such as the frequency analysis, filter design, and spectrum representation have been discussed in [5, 52, 84].

II-B Introduction of Hypergraph

We begin with the definition of hypergraph and its possible representations.

Definition 1 (Hypergraph).

A general hypergraph $\mathcal{H}$ is a pair $\mathcal{H}=(\mathcal{V,E})$, where $\mathcal{V}=\{\mathbf{v}_{1},...,\mathbf{v}_{N}\}$ is a set of elements called vertices and $\mathcal{E}=\{\mathbf{e}_{1},...,\mathbf{e}_{K}\}$ is a set of non-empty multi-element subsets of $\mathcal{V}$ called hyperedges. Let $M=\max\{|\mathbf{e}_{i}|:\mathbf{e}_{i}\in\mathcal{E}\}$ be the maximum cardinality of hyperedges, abbreviated as $m.c.e(\mathcal{H})$ of $\mathcal{H}$.

In a general hypergraph $\mathcal{H}$ , different hyperedges may contain different numbers of nodes. The $m.c.e(\mathcal{H})$ denotes the number of vertices in the largest hyperedge. An example of a hypergraph with $7$ nodes, $3$ hyperedges and $m.c.e=3$ is shown in Fig. 4.

From the definition, we see that a normal graph is a special case of a hypergraph with $M=2$. The hypergraph is a natural extension of the normal graph to represent high-dimensional interactions. To represent a hypergraph mathematically, there are two major methods, based on matrices and tensors, respectively. In the matrix-based method, a hypergraph is represented by a matrix $\mathbf{G}\in\mathbb{R}^{N\times E}$, where $E$ equals the number of hyperedges. The rows of the matrix represent the nodes, and the columns represent the hyperedges [19]. Thus, each element in the matrix indicates whether the corresponding node is involved in the particular hyperedge. Although such a matrix-based representation is simple to construct, it is hard to define and implement signal processing directly as in GSP by using the matrix $\mathbf{G}$. Unlike the matrix-based method, tensors offer more flexibility in describing the structures of high-dimensional graphs [42]. More specifically, a tensor can be viewed as an extension of a matrix into high-dimensional domains. The adjacency tensor, which indicates whether nodes are connected, is a natural hypergraph counterpart to the adjacency matrix in normal graph theory [51]. Thus, we prefer to represent hypergraphs using tensors. In Section III-A, we will provide more details on how to represent hypergraphs and signals in tensor forms.
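To make the matrix-based representation concrete, the sketch below builds the $N\times E$ matrix $\mathbf{G}$ from a list of hyperedges and reads off the maximum cardinality of hyperedges. The hyperedge sets and the function name `incidence_matrix` are hypothetical choices for illustration; they are not taken from the figures of the paper.

```python
def incidence_matrix(num_nodes, hyperedges):
    """N x E matrix G with G[i][k] = 1 when node i belongs to hyperedge k."""
    G = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for k, edge in enumerate(hyperedges):
        for node in edge:
            G[node][k] = 1
    return G


# Hypothetical hypergraph: 7 nodes (0-based ids), 3 hyperedges of different sizes.
hyperedges = [{0, 1, 2}, {2, 3}, {4, 5, 6}]
G = incidence_matrix(7, hyperedges)

mce = max(len(edge) for edge in hyperedges)   # maximum cardinality of hyperedges, M
edge_sizes = [sum(G[i][k] for i in range(7)) for k in range(len(hyperedges))]
print(mce, edge_sizes)
```

Column sums recover the hyperedge sizes, but the matrix gives no direct way to shift a signal from node to node, which is the limitation noted above.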

II-C Tensor Basics

Before we introduce our tensor-based HGSP framework, let us introduce some tensor basics to be used later. Tensors can effectively represent high-dimensional graphs [14]. Generally speaking, tensors can be interpreted as multi-dimensional arrays. The order of a tensor is the number of indices needed to label a component of that array [24]. For example, a third-order tensor has three indices. In fact, scalars, vectors and matrices are all special cases of tensors: a scalar is a zeroth-order tensor; a vector is a first-order tensor; a matrix is a second-order tensor; and an $M$-dimensional array is an $M$th-order tensor [10]. Generalizing a 2-D matrix, we represent the entry at the position $(i_{1},i_{2},\cdots,i_{M})$ of an $M$th-order tensor $\mathbf{T}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{M}}$ by $t_{i_{1}i_{2}\cdots i_{M}}$ in the rest of the paper.

Below are some useful tensor definitions and operations related to the proposed HGSP framework.

II-C1 Symmetric and Diagonal Tensors

• A tensor is super-symmetric if its entries are invariant under any permutation of their indices [44]. For example, a third-order tensor $\mathbf{T}\in\mathbb{R}^{I\times I\times I}$ is super-symmetric if its entries $t_{ijk}$ satisfy

$$t_{ijk}=t_{ikj}=t_{jik}=t_{jki}=t_{kij}=t_{kji},\quad i,j,k=1,\cdots,I. \tag{5}$$

Analysis of super-symmetric tensors, which are shown to be bijectively related to homogeneous polynomials, can be found in [45, 46].

• A tensor $\mathbf{T}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}}$ is super-diagonal if its entries $t_{i_{1}i_{2}\cdots i_{N}}\neq 0$ only if $i_{1}=i_{2}=\cdots=i_{N}$. For example, a third-order tensor $\mathbf{T}\in\mathbb{R}^{I\times I\times I}$ is super-diagonal if its entries $t_{iii}\neq 0$ for $i=1,2,\cdots,I$, while all other entries are zero.
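The two definitions above can be checked mechanically for small third-order tensors stored as nested lists. This is a minimal sketch; the helper names and the three test tensors are assumptions made for illustration.

```python
from itertools import permutations, product


def make_zeros(n):
    """Third-order n x n x n tensor of zeros as nested lists."""
    return [[[0] * n for _ in range(n)] for _ in range(n)]


def is_super_symmetric(T, n):
    """True if every entry is unchanged under any permutation of its indices."""
    for idx in product(range(n), repeat=3):
        i, j, k = idx
        for p, q, r in permutations(idx):
            if T[p][q][r] != T[i][j][k]:
                return False
    return True


def is_super_diagonal(T, n):
    """True if every entry off the main diagonal (i = j = k) is zero."""
    return all(T[i][j][k] == 0
               for i, j, k in product(range(n), repeat=3)
               if not (i == j == k))


n = 3
S = make_zeros(n)                       # symmetric: all orderings of (0, 1, 2) set to 1
for p, q, r in permutations((0, 1, 2)):
    S[p][q][r] = 1

D = make_zeros(n)                       # diagonal: only t_iii is nonzero
for i in range(n):
    D[i][i][i] = i + 1

asym = make_zeros(n)                    # a single entry breaks the symmetry
asym[0][1][2] = 1

print(is_super_symmetric(S, n), is_super_diagonal(D, n), is_super_symmetric(asym, n))
```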

II-C2 Tensor Operations

Tensor analysis is developed based on tensor operations. Some tensor operations are commonly used in our HGSP framework [50, 48, 49].

• The tensor outer product between a $P$th-order tensor $\mathbf{U}\in\mathbb{R}^{I_{1}\times I_{2}\times...\times I_{P}}$ with entries $u_{i_{1}...i_{P}}$ and a $Q$th-order tensor $\mathbf{V}\in\mathbb{R}^{J_{1}\times J_{2}\times...\times J_{Q}}$ with entries $v_{j_{1}...j_{Q}}$ is denoted by $\mathbf{W}=\mathbf{U}\circ\mathbf{V}$. The result $\mathbf{W}\in\mathbb{R}^{I_{1}\times I_{2}\times...\times I_{P}\times J_{1}\times J_{2}\times...\times J_{Q}}$ is a $(P+Q)$th-order tensor, whose entries are calculated by

$$w_{i_{1}...i_{P}j_{1}...j_{Q}}=u_{i_{1}...i_{P}}\cdot v_{j_{1}...j_{Q}}. \tag{6}$$

The major use of the tensor outer product is to construct a higher-order tensor from several lower-order tensors. For example, the tensor outer product between vectors $\mathbf{a}\in\mathbb{R}^{M}$ and $\mathbf{b}\in\mathbb{R}^{N}$ is denoted by

$$\mathbf{T}=\mathbf{a}\circ\mathbf{b}, \tag{7}$$

where the result $\mathbf{T}$ is a matrix in $\mathbb{R}^{M\times N}$ with entries $t_{ij}=a_{i}\cdot b_{j}$ for $i=1,2,\cdots,M$ and $j=1,2,\cdots,N$. Now, we introduce one more vector $\mathbf{c}\in\mathbb{R}^{Q}$ and form

$$\mathbf{S}=\mathbf{a}\circ\mathbf{b}\circ\mathbf{c}=\mathbf{T}\circ\mathbf{c}. \tag{8}$$

Here, the result $\mathbf{S}$ is a third-order tensor with entries $s_{ijk}=a_{i}\cdot b_{j}\cdot c_{k}=t_{ij}\cdot c_{k}$ for $i=1,2,\cdots,M$, $j=1,2,\cdots,N$ and $k=1,2,\cdots,Q$.

• The $n$-mode product between a tensor $\mathbf{U}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{P}}$ and a matrix $\mathbf{V}\in\mathbb{R}^{J\times I_{n}}$ is denoted by $\mathbf{W}=\mathbf{U}\times_{n}\mathbf{V}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{n-1}\times J\times I_{n+1}\times\cdots\times I_{P}}$. Each element in $\mathbf{W}$ is defined as

$$w_{i_{1}i_{2}\cdots i_{n-1}ji_{n+1}\cdots i_{P}}=\sum_{i_{n}=1}^{I_{n}}u_{i_{1}\cdots i_{P}}v_{ji_{n}}, \tag{9}$$

where the main function is to adjust the dimension of a specific order. For example, in Eq. (9), the dimension of the $n$th order of $\mathbf{U}$ is changed from $I_{n}$ to $J$.

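• The Kronecker product of matrices $\mathbf{U}\in\mathbb{R}^{I\times J}$ and $\mathbf{V}\in\mathbb{R}^{P\times Q}$ is defined as

$$\mathbf{U}\otimes\mathbf{V}=\begin{bmatrix}u_{11}\mathbf{V}&u_{12}\mathbf{V}&\cdots&u_{1J}\mathbf{V}\\u_{21}\mathbf{V}&u_{22}\mathbf{V}&\cdots&u_{2J}\mathbf{V}\\\vdots&\vdots&\ddots&\vdots\\u_{I1}\mathbf{V}&u_{I2}\mathbf{V}&\cdots&u_{IJ}\mathbf{V}\end{bmatrix} \tag{10}$$

to generate an $IP\times JQ$ matrix.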

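• The Khatri-Rao product between $\mathbf{U}\in\mathbb{R}^{I\times K}$ and $\mathbf{V}\in\mathbb{R}^{J\times K}$ is defined as

$$\mathbf{U}\odot\mathbf{V}=[\mathbf{u}_{1}\otimes\mathbf{v}_{1}\quad\mathbf{u}_{2}\otimes\mathbf{v}_{2}\quad\cdots\quad\mathbf{u}_{K}\otimes\mathbf{v}_{K}]. \tag{11}$$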

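• The Hadamard product between $\mathbf{U}\in\mathbb{R}^{P\times Q}$ and $\mathbf{V}\in\mathbb{R}^{P\times Q}$ is defined as

$$\mathbf{U}*\mathbf{V}=\begin{bmatrix}u_{11}v_{11}&u_{12}v_{12}&\cdots&u_{1Q}v_{1Q}\\u_{21}v_{21}&u_{22}v_{22}&\cdots&u_{2Q}v_{2Q}\\\vdots&\vdots&\ddots&\vdots\\u_{P1}v_{P1}&u_{P2}v_{P2}&\cdots&u_{PQ}v_{PQ}\end{bmatrix}. \tag{12}$$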
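The operations listed above can be sketched in a few lines of plain Python on nested lists. This is an illustrative implementation under simplifying assumptions: the outer product is shown for vectors only, the $n$-mode product is shown for $n=1$ on a third-order tensor, and all function names and numeric values are chosen for this example.

```python
def outer(a, b):
    """Outer product of two vectors: t[i][j] = a[i] * b[j]."""
    return [[x * y for y in b] for x in a]


def outer3(a, b, c):
    """Third-order outer product: s[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[x * y * z for z in c] for y in b] for x in a]


def mode1_product(U, V):
    """1-mode product of a tensor U (I1 x I2 x I3) with a matrix V (J x I1).

    w[j][i2][i3] = sum over i1 of u[i1][i2][i3] * v[j][i1].
    """
    I1, I2, I3 = len(U), len(U[0]), len(U[0][0])
    return [[[sum(U[i1][i2][i3] * row[i1] for i1 in range(I1))
              for i3 in range(I3)] for i2 in range(I2)] for row in V]


def kron(U, V):
    """Kronecker product of U (I x J) and V (P x Q), giving an IP x JQ matrix."""
    return [[u * v for u in u_row for v in v_row] for u_row in U for v_row in V]


def khatri_rao(U, V):
    """Column-wise Kronecker product of U (I x K) and V (J x K), giving IJ x K."""
    K = len(U[0])
    return [[U[i][k] * V[j][k] for k in range(K)]
            for i in range(len(U)) for j in range(len(V))]


def hadamard(U, V):
    """Element-wise product of two matrices of the same size."""
    return [[u * v for u, v in zip(u_row, v_row)] for u_row, v_row in zip(U, V)]


a, b, c = [1, 2], [3, 4, 5], [1, -1]
T = outer(a, b)                      # 2 x 3 matrix
S = outer3(a, b, c)                  # 2 x 3 x 2 tensor
W = mode1_product(S, [[1, 1]])       # first dimension changes from 2 to 1

U = [[1, 2], [3, 4]]
P = [[0, 1], [1, 0]]
print(kron(U, P))
print(khatri_rao(U, P))
print(hadamard(U, P))
```

The 1-mode product with the row matrix `[[1, 1]]` simply sums the two slices of `S`, which shows how the operation changes the size of one mode while leaving the others untouched.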

II-C3 Tensor Decomposition

Similar to the eigen-decomposition of a matrix, tensor decomposition analyzes tensors via factorization. The CANDECOMP/PARAFAC (CP) decomposition is a widely used method, which factorizes a tensor into a sum of component rank-one tensors [24, 47]. For example, a third-order tensor $\mathbf{T}\in\mathbb{R}^{I\times J\times K}$ is decomposed into

$$\mathbf{T}=\sum_{r=1}^{R}\mathbf{a}_{r}\circ\mathbf{b}_{r}\circ\mathbf{c}_{r}, \tag{13}$$

where $\mathbf{a}_{r}\in\mathbb{R}^{I}$, $\mathbf{b}_{r}\in\mathbb{R}^{J}$, $\mathbf{c}_{r}\in\mathbb{R}^{K}$ and $R$ is a positive integer known as the rank, i.e., the smallest number of rank-one tensors needed in the decomposition. The process of CP decomposition for a third-order tensor is illustrated in Fig. 5.

There are several extensions and alternatives of the CP decomposition. For example, the orthogonal-CP decomposition [26] decomposes the tensor using an orthogonal basis. An $M$th-order $N$-dimension tensor $\mathbf{T}\in\mathbb{R}^{\underbrace{\scriptstyle{N\times N\times...\times N}}_{\text{M times}}}$ can be decomposed by the orthogonal-CP decomposition as

$$\mathbf{T}\approx\sum_{r=1}^{R}\lambda_{r}\cdot\mathbf{a}_{r}^{(1)}\circ...\circ\mathbf{a}_{r}^{(M)}, \tag{14}$$

where $\lambda_{r}\geq 0$ and the orthogonal basis is $\mathbf{a}_{r}^{(i)}\in\mathbb{R}^{N}$ for $1\leq i\leq M$. In particular, the orthogonal-CP decomposition has a form similar to the eigen-decomposition when $M=2$ and $\mathbf{T}$ is super-symmetric.
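The analogy with the eigen-decomposition can be illustrated numerically. The sketch below assembles a third-order super-symmetric tensor from an orthonormal basis and nonnegative coefficients, then contracts it twice with each basis vector. The basis, the coefficients, and the contraction helper are assumptions made for this example; the contraction is used here only as the third-order analogue of a matrix-vector product.

```python
def rank_one(lam, f):
    """Third-order rank-one tensor lam * (f o f o f)."""
    return [[[lam * x * y * z for z in f] for y in f] for x in f]


def add(T, S):
    """Entry-wise sum of two cubic third-order tensors."""
    n = len(T)
    return [[[T[i][j][k] + S[i][j][k] for k in range(n)] for j in range(n)]
            for i in range(n)]


def contract(T, s):
    """Contract the last two modes with s: out[i] = sum_jk T[i][j][k] * s[j] * s[k]."""
    n = len(T)
    return [sum(T[i][j][k] * s[j] * s[k] for j in range(n) for k in range(n))
            for i in range(n)]


r = 2 ** -0.5
basis = [[r, r, 0.0], [r, -r, 0.0], [0.0, 0.0, 1.0]]   # orthonormal vectors (illustrative)
lams = [3.0, 2.0, 1.0]                                 # nonnegative coefficients

T = rank_one(lams[0], basis[0])
for lam, f in zip(lams[1:], basis[1:]):
    T = add(T, rank_one(lam, f))

# For an orthonormal basis, contracting T with f_r twice returns lam_r * f_r,
# the third-order analogue of the matrix relation M v = lambda v.
residuals = []
for lam, f in zip(lams, basis):
    out = contract(T, f)
    residuals.append(max(abs(o - lam * x) for o, x in zip(out, f)))
print(max(residuals) < 1e-12)
```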

II-C4 Tensor Spectrum

The eigenvalues and spectral space of tensors are significant topics in tensor algebra, and research on the tensor spectrum has made great progress in recent years. Covering all the properties of the tensor spectrum would take a large volume, so here we only list some helpful and relevant literature. In particular, Lim et al. developed theories of eigenvalues, eigenvectors, singular values, and singular vectors for tensors based on a constrained variational approach such as the Rayleigh quotient [86]. Qi et al. [25, 85] presented a more complete discussion of tensor eigenvalues by defining two forms of tensor eigenvalues, i.e., the E-eigenvalue and the H-eigenvalue. Chang et al. [44] further extended the work of [25, 85]. Other works, including [87, 88], further developed the theory of the tensor spectrum.

III Definitions for Hypergraph Signal Processing

In this section, we introduce the core definitions used in our HGSP framework.

III-A Algebraic Representation of Hypergraphs

The traditional GSP mainly relies on the representing matrix of a graph. Thus, an effective algebraic representation is also helpful in developing a novel HGSP framework. As we mentioned in Section II-C, tensor is an intuitive representation for high-dimensional graphs. In this section, we introduce the algebraic representation of hypergraphs based on tensors.

Similar to the adjacency matrix whose 2-D entries indicate whether and how two nodes are pairwise connected by a simple edge, we adopt an adjacency tensor whose entries indicate whether and how corresponding subsets of $M$ nodes are connected by hyperedges to describe hypergraphs [20].

Definition 2 (Adjacency tensor).

A hypergraph $\mathcal{H}=(\mathcal{V,E})$ with $N$ nodes and $m.c.e(\mathcal{H})=M$ can be represented by an $M$th-order $N$-dimension adjacency tensor $\mathbf{A}\in\mathbb{R}^{\underbrace{\scriptstyle{N\times N\times\cdots\times N}}_{\text{M times}}}$ defined as

$$\mathbf{A}=(a_{i_{1}i_{2}\cdots i_{M}}),\quad 1\leq i_{1},i_{2},\cdots,i_{M}\leq N. \tag{15}$$

Suppose that $\mathbf{e}_{l}=\{\mathbf{v}_{l_{1}},\mathbf{v}_{l_{2}},\cdots,\mathbf{v}_{l_{c}}\}\in\mathcal{E}$ is a hyperedge in $\mathcal{H}$ with $c\leq M$ vertices. Then, $\mathbf{e}_{l}$ is represented by all the elements $a_{p_{1}\cdots p_{M}}$ in $\mathbf{A}$ for which a subset of $c$ indices from $\{p_{1},p_{2},\cdots,p_{M}\}$ is exactly $\{l_{1},l_{2},\cdots,l_{c}\}$ and the other $M-c$ indices are picked arbitrarily from $\{l_{1},l_{2},\cdots,l_{c}\}$. More specifically, these elements $a_{p_{1}\cdots p_{M}}$ describing $\mathbf{e}_{l}$ are calculated as

$$a_{p_{1}\cdots p_{M}}=c\left(\sum_{k_{1},k_{2},\cdots,k_{c}\geq 1,\ \sum_{i=1}^{c}k_{i}=M}\frac{M!}{k_{1}!k_{2}!\cdots k_{c}!}\right)^{-1}. \tag{16}$$

Meanwhile, the entries, which do not correspond to any hyperedge $\mathbf{e}\in\mathcal{E}$ , are zeros.

Note that Eq. (16) enumerates all the possible combinations of $c$ positive integers $\{k_{1},\cdots,k_{c}\}$ , whose summation satisfies $\sum_{i=1}^{c}k_{i}=M$ . When the hypergraph degenerates to a normal graph with $c=M=2$ , the weights of edges are calculated as one, i.e., $a_{ij}=a_{ji}=1$ for an edge $\mathbf{e}=(i,j)\in\mathcal{E}$ . Then, the adjacency tensor is the same as the adjacency matrix.

To understand the physical meaning of the adjacency tensor and its weight, we start with the $M$ -uniform hypergraph with $N$ nodes, where each hyperedge has exactly $M$ nodes [53]. Since each hyperedge has an equal number of nodes, all hyperedges follow a consistent form to describe an $M$ -lateral relationship with $m.c.e=M$ . Such $M$ -lateral relationships can be represented by an $M$ th-order tensor $\mathbf{A}$ , where the entry $a_{i_{1}i_{2}\cdots i_{M}}$ indicates whether the nodes $\mathbf{v}_{i_{1}},\mathbf{v}_{i_{2}},\cdots,\mathbf{v}_{i_{M}}$ are in the same hyperedge, i.e., whether a hyperedge $\mathbf{e}=\{\mathbf{v}_{i_{1}},\mathbf{v}_{i_{2}},\cdots,\mathbf{v}_{i_{M}}\}$ exists. If the weight is nonzero, the hyperedge exists; otherwise, the hyperedge does not exist. Taking the $3$ -uniform hypergraph in Fig. 6(a) as an example, the hyperedge $\mathbf{e}_{1}$ is characterized by $a_{146}=a_{164}=a_{461}=a_{416}=a_{614}=a_{641}\neq 0$ , the hyperedge $\mathbf{e}_{2}$ is characterized by $a_{237}=a_{327}=a_{732}=a_{723}=a_{273}=a_{372}\neq 0$ , and $\mathbf{e}_{3}$ is represented by $a_{567}=a_{576}=a_{657}=a_{675}=a_{756}=a_{765}\neq 0$ . All other entries in $\mathbf{A}$ are zero. Note that all the hyperedges in an $M$ -uniform hypergraph have the same weight. Different hyperedges are distinguished by the indices of the entries. More specifically, just as $a_{ij}$ in the adjacency matrix implies the connection direction from node $\mathbf{v}_{j}$ to node $\mathbf{v}_{i}$ in GSP, an entry $a_{i_{1}i_{2}\cdots i_{M}}$ characterizes one direction of the hyperedge $\mathbf{e}=\{\mathbf{v}_{i_{1}},\mathbf{v}_{i_{2}},\cdots,\mathbf{v}_{i_{M}}\}$ with node $\mathbf{v}_{i_{M}}$ as the source and node $\mathbf{v}_{i_{1}}$ as the destination.

However, for a general hypergraph, different hyperedges may contain different numbers of nodes. For example, in the hypergraph of Fig. 6(b), the hyperedge $\mathbf{e}_{2}$ only contains two nodes. How to represent hyperedges with fewer than $m.c.e=M$ nodes then becomes an issue. To represent such a hyperedge $\mathbf{e}_{l}=\{\mathbf{v}_{l_{1}},\mathbf{v}_{l_{2}},...,\mathbf{v}_{l_{c}}\}\in\mathcal{E}$ with the number of vertices $c<M$ in an $M$ th-order tensor, we can use entries $a_{i_{1}i_{2}\cdots i_{M}}$ , where a subset of $c$ indices are the same as $\{l_{1},\cdots,l_{c}\}$ (possibly in a different order) and the other $M-c$ indices are picked from $\{l_{1},\cdots,l_{c}\}$ randomly. This process can be interpreted as generalizing the hyperedge with $c$ nodes to a hyperedge with $M$ nodes by duplicating $M-c$ nodes from the set $\{\mathbf{v}_{l_{1}},\cdots,\mathbf{v}_{l_{c}}\}$ randomly with possible repetitions. For example, the hyperedge $\mathbf{e}_{2}=\{\mathbf{v}_{2},\mathbf{v}_{3}\}$ in Fig. 6(b) can be represented by the entries $a_{233}=a_{323}=a_{332}=a_{322}=a_{223}=a_{232}$ in the third-order tensor $\mathbf{A}$ , which can be interpreted as generalizing the original hyperedge with $c=2$ to hyperedges with $M=3$ nodes, as in Fig. 7. We can use Eq. (16) as a generalization coefficient of each hyperedge with respect to permutation and combination [20]. More specifically, for the adjacency tensor of the hypergraph in Fig. 6(b), the entries are calculated as $a_{146}=a_{164}=a_{461}=a_{416}=a_{614}=a_{641}=a_{567}=a_{576}=a_{657}=a_{675}=a_{756}=a_{765}=\frac{1}{2}$ and $a_{233}=a_{323}=a_{332}=a_{322}=a_{223}=a_{232}=\frac{1}{3}$ , while the remaining entries are set to zero. Note that, in Fig. 6(b), the weight is smaller if the original hyperedge has fewer nodes. More generally, based on the definition of the adjacency tensor and Eq. (16), we can easily obtain the following property regarding the hyperedge weight.

Property 1.

Given two hyperedges $\mathbf{e}_{i}=\{\mathbf{v}_{1},\cdots,\mathbf{v}_{I}\}$ and $\mathbf{e}_{j}=\{\mathbf{v}_{1},\cdots,\mathbf{v}_{J}\}$ , the edgeweight $w(\mathbf{e}_{i})$ of $\mathbf{e}_{i}$ is different from the edgeweight $w(\mathbf{e}_{j})$ of $\mathbf{e}_{j}$ in the adjacency tensor $\mathbf{A}$ , i.e., $w(\mathbf{e}_{i})\neq w(\mathbf{e}_{j})$ , if $I\neq J$ . Moreover, $w(\mathbf{e}_{i})=w(\mathbf{e}_{j})$ iff $I=J$ .

This property can help identify the length of each hyperedge based on the weights in the adjacency tensor. Moreover, the edgeweights of two hyperedges with the same number of nodes are the same. Different hyperedges with the same number of nodes are distinguished by their indices of entries in an adjacency tensor.
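As a minimal sketch of how these weights can be computed, the snippet below assumes that Eq. (16) has the form $a_{p_{1}\cdots p_{M}}=c/\alpha$ with $\alpha=\sum_{k_{1}+\cdots+k_{c}=M,\,k_{i}\geq 1}\frac{M!}{k_{1}!\cdots k_{c}!}$ , which is consistent with the values $1$ , $\frac{1}{2}$ , and $\frac{1}{3}$ quoted above. The function names `hyperedge_weight` and `adjacency_tensor`, and the zero-based node indexing, are illustrative choices rather than part of the framework.

```python
from itertools import product
from math import factorial

import numpy as np


def hyperedge_weight(c, M):
    """Weight c / alpha of a c-node hyperedge in an M-th order tensor.

    Assumed reading of Eq. (16): alpha sums M!/(k_1! ... k_c!) over all
    positive integers k_1..k_c with k_1 + ... + k_c = M. Requires c <= M.
    """
    alpha = 0
    for ks in product(range(1, M + 1), repeat=c):
        if sum(ks) == M:
            denom = 1
            for k in ks:
                denom *= factorial(k)
            alpha += factorial(M) // denom
    return c / alpha


def adjacency_tensor(num_nodes, hyperedges, M):
    """Fill every entry whose index set equals a hyperedge with its weight."""
    A = np.zeros((num_nodes,) * M)
    for edge in hyperedges:
        nodes = set(edge)
        w = hyperedge_weight(len(nodes), M)
        for idx in product(sorted(nodes), repeat=M):
            if set(idx) == nodes:  # every node of the hyperedge appears
                A[idx] = w
    return A


# Hypergraph of Fig. 6(b): nodes v1..v7 are mapped to indices 0..6.
edges = [(0, 3, 5), (1, 2), (4, 5, 6)]
A = adjacency_tensor(7, edges, M=3)
print(A[0, 3, 5], A[1, 2, 2], A[1, 1, 2])
```

Under this reading, the number of entries assigned to a hyperedge equals $\alpha$ , so the entries of each hyperedge sum to its number of nodes $c$ .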

The degree $d(\mathbf{v}_{i})$ of a vertex $\mathbf{v}_{i}\in\mathcal{V}$ is the number of hyperedges containing $\mathbf{v}_{i}$ , i.e.,

[TABLE]

Then, the Laplacian tensor of the hypergraph $\mathcal{H}$ is defined as follows [20].

Definition 3 (Laplacian tensor).

Given a hypergraph $\mathcal{H=(V,E)}$ with $N$ nodes and $m.c.e(\mathcal{H})=M$ , the Laplacian tensor is defined as

[TABLE]

which is an $M$ th-order $N$ -dimension tensor. Here, $\mathbf{D}=(d_{i_{1}i_{2}\cdots i_{M}})$ is also an $M$ th-order $N$ -dimension super-diagonal tensor with nonzero elements of $d_{\underbrace{\scriptstyle{ii\cdots i}}_{\text{M times}}}=d(\mathbf{v}_{i})$ .

We see that both the adjacency and Laplacian tensors of a hypergraph $\mathcal{H}$ are super-symmetric. Moreover, when $m.c.e(\mathcal{H})=2$ , they have similar forms to the adjacency and Laplacian matrices of undirected graphs respectively. Similar to GSP, we use an $M$ th-order $N$ -dimension tensor $\mathbf{F}$ as a general representation of a given hypergraph $\mathcal{H}$ for convenience. As the adjacency tensor is more general, the representing tensor $\mathbf{F}$ refers to the adjacency tensor in this paper unless specified otherwise.
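The following sketch illustrates the degree and the Laplacian tensor on the $3$ -uniform hypergraph of Fig. 6(a). It assumes that the Laplacian tensor of Definition 3 takes the form $\mathbf{L}=\mathbf{D}-\mathbf{A}$ and uses the weight $\frac{1}{2}$ quoted above for three-node hyperedges with $M=3$ ; the variable names and zero-based indices are illustrative.

```python
from itertools import permutations

import numpy as np

num_nodes, M = 7, 3
# Fig. 6(a): e1={v1,v4,v6}, e2={v2,v3,v7}, e3={v5,v6,v7}, zero-based indices.
edges = [(0, 3, 5), (1, 2, 6), (4, 5, 6)]

# Adjacency tensor: all index permutations of a hyperedge share one weight.
A = np.zeros((num_nodes,) * M)
for edge in edges:
    for idx in permutations(edge):
        A[idx] = 0.5

# Degree: number of hyperedges containing each vertex.
degree = np.zeros(num_nodes)
for edge in edges:
    for v in edge:
        degree[v] += 1

# Super-diagonal degree tensor D and the assumed Laplacian L = D - A.
D = np.zeros_like(A)
for i in range(num_nodes):
    D[(i,) * M] = degree[i]
L = D - A

# Super-symmetry: L is invariant under any permutation of its modes.
is_supersymmetric = all(
    np.array_equal(L, L.transpose(p)) for p in permutations(range(M))
)
print(degree, is_supersymmetric)
```

In this example the hyperedge count also equals the sum of the adjacency entries whose first index is the vertex, `A.sum(axis=(1, 2))`.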

III-B Hypergraph Signal and Signal Shifting

Based on the tensor representation of hypergraphs, we now provide definitions for the hypergraph signal. In the traditional GSP, each signal element is related to one node in the graph. Thus, the graph signal in GSP is defined as an $N$ -length vector if there are $N$ nodes in the graph. Recall that the representing matrix of a normal graph can be treated as a graph filter, for which the basic form of the filtered signal is defined in Eq. (1). Thus, we could extend the definitions of the graph signal and signal shifting from the traditional GSP to HGSP based on the tensor-based filter implementation.

In HGSP, we also relate each signal element to one node in the hypergraph. Naturally, we can define the original signal as an $N$ -length vector if there are $N$ nodes. Similar to GSP, we define the hypergraph shifting based on the representing tensor $\mathbf{F}$ . However, since tensor $\mathbf{F}$ is of $M$ -th order, we need an $(M-1)$ -th order signal tensor to work with the hypergraph filter $\mathbf{F}$ , such that the filtered signal is also an $N$ -length vector like the original signal. For example, for a two-step polynomial filter shown in Fig. 8, the signals $\mathbf{s,s^{\prime},s^{\prime\prime}}$ should all have the same dimension and order. For the input and output signals in an HGSP system to have a consistent form, we define an alternative form of the hypergraph signal as below.

Definition 4 (Hypergraph signal).

For a hypergraph $\mathcal{H}$ with $N$ nodes and $m.c.e(\mathcal{H})=M$ , an alternative form of hypergraph signal is an $(M-1)$ -th order $N$ -dimension tensor $\mathbf{s}^{[M-1]}$ obtained from $(M-1)$ times outer product of the original signal $\mathbf{s}=[s_{1}\quad s_{2}\quad...\quad s_{N}]^{\mathrm{T}}$ , i.e.,

[TABLE]

where each entry in position $(i_{1},i_{2},\cdots,i_{M-1})$ equals the product $s_{i_{1}}s_{i_{2}}\cdots s_{i_{M-1}}$ .

Note that the above hypergraph signal comes from the original signal. They are different forms of the same signal, which reflect the signal properties in different dimensions. For example, a second-order hypergraph signal highlights the properties of the two-dimensional signal components $s_{i}s_{j}$ , while the original signal directly emphasizes the one-dimensional properties. We discuss the relationship between the hypergraph signal and the original signal in greater detail in Section III-D.
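Definition 4 can be sketched numerically as a repeated outer product. The helper name `hypergraph_signal` below is an illustrative choice.

```python
from functools import reduce

import numpy as np


def hypergraph_signal(s, M):
    """Return the (M-1)-th order tensor s o s o ... o s with M-1 factors."""
    s = np.asarray(s, dtype=float)
    return reduce(np.multiply.outer, [s] * (M - 1))


s = np.array([1.0, 2.0, 3.0])
S2 = hypergraph_signal(s, M=3)  # shape (3, 3), entries s_i * s_j
S3 = hypergraph_signal(s, M=4)  # shape (3, 3, 3), entries s_i * s_j * s_k
print(S2.shape, S3.shape)
```

For $M=2$ the helper returns the original vector, in line with the hypergraph signal reducing to the graph signal in that case.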

With the definition of hypergraph signals, let us define the original domain of signals for convenience before we step into signal shifting. Just as signals lie in the time domain in DSP, we have the following definition of the hypergraph vertex domain.

Definition 5 (Hypergraph vertex domain).

A signal lies in the hypergraph vertex domain if it resides on the structure of a hypergraph in the HGSP framework.

The hypergraph vertex domain is a counterpart of time domain in HGSP. The signals are analyzed based on the structure among vertices in a hypergraph.

Next, we discuss how the signals shift on the given hypergraph. Recall that, in GSP, the signal shifting is defined by the product of the representing matrix $\mathbf{F_{M}}\in\mathbb{R}^{N\times N}$ and the signal vector $\mathbf{s}\in\mathbb{R}^{N}$ , i.e., $\mathbf{s^{\prime}=F_{M}s}$ . Similarly, we define the hypergraph signal shifting based on its tensor $\mathbf{F}$ and the hypergraph signal $\mathbf{s}^{[M-1]}$ .

Definition 6 (Hypergraph shifting).

The basic shifting filter of hypergraph signals is defined as the direct contraction between the representing tensor $\mathbf{F}$ and the hypergraph signals $\mathbf{s}^{[M-1]}$ , i.e.,

[TABLE]

where each element of the filter output is given by

[TABLE]

Since the hypergraph signal contracts with the representing tensor over $M-1$ orders, the one-time filtered signal $\mathbf{s}_{(1)}$ is an $N$ -length vector, which has the same dimension as the original signal. Thus, the block diagram of a hypergraph filter with $\mathbf{F}$ can be shown as in Fig. 9.

Let us now consider the functionality of the hypergraph filter, as well as the physical insight of the hypergraph shifting. In GSP, the functionality of the filter $\mathbf{F_{M}}$ is simply to shift the signals along the link directions. However, interactions inside the hyperedge are more complex as it involves more than two nodes. In Eq. (21), we see that the filtered signal in $\mathbf{v}_{i}$ equals the summation of the shifted signal components in all hyperedges containing node $\mathbf{v}_{i}$ , where $f_{ij_{1}\cdots j_{M-1}}$ is the weight for each involved hyperedge and $\{s_{j_{1}},\cdots,s_{j_{M-1}}\}$ are the signals in the generalized hyperedges excluding $s_{i}$ . Clearly, the hypergraph shifting multiplies signals in the same hyperedge of node $\mathbf{v}_{i}$ together before delivering the shift to a certain node $\mathbf{v}_{i}$ . Taking the hypergraph in Fig. 6(a) as an example, node $\mathbf{v}_{7}$ is included in two hyperedges, $\mathbf{e}_{2}=\{\mathbf{v}_{2},\mathbf{v}_{3},\mathbf{v}_{7}\}$ and $\mathbf{e}_{3}=\{\mathbf{v}_{5},\mathbf{v}_{6},\mathbf{v}_{7}\}$ . According to Eq. (21), the shifted signal in node $\mathbf{v}_{7}$ is calculated as

[TABLE]

where $f_{732}=f_{723}$ is the weight of the hyperedge $\mathbf{e}_{2}$ and $f_{756}=f_{765}$ is the weight for the hyperedge $\mathbf{e}_{3}$ in the adjacency tensor $\mathbf{F}$ .

As the entry $a_{ji}$ in the adjacency matrix of a normal graph indicates the link direction from the node $\mathbf{v}_{i}$ to the node $\mathbf{v}_{j}$ , the entry $f_{i_{1}\cdots i_{M}}$ in the adjacency tensor similarly indicates the order of nodes in a hyperedge as $\{\mathbf{v}_{i_{M}},\mathbf{v}_{i_{M-1}},\cdots\mathbf{v}_{i_{1}}\}$ , where $\mathbf{v}_{i_{1}}$ is the destination and $\mathbf{v}_{i_{M}}$ is the source. Thus, the shifting by Eq. (22) could be interpreted as shown in Fig. 10(a). Since there are two possible directions from nodes $\{\mathbf{v}_{2},\mathbf{v}_{3}\}$ to node $\mathbf{v}_{7}$ in $\mathbf{e}_{2}$ , there are two components shifted to $\mathbf{v}_{7}$ , i.e., the first two terms in Eq. (22). Similarly, there are also two components shifted by the hyperedge $\mathbf{e}_{3}$ , i.e., the last two terms in Eq. (22). To illustrate the hypergraph shifting more explicitly, Fig. 10(b) shows a diagram of signal shifting to a certain node in an $M$ -way hyperedge. From Fig. 10(b), we see that the graph shifting in GSP is a special case of the hypergraph shifting, where $M=2$ . Moreover, there are $K=(M-1)!$ possible directions for the shifting to one specific node in an $M$ -way hyperedge.
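The shifting in Definition 6 can be sketched as a contraction of the last $M-1$ modes of $\mathbf{F}$ with the original signal. The example below uses the $3$ -uniform hypergraph of Fig. 6(a) with the weight $\frac{1}{2}$ for each hyperedge and the hypothetical test signal $s_{i}=i$ ; the helper name `hypergraph_shift` is illustrative.

```python
from itertools import permutations

import numpy as np


def hypergraph_shift(F, s):
    """Contract the last M-1 modes of F with s; returns an N-length vector."""
    out = F
    for _ in range(F.ndim - 1):
        out = np.tensordot(out, s, axes=1)  # contract the last mode with s
    return out


# Fig. 6(a): e1={v1,v4,v6}, e2={v2,v3,v7}, e3={v5,v6,v7}, zero-based indices.
N = 7
edges = [(0, 3, 5), (1, 2, 6), (4, 5, 6)]
F = np.zeros((N, N, N))
for edge in edges:
    for idx in permutations(edge):
        F[idx] = 0.5

s = np.arange(1.0, N + 1)  # hypothetical signal with s_i = i
shifted = hypergraph_shift(F, s)

# Same result by contracting F with the hypergraph signal s^{[2]} = s o s.
S = np.multiply.outer(s, s)
shifted_via_signal = np.tensordot(F, S, axes=2)

print(shifted[6])  # node v7 collects two terms from e2 and two from e3
```

For node $\mathbf{v}_{7}$ , the four contributing terms are $f_{732}s_{3}s_{2}$ , $f_{723}s_{2}s_{3}$ , $f_{756}s_{5}s_{6}$ , and $f_{765}s_{6}s_{5}$ , which matches the two directions per hyperedge discussed above.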

III-C Hypergraph Spectrum Space

We now provide the definitions of the hypergraph Fourier space, i.e., the hypergraph spectrum space. In GSP, the graph Fourier space is defined as the eigenspace of its representing matrix [5]. Similarly, we define the Fourier space of HGSP based on the representing tensor $\mathbf{F}$ of a hypergraph, which characterizes the hypergraph structure and signal shifting. For an $M$ -th order $N$ -dimension tensor $\mathbf{F}$ , we can apply the orthogonal-CP decomposition [26] to write

[TABLE]

with basis $\mathbf{f}_{r}^{(i)}\in\mathbb{R}^{N}$ for $1\leq i\leq M$ and $\lambda_{r}\geq 0$ . Since $\mathbf{F}$ is super-symmetric [25], i.e., $\mathbf{f}_{r}=\mathbf{f}_{r}^{(1)}=\mathbf{f}_{r}^{(2)}=\cdots=\mathbf{f}_{r}^{(M)}$ , we have

[TABLE]

Generally, we have the rank $R\leq N$ in a hypergraph. We will discuss how to construct the remaining $\mathbf{f}_{i}$ , $R<i\leq N$ , for the case of $R<N$ later in Section III-F.

Now, by plugging Eq. (24) into Eq. (20), the hypergraph shifting can be written with the $N$ basis vectors $\mathbf{f}_{i}$ ’s as

[TABLE]

where $\langle\mathbf{f}_{r},\mathbf{s}\rangle=\mathbf{f}_{r}^{\mathrm{T}}\mathbf{s}$ is the inner product between $\mathbf{f}_{r}$ and $\mathbf{s}$ , and $(\cdot)^{M-1}$ is the $(M-1)$ th power.

From Eq. (25d), we see that the shifted signal in HGSP has a decomposed form similar to Eqs. (3) and (4) for GSP. The first two parts in Eq. (25d) work like $\mathbf{V_{M}^{-1}\Lambda}$ of the GSP eigen-decomposition, which can be interpreted as the inverse Fourier transform and the filter in the Fourier space. The third part can be understood as the hypergraph Fourier transform of the original signal. Hence, similar to GSP, we can define the hypergraph Fourier space and Fourier transform based on the orthogonal-CP decomposition of $\mathbf{F}$ .

Definition 7 (Hypergraph Fourier space and Fourier transform).

The hypergraph Fourier space of a given hypergraph $\mathcal{H}$ is defined as the space consisting of all orthogonal-CP decomposition basis vectors $\{\mathbf{f}_{1},\mathbf{f}_{2},...,\mathbf{f}_{N}\}$ . The frequencies are defined with respect to the eigenvalue coefficients $\lambda_{i}$ , $1\leq i\leq N$ . The hypergraph Fourier transform (HGFT) of hypergraph signals is defined as

[TABLE]

Compared to GSP, if $M=2$ , the HGFT has the same form as the traditional GFT. In addition, since the $\mathbf{f}_{r}$ ’s form an orthogonal basis, we have

[TABLE]

According to [25], a vector $\mathbf{x}$ is an E-eigenvector of an $M$ th-order tensor $\mathbf{A}$ if $\mathbf{Ax}^{[M-1]}=\lambda\mathbf{x}$ holds for a constant $\lambda$ . Then, we obtain the following property of the hypergraph spectrum.

Property 2.

The hypergraph spectrum pair $(\lambda_{r},\mathbf{f}_{r})$ is an E-eigenpair of the representing tensor $\mathbf{F}$ .

Recall that the spectrum space of GSP is the eigenspace of the representing matrix $\mathbf{F_{M}}$ . Property 2 shows that HGSP has a consistent definition in the spectrum space as that for GSP.
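The spectral form of the shifting and Property 2 can be checked numerically on a synthetic tensor. The sketch below does not decompose a real hypergraph: it builds a third-order tensor directly from a hypothetical orthonormal basis and hypothetical coefficients $\lambda_{r}$ , so that the orthogonal-CP form of Eq. (24) holds exactly by construction, and it reads the HGFT coefficients as $(\mathbf{f}_{r}^{\mathrm{T}}\mathbf{s})^{M-1}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3

# Hypothetical orthonormal basis (columns of Q) and nonnegative coefficients.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.array([2.0, 1.5, 1.0, 0.5])

# Synthetic representing tensor F = sum_r lam_r * f_r o f_r o f_r.
F = np.einsum("r,ir,jr,kr->ijk", lam, Q, Q, Q)

s = rng.standard_normal(N)

# Shifting by direct contraction of F with s o s.
shift_direct = np.einsum("ijk,j,k->i", F, s, s)

# Shifting through the spectrum: transform, scale by lam, transform back.
s_hat = (Q.T @ s) ** (M - 1)
shift_spectral = Q @ (lam * s_hat)

# E-eigenpair check for the first basis vector: F f^{[M-1]} = lam * f.
f0 = Q[:, 0]
eig_lhs = np.einsum("ijk,j,k->i", F, f0, f0)

print(np.allclose(shift_direct, shift_spectral), np.allclose(eig_lhs, lam[0] * f0))
```

The einsum strings are written for $M=3$ ; other orders would need one more index per additional mode.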

III-D Relationship between Hypergraph Signal and Original Signal

With HGFT defined, let us discuss more about the relationship between the hypergraph signal and the original signal in the Fourier space to understand the HGFT better. From Eq. (26b), the hypergraph signal in the Fourier space is written as

[TABLE]

which can be further decomposed as

[TABLE]

where $*$ denotes Hadamard product.

From Eq. (29), we see that the hypergraph signal in the hypergraph Fourier space is $M-1$ times Hadamard product of a component consisting of the hypergraph Fourier basis and the original signal. More specifically, this component works as the original signal in the hypergraph Fourier space, which is defined as

[TABLE]

where $\mathbf{V}=[\mathbf{f}_{1}\quad\mathbf{f}_{2}\quad\cdots\quad\mathbf{f}_{N}]^{\mathrm{T}}$ and $\mathbf{V}^{\mathrm{T}}\mathbf{V=I}$ .

Recalling the definitions of the hypergraph signal and vertex domain in Section III-B, we have the following property.

Property 3.

The hypergraph signal is the $M-1$ times tensor outer product of the original signal in the hypergraph vertex domain, and the $M-1$ times Hadamard product of the original signal in the hypergraph frequency domain.

Then, we could establish a connection between the original signal and the hypergraph signal in the hypergraph Fourier domain by the HGFT and inverse HGFT (iHGFT) as shown in Fig. 11. Such a relationship leads to some interesting properties and makes the HGFT implementation more straightforward, which will be further discussed in Section III-F and Section III-G, respectively.

III-E Hypergraph Frequency

As we now have a better understanding of the hypergraph Fourier space and Fourier transform, we can discuss more about the hypergraph frequency and its order. In GSP, the graph frequency is defined with respect to the eigenvalues of the representing matrix $\mathbf{F_{M}}$ and ordered by the total variation [5]. Similarly, in HGSP, we define the frequency relative to the coefficients $\lambda_{i}$ from the orthogonal-CP decomposition. We order them by the total variation of frequency components $\mathbf{f}_{i}$ over the hypergraph. The total variation of a general signal component over a hypergraph is defined as follows.

Definition 8 (Total variation over hypergraph).

Given a hypergraph $\mathcal{H}$ with $N$ nodes and the normalized representing tensor $\mathbf{F}^{norm}=\frac{1}{\lambda_{max}}\mathbf{F}$ , together with the original signal $\mathbf{s}$ , the total variation over the hypergraph is defined as the total differences between the nodes and their corresponding neighbors in the perspective of shifting, i.e.,

[TABLE]

We adopt the $l_{1}$ -norm here only as an example of defining the total variation. Other norms may be more suitable depending on specific applications. Now, with the definition of total variation over hypergraphs, the frequency in HGSP is ordered by the total variation of the corresponding frequency component $\mathbf{f}_{r}$ , i.e.,

[TABLE]

where $\mathbf{f}^{norm}_{r(1)}$ is the output of one-time shifting for $\mathbf{f}_{r}$ over the normalized representing tensor.

From Eq. (31a), we see that the total variation describes how much the signal component changes from a node to its neighbors over the hypergraph shifting. Thus, we have the following definition of hypergraph frequency.

Definition 9 (Hypergraph frequency).

Hypergraph frequency describes how oscillatory the signal component is with respect to the given hypergraph. A frequency component $\mathbf{f}_{r}$ is associated with a higher frequency if the total variation of this frequency component is larger.

Note that the physical meaning of graph frequency was stated in GSP [2]. Generally, the graph frequency is highly related to the total variation of the corresponding frequency component. Similarly, the hypergraph frequency also relates to the corresponding total variation. We will discuss the interpretation of the hypergraph frequency and its relationships with DSP and GSP later in Section IV-A, to further solidify our hypergraph frequency definition.

Based on the definition of total variation, we describe one important property of $\mathbf{TV(f}_{r})$ in the following theorem.

Theorem 1.

Define a supporting matrix

[TABLE]

With the normalized representing tensor $\mathbf{F}^{norm}=\frac{1}{\lambda_{\max}}\mathbf{F}$ , the total variation of hypergraph spectrum $\mathbf{f}_{r}$ is calculated as

[TABLE]

Moreover, $\mathbf{TV}(\mathbf{f}_{i})>\mathbf{TV}(\mathbf{f}_{j})$ iff $\lambda_{i}<\lambda_{j}$ .

Proof:

For hypergraph signals, the output of one-time shifting of $\mathbf{f}_{r}$ is calculated as

[TABLE]

Based on the normalized $\mathbf{F}^{norm}$ , we have $\mathbf{f}_{r(1)}^{norm}=\frac{\lambda_{r}}{\lambda_{max}}\mathbf{f}_{r}$ . It is therefore easy to obtain Eq. (34c) from Eq. (34a). To obtain Eq. (34b), we have

[TABLE]

It is clear that Eq. (34b) is the same as Eq. (34c).

Since $\lambda$ is real and nonnegative, we have

[TABLE]

Obviously, $\mathbf{TV}(\mathbf{f}_{i})>\mathbf{TV}(\mathbf{f}_{j})$ iff $\lambda_{i}<\lambda_{j}$ . ∎

Theorem 1 shows that the supporting matrix $\mathbf{P_{s}}$ can help us apply the total variation more efficiently in some real applications. Moreover, it provides the order of frequency according to the coefficients $\lambda_{i}$ ’s with the following property.

Property 4.

A smaller $\lambda$ is related to a higher frequency in the hypergraph Fourier space, where its corresponding spectrum basis is called a high frequency component.
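The relation between $\lambda_{r}$ and the total variation can be illustrated on a synthetic third-order tensor. The sketch assumes the $l_{1}$ reading of Definition 8, $\mathbf{TV}(\mathbf{s})=\|\mathbf{s}-\mathbf{s}_{(1)}^{norm}\|_{1}$ , and, as a further assumption about the supporting matrix, the form $\mathbf{P_{s}}=\frac{1}{\lambda_{max}}\sum_{r}\lambda_{r}\mathbf{f}_{r}\mathbf{f}_{r}^{\mathrm{T}}$ . The basis and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4

# Hypothetical orthonormal basis (columns of Q) and coefficients.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.array([2.0, 1.5, 1.0, 0.5])
lam_max = lam.max()

# Normalized synthetic representing tensor of order M = 3.
F_norm = np.einsum("r,ir,jr,kr->ijk", lam, Q, Q, Q) / lam_max


def total_variation(F_norm, s):
    """l1 distance between s and its one-time shifted version."""
    shifted = np.einsum("ijk,j,k->i", F_norm, s, s)
    return np.abs(s - shifted).sum()


tv = np.array([total_variation(F_norm, Q[:, r]) for r in range(N)])

# Since the shifted f_r equals (lam_r / lam_max) f_r, TV factors as below.
closed_form = np.abs(1.0 - lam / lam_max) * np.abs(Q).sum(axis=0)

# Assumed supporting matrix: a matrix product replaces the tensor contraction.
P_s = (Q * lam) @ Q.T / lam_max
tv_matrix = np.abs(Q - P_s @ Q).sum(axis=0)

print(tv)
```

The factor $1-\lambda_{r}/\lambda_{max}$ decreases as $\lambda_{r}$ grows, which is the dependence on $\lambda_{r}$ that Property 4 refers to.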

III-F Signals with Limited Spectrum Support

With the order of frequency, we define the bandlimited signals as follows.

Definition 10 (Bandlimited signal).

Order the coefficients as $\lambda=[\lambda_{1}\quad\cdots\quad\lambda_{N}]$ where $\lambda_{1}\geq\cdots\geq\lambda_{N}\geq 0$ , together with their corresponding $\mathbf{f}_{r}$ ’s. A hypergraph signal $\mathbf{s}^{[M-1]}$ is defined as $K$ -bandlimited if the HGFT transformed signal $\mathbf{\hat{s}}=[\hat{s}_{1},\cdots,\hat{s}_{N}]^{\mathrm{T}}$ has $\hat{s}_{i}=0$ for all $i\geq K$ where $K\in\{1,2,\cdots,N\}$ . The smallest $K$ is defined as the bandwidth and the corresponding boundary is defined as $W=\lambda_{K}$ .

Note that a larger $\lambda_{i}$ corresponds to a lower frequency, as we mentioned in Property 4. Then, the frequencies are ordered from low to high in the definition above. Moreover, we use the index $K$ instead of the coefficient value $\lambda$ to define the bandwidth for the following reasons:

•

Identical $\lambda$ ’s in two different hypergraphs do not refer to the same frequency. Since each hypergraph has its own adjacency tensor and spectrum space, the comparison of multiple spectrum pairs $(\lambda_{i},\mathbf{f}_{i})$ ’s is only meaningful within the same hypergraph. Moreover, there exists a normalization issue in the decomposition of different adjacency tensors. Thus, it is not meaningful to compare $\lambda_{k}$ ’s across two different hypergraphs.

•

Since $\lambda_{k}$ values are not continuous over $k$ , different frequency cutoffs of $\lambda$ may lead to the same bandlimited space. For example, suppose that $\lambda_{k}=0.8$ and $\lambda_{k+1}=0.5$ , in line with the descending order of Definition 10. Then, $\lambda=0.6$ and $\lambda^{\prime}=0.7$ would lead to the same cutoff in the frequency space, which makes the bandwidth definition non-unique.

As we discussed in Section III-D, the hypergraph signal is the $M-1$ times Hadamard product of the original signal in the frequency domain. Then, we have the following property of bandwidth.

Property 5.

The bandwidth $K$ is the same based on the HGFT of the hypergraph signals $\mathbf{\hat{s}}$ and that of the original signals $\mathbf{\tilde{s}}$ .

This property allows us to analyze the spectrum support of the hypergraph signal by looking into the original signal with lower complexity. Recall that we can add $\mathbf{f}_{i}$ with zero coefficients $\lambda_{i}$ when $R<N$ , as mentioned in Section III-C. The added basis should not affect the HGFT signals in the Fourier space. According to the structure of bandlimited signals, the added $\mathbf{f}_{i}$ should meet the following conditions: (1) $\mathbf{f}_{i}\perp\mathbf{f}_{p}$ for $p\neq i$ ; (2) $\mathbf{f}_{i}^{\mathrm{T}}\cdot\mathbf{s}\to 0$ ; and (3) $|\mathbf{f}_{i}|=1$ .
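Property 5 can be illustrated with a small numerical sketch: an elementwise power does not change which coefficients are zero, so the spectrum support of $\mathbf{\hat{s}}$ and $\mathbf{\tilde{s}}$ coincides. The basis, the signal, and the tolerance `tol` below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K = 6, 4, 3

# Hypothetical orthonormal basis; rows of V are the basis vectors f_r^T.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
V = Q.T

# Build an original signal whose spectrum is nonzero only on the first K
# (lowest-frequency) components.
coeffs = np.zeros(N)
coeffs[:K] = rng.uniform(1.0, 2.0, size=K)
s = V.T @ coeffs

s_tilde = V @ s              # original signal in the Fourier space
s_hat = s_tilde ** (M - 1)   # hypergraph signal in the Fourier space


def support(x, tol=1e-10):
    """Indices of coefficients that are nonzero up to a tolerance."""
    return np.flatnonzero(np.abs(x) > tol)


print(support(s_tilde), support(s_hat))
```

Checking the support on $\mathbf{\tilde{s}}$ only needs the matrix product $\mathbf{Vs}$ , without forming the $(M-1)$ -th order hypergraph signal.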

III-G Implementation and Complexity

We now consider the implementation and complexity issues of HGFT. Similar to GFT, the process of HGFT consists of two steps: decomposition and execution. The decomposition is to calculate the hypergraph spectrum basis, and the execution transforms signals from the hypergraph vertex domain into the spectrum domain.

•

The calculation of the spectrum basis by the orthogonal-CP decomposition is an important preparation step for HGFT. A straightforward algorithm would decompose the representing tensor $\mathbf{F}$ with the spectrum basis $\mathbf{f}_{i}$ ’s and coefficients $\lambda_{i}$ ’s as in Eq. (24). Efficient tensor decomposition is an active topic in both mathematics and engineering, and there are a number of methods for CP decomposition in the literature. In [54, 58], motivated by the spectral theorem for real symmetric matrices, orthogonal-CP decomposition algorithms for symmetric tensors are developed based on polynomial equations. In [26], Afshar et al. proposed a more general decomposition algorithm for spatio-temporal data. Other works, including [55, 56, 57], tried to develop faster decomposition methods for signal processing and big data applications. The rapid development of tensor decomposition and the advancement of computation ability will benefit the efficient derivation of the hypergraph spectrum.

•

The execution of HGFT with a known spectrum basis is defined in Eq. (26b). According to Eq. (29), the HGFT of hypergraph signal is an $M-1$ times Hadamard product of the original signal in the hypergraph spectrum space. This relationship can help execute HGFT and iHGFT of hypergraph signals more efficiently by applying matrix operations on the original signals. Clearly, the complexity of calculating the original signals in the frequency domain $\mathbf{\tilde{s}=Vs}$ is $O(N^{2})$ . In addition, since the computation complexity of the power function $x^{(M-1)}$ could be $O(\log(M-1))$ and each vector has $N$ entries, the complexity of calculating the $M-1$ times Hadamard product is $O(N\log(M-1))$ . Thus, the complexity of general HGFT implementation is $O(N^{2}+N\log(M-1))$ .
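The execution step can be sketched for a general order $M$ as follows. The fast version applies one matrix product and an elementwise power, while the reference version forms the full $(M-1)$ -th order tensors with $N^{M-1}$ entries and takes their inner products, which is one way to read the HGFT of the hypergraph signal. The function names and the random orthonormal basis are illustrative assumptions.

```python
from functools import reduce

import numpy as np


def outer_power(x, order):
    """order-fold outer product x o x o ... o x."""
    return reduce(np.multiply.outer, [x] * order)


def hgft_fast(V, s, M):
    """HGFT via s_tilde = V s, then an elementwise (M-1)-th power."""
    s_tilde = V @ s            # O(N^2) matrix-vector product
    return s_tilde ** (M - 1)  # Hadamard power over N entries


def hgft_naive(V, s, M):
    """Reference: inner product of f_r^{[M-1]} with s^{[M-1]} for each r."""
    S = outer_power(s, M - 1)
    return np.array([np.sum(outer_power(f, M - 1) * S) for f in V])


rng = np.random.default_rng(4)
N, M = 4, 4
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
V = Q.T                        # rows are the basis vectors, V^T V = I
s = rng.standard_normal(N)

s_hat_fast = hgft_fast(V, s, M)
s_hat_naive = hgft_naive(V, s, M)

# Inverse transform of the original signal uses V^T because V^T V = I.
s_recovered = V.T @ (V @ s)

print(np.allclose(s_hat_fast, s_hat_naive), np.allclose(s_recovered, s))
```

The two versions agree numerically, but only the fast one avoids materializing the hypergraph signal tensor.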

IV Discussions and Interpretations

In this section, we focus on the insights and physical meaning of frequency to help interpret the hypergraph spectrum space. We also consider the relationships between HGSP and other existing works to better understand the HGSP framework.

IV-A Interpretation of Hypergraph Spectrum Space

We are interested in an intuitive interpretation of the hypergraph frequency and its relations with the DSP and GSP frequencies. We start with the frequency and the total variation in DSP. In DSP, the discrete Fourier transform (DFT) of a sequence $s_{n}$ is given by $\hat{s}_{k}=\sum_{n=0}^{N-1}s_{n}e^{-j\frac{2\pi kn}{N}}$ and the frequency is defined as $\nu_{n}=\frac{n}{N}$ , $n=0,1,\cdots,N-1$ . From [38], we can easily summarize the following conclusions:

•

$\nu_{n}:\;1<n<\frac{N}{2}-1$ corresponds to a continuous time signal frequency ${n\over N}f_{s}$ ;

•

$\nu_{n}:\;\frac{N}{2}+1<n<N-1$ corresponds to a continuous time signal frequency $-(1-{n\over N})f_{s}$ ;

•

$\nu_{\frac{N}{2}}$ corresponds to $f_{s}/2$ ;

•

$n=0$ corresponds to frequency 0.

Here, $f_{s}$ is the critical sampling frequency. In traditional DFT, we generate the Fourier transform $\hat{f}(\omega)=\int_{-\infty}^{\infty}f(x)e^{-2\pi jx\omega}dx$ at each discrete frequency $\frac{n}{N}f_{s}$ , $n=-\frac{N}{2}+1,-\frac{N}{2}+2,\cdots,\frac{N}{2}-1,\frac{N}{2}$ . The highest and lowest frequencies correspond to $n=N/2$ and $n=0$ , respectively. Note that $n$ varies from $-\frac{N}{2}+1$ to $\frac{N}{2}$ here. Since $e^{-j2\pi k\frac{n}{N}}=e^{-j2\pi k\frac{n+N}{N}}$ , we can let $n$ vary from $0$ to $N-1$ and cover the complete period. Now, $n$ varies in exact correspondence to $\nu_{n}$ , and the aforementioned conclusions are drawn. The highest frequency occurs at $n=\frac{N}{2}$ .

The total variation in DSP is defined as the differences among the signals over time [59], i.e.,

[TABLE]

where

[TABLE]

When we perform the eigen-decomposition of $\mathbf{C_{N}}$ , we see that the eigenvalues are $\lambda_{n}=e^{-j\frac{2\pi n}{N}}$ with eigenvector $\mathbf{f}_{n}$ , $0\leq n\leq N-1$ . More specifically, the total variation of the frequency component $\mathbf{f}_{n}$ is calculated as

[TABLE]

which increases with $n$ for $n\leq\frac{N}{2}$ before decreasing with $n$ for $\frac{N}{2}<n\leq N-1$ .

Obviously, the total variations of frequency components have a one-to-one correspondence to frequencies in the order of their values. If the total variation of a frequency component is larger, the corresponding frequency with the same index $n$ is higher. It also has clear physical meaning, i.e., a higher frequency component changes faster over time, which implies a larger total variation. Thus, we could also use the total variation of a frequency component to characterize its frequency in DSP.
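The DSP case can be verified numerically. The sketch assumes that $\mathbf{C_{N}}$ is the cyclic shift matrix with $(\mathbf{C_{N}s})_{k}=s_{k-1 \bmod N}$ and that the total variation is $\|\mathbf{s}-\mathbf{C_{N}s}\|_{1}$ ; the size $N=8$ is an arbitrary choice.

```python
import numpy as np

N = 8
# Assumed cyclic shift matrix: (C s)[k] = s[k - 1], indices taken modulo N.
C = np.roll(np.eye(N), 1, axis=0)

k = np.arange(N)
tv = np.empty(N)
for n in range(N):
    # Unit-norm DFT vector; an eigenvector of C with eigenvalue exp(-2j*pi*n/N).
    f_n = np.exp(2j * np.pi * n * k / N) / np.sqrt(N)
    tv[n] = np.abs(f_n - C @ f_n).sum()

# |1 - exp(-2j*pi*n/N)| = 2 sin(pi*n/N), and the l1 norm of f_n is sqrt(N).
closed_form = 2.0 * np.sin(np.pi * np.arange(N) / N) * np.sqrt(N)

print(np.round(tv, 3))
```

The values rise up to $n=\frac{N}{2}$ and fall afterwards, matching the behavior described above.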

Let us now consider the total variation and frequency in GSP, where the signals are analyzed in the graph vertex domain instead of the time domain. Similar to the fact that the frequency in DSP describes the rate of signal changes over time, the frequency in GSP illustrates the rate of signal changes over vertices [5]. Likewise, the total variation of the graph Fourier basis defined according to the adjacency matrix $\mathbf{F_{M}}$ could be used to characterize each frequency. Since GSP handles signals in the graph vertex domain, the total variation of GSP is defined as the differences between all the nodes and their neighbors, i.e.,

[TABLE]

where $\mathbf{F_{M}}^{norm}=\frac{1}{|\lambda_{max}|}\mathbf{F_{M}}$ . If the total variation of the frequency component $\mathbf{{f_{M}}}_{i}$ is larger, the change over the graph between neighboring vertices is faster, which indicates a higher graph frequency. Note that, if the graph is undirected, i.e., the eigenvalues are real numbers, the frequency decreases as the eigenvalue increases, similar to HGSP in Section III-E; otherwise, if the graph is directed, i.e., the eigenvalues are complex, the frequency changes as shown in Fig. 12, which is consistent with the changing pattern of DSP frequency [5].
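For the undirected case, the dependence of the total variation on the eigenvalue can be checked on a small example. The sketch assumes the $l_{1}$ form $\|\mathbf{s}-\mathbf{F_{M}}^{norm}\mathbf{s}\|_{1}$ and uses a hypothetical five-node path graph as $\mathbf{F_{M}}$ .

```python
import numpy as np

# Hypothetical undirected graph: a path on 5 nodes.
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Real eigenvalues in ascending order; columns of eigvecs are eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)
A_norm = A / np.abs(eigvals).max()

# Total variation of every eigenvector and its l1 norm.
tv = np.abs(eigvecs - A_norm @ eigvecs).sum(axis=0)
l1 = np.abs(eigvecs).sum(axis=0)

# Per unit l1 norm, TV equals |1 - lam / |lam_max||.
ratio = tv / l1

print(np.round(eigvals, 3), np.round(ratio, 3))
```

Because the eigenvalues are listed in ascending order, the printed ratios decrease from left to right, so a larger eigenvalue corresponds to a lower frequency in this example.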

We now turn to our HGSP framework. Like GSP, HGSP analyzes signals in the hypergraph vertex domain. Different from normal graphs, each hyperedge in HGSP can connect more than two nodes. The neighbors of a vertex $\mathbf{v}_{i}$ include all the nodes in the hyperedges containing $\mathbf{v}_{i}$ . For example, if there exists a hyperedge $\mathbf{e}_{1}=\{\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}\}$ , nodes $\mathbf{v}_{2}$ and $\mathbf{v}_{3}$ are both neighbors of node $\mathbf{v}_{1}$ . As we mentioned in Section III-E, the total variation of HGSP is defined as the difference between continuous signals over the hypergraph, i.e., the difference between the signal components and their respective shifted versions:

[TABLE]

where $\mathbf{F}^{norm}=\frac{1}{\lambda_{max}}\mathbf{F}$ . Similar to DSP and GSP, pairs of $(\lambda_{i},\mathbf{f}_{i})$ in Eq. (24) characterize the hypergraph spectrum space. A spectrum component with a larger total variation represents a higher frequency component, which indicates faster changes over the hypergraph. Note that, as we mentioned in Section III-E, the total variation is larger and the frequency is higher if the corresponding $\lambda$ is smaller, because we usually consider undirected hypergraphs and the $\lambda$ ’s are real in the tensor decomposition. To illustrate this more clearly, we consider a hypergraph with $9$ nodes, $5$ hyperedges, and $m.c.e=3$ as an example, shown in Fig. 13. As we mentioned before, a smaller $\lambda$ indicates a higher frequency in HGSP. Hence, we see that the signals have more changes on each vertex if the frequency is higher.

IV-B Connections to other Existing Works

We now discuss the relationships between the HGSP and other existing works.

IV-B1 Graph Signal Processing

One of the motivations for developing HGSP is to develop a more general framework for signal processing in high-dimensional graphs. Thus, GSP should be a special case of HGSP. We illustrate the GSP-HGSP relationship as follows.

• Graphical models: GSP is based on normal graphs [2], where each simple edge connects exactly two nodes; HGSP is based on hypergraphs, where each hyperedge could connect more than two nodes. Clearly, the normal graph is a special case of the hypergraph, for which the $m.c.e$ equals two. More specifically, a normal graph is a $2$ -uniform hypergraph [60]. Hypergraphs provide a more general model for multi-lateral relationships, while normal graphs are only able to model bilateral relationships. For example, a $3$ -uniform hypergraph is able to model the trilateral interaction among users in a social network [61]. As the hypergraph is a more general model for high-dimensional interactions, HGSP is also more powerful for high-dimensional signals.

• Algebraic models: HGSP relies on tensors while GSP relies on matrices, which are second-order tensors. Benefiting from the generality of tensors, HGSP is broadly applicable in high-dimensional data analysis.

• Signals and signal shifting: In HGSP, we define the hypergraph signal as the $(M-1)$ -fold tensor outer product of the original signal. More specifically, the hypergraph signal is the original signal if $M=2$ . Basically, the hypergraph signal is the same as the graph signal if each hyperedge has exactly two nodes. As also shown in Fig. 10(b) of Section III-C, graph shifting is a special case of hypergraph shifting when $M=2$ .

• Spectrum properties: In HGSP, the spectrum space is defined over the orthogonal-CP decomposition in terms of the basis and coefficients, which are also the E-eigenpairs of the representing tensor [62], shown in Eq. (27). In GSP, the spectrum space is defined as the matrix eigenspace. Since tensor algebra is an extension of matrix algebra, the HGSP spectrum is also an extension of the GSP spectrum. For example, as discussed in Section III, GFT is the same as HGFT when $M=2$ .

Overall, HGSP is an extension of GSP, which is both more general and novel. The purpose of developing the HGSP framework is to facilitate more interesting signal processing tasks that involve high-dimensional signal interactions.
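The reduction to GSP at $M=2$ can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names `hypergraph_signal` and `hypergraph_shift` are hypothetical, and the shift is assumed to contract the last $M-1$ modes of the representing tensor with the hypergraph signal, as described for $\mathbf{s}_{(1)}=\mathbf{Fs}^{[M-1]}$ in Section III-C.

```python
import numpy as np
from functools import reduce


def hypergraph_signal(s, M):
    """(M-1)-fold tensor outer product of s with itself; equals s when M == 2."""
    return reduce(np.multiply.outer, [s] * (M - 1))


def hypergraph_shift(F, s):
    """Contract an M-th order tensor F with the hypergraph signal built from s.

    Assumption: the shift sums over the last M-1 modes of F.
    """
    M = F.ndim
    S = hypergraph_signal(s, M)
    # An integer `axes` contracts the last M-1 axes of F with the first M-1 of S.
    return np.tensordot(F, S, axes=M - 1)


rng = np.random.default_rng(0)
s = rng.random(4)
A = rng.random((4, 4))        # M = 2: an ordinary graph shift matrix
F3 = rng.random((4, 4, 4))    # M = 3: a third-order representing tensor

graph_shifted = hypergraph_shift(A, s)    # reduces to A @ s
hyper_shifted = hypergraph_shift(F3, s)   # sum_{j,k} F3[i, j, k] * s[j] * s[k]
```

With a second-order tensor the contraction is an ordinary matrix-vector product, which is the sense in which graph shifting is recovered as a special case.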

IV-B2 Higher-Order Statistics

Higher-order statistics (HOS) have been effectively applied in signal processing [63, 64]; they can analyze the multi-lateral interactions of signal samples and have found success in many applications, such as blind feature detection [65], decision [66], and signal classification [67]. In HOS, the $k$ th-order cumulant of random variables $\mathbf{x}=[x_{1},\cdots,x_{k}]^{\mathrm{T}}$ is defined [68] based on the coefficients of $\mathbf{v}=[v_{1},\cdots,v_{k}]^{\mathrm{T}}$ in the Taylor series expansion of the cumulant-generating function, i.e.,

[TABLE]

It is easy to see that HGSP and HOS are related in high-dimensional signal processing. Both can be represented by tensors. For example, in the multi-channel problems of [69], the 3rd-order cumulant $\mathbf{C}=\{C_{y_{i},y_{j},y_{z}}(t,t_{1},t_{2})\}$ of zero-mean signals can be represented as a multilinear array, e.g.,

[TABLE]

which is essentially a third-order tensor. More specifically, if there are $k$ samples, the cumulant $\mathbf{C}$ can be represented as a $p^{k}$ -element vector, which is the flattened signal tensor similar to the $n$ -mode flattening of HGSP signals.
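As a rough illustration of the tensor view of cumulants, the sketch below estimates a third-order cumulant tensor from samples. It is a simplification under stated assumptions: only the zero-lag case is computed (the lags $t_{1},t_{2}$ are ignored), the expectation is replaced by a sample average, and the function name `third_order_cumulant` is hypothetical. For zero-mean variables the third-order cumulant coincides with the third moment, which is what the code evaluates.

```python
import numpy as np


def third_order_cumulant(Y):
    """Zero-lag third-order cumulant estimate.

    Y is a (T, p) array holding T samples of p channels.
    Returns a (p, p, p) tensor C with C[i, j, k] ~ E[y_i * y_j * y_k].
    """
    Yc = Y - Y.mean(axis=0)  # enforce the zero-mean assumption on the samples
    return np.einsum('ti,tj,tk->ijk', Yc, Yc, Yc) / Y.shape[0]


rng = np.random.default_rng(0)
Y = rng.exponential(size=(2000, 3))   # skewed data, so the cumulant is non-trivial
C = third_order_cumulant(Y)
c_flat = C.reshape(-1)                # flattened tensor with p**3 entries
```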

Although both HOS and HGSP are high-dimensional signal processing tools, they focus on complementary aspects of the signals. Specifically, HGSP aims to analyze signals over the high-dimensional vertex domain, while HOS focuses on the statistical domain. In addition, the forms of signal combination are also different, where HGSP signals are based on the hypergraph shifting defined as in Eq. (21), whereas HOS cumulants are based on the statistical average of shifted signal products.

IV-B3 Learning over Hypergraphs

Hypergraph learning is another tool to handle structured data and sometimes uses similar techniques to HGSP. For example, the authors of [70] proposed an alternative definition of hypergraph total variation and designed algorithms accordingly for classification and clustering problems. In addition, hypergraph learning also has its own definition of the hypergraph spectrum space. For example, [39, 40] represented the hypergraphs using a graph-like similarity matrix and defined a spectrum space as the eigenspace of this similarity matrix. Other works considered different aspects of hypergraphs, including the hypergraph Laplacian [71] and hypergraph lifting [21].

The HGSP framework exhibits features different from hypergraph learning:

1) HGSP defines a framework that generalizes the classical digital signal processing and traditional graph signal processing; 2) HGSP applies different definitions of hypergraph characteristics such as the total variation, spectrum space, and Laplacian; 3) HGSP cares more about the spectrum space while learning focuses more on data; 4) As HGSP is an extension of DSP and GSP, it is more suitable to handle detailed tasks such as compression, denoising, and detection. All these features make HGSP a different technical concept from hypergraph learning.

V Tools for Hypergraph Signal Processing

In this section, we introduce several useful tools built within the framework of HGSP.

V-A Sampling Theory

Sampling is an important tool in data analysis, which selects a subset of individual data points to estimate the characteristics of the whole population [89]. Sampling plays an important role in applications such as compression [27] and storage [90]. Similar to sampling signals in time, the HGSP sampling theory can be developed to sample signals over the vertex domain. We now introduce the basics of HGSP sampling theory for lossless signal dimension reduction.

To reduce the size of a hypergraph signal $\mathbf{s}^{[M-1]}$ , there are two main approaches: 1) to reduce the dimension of each order; and 2) to reduce the number of orders. Since the reduction of order breaks the structure of hypergraph and cannot always guarantee perfect recovery, we adopt the dimension reduction of each order. To change the dimension of a certain order, we can use the $n$ -Mode product. Since each order of the hypergraph signal is equivalent, the $n$ -Mode product operators of each order are the same. Then, the sampling operation of the hypergraph signal is defined as follows:

Definition 11 (Sampling and Interpolation).

Suppose that $Q$ is the dimension of each sampled order. The sampling operation is defined as

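$$\mathbf{s_{Q}^{[M-1]}}=\mathbf{s}^{[M-1]}\times_{1}\mathbf{U}\times_{2}\mathbf{U}\cdots\times_{M-1}\mathbf{U},$$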

where the sampling operator is $\mathbf{U}\in\mathbb{R}^{Q\times N}$ to be defined later, and the sampled signal is $\mathbf{s_{Q}^{[M-1]}}\in\mathbb{R}^{\underbrace{\scriptstyle{Q\times Q\times...\times Q}}_{{M-1}\ \rm{times}}}$ .

The interpolation operation is defined by

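$$\mathbf{s}^{[M-1]}=\mathbf{s_{Q}^{[M-1]}}\times_{1}\mathbf{T}\times_{2}\mathbf{T}\cdots\times_{M-1}\mathbf{T},$$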

where the interpolation operator is $\mathbf{T}\in\mathbb{R}^{N\times Q}$ to be defined later.

As presented in Section III, the hypergraph signal and original signal are different forms of the same data. They may have similar properties in structures. To derive the sampling theory for perfect signal recovery efficiently, we first consider the sampling operations of the original signal.

Definition 12 (Sampling original signal).

Suppose an original $K$ -bandlimited signal $\mathbf{s}\in\mathbb{R}^{N}$ is to be sampled into $\mathbf{s_{Q}}\in\mathbb{R}^{Q}$ , where $q=\{q_{1},\cdots,q_{Q}\}$ denotes the sequence of sampled indices and $q_{i}\in\{1,2,\cdots,N\}$ . The sampling operator $\mathbf{U}\in\mathbb{R}^{Q\times N}$ is a linear mapping from $\mathbb{R}^{N}$ to $\mathbb{R}^{Q}$ , defined by

$$U_{ji}=\delta[i-q_{j}],$$

and the interpolation operator $\mathbf{T}\in\mathbb{R}^{N\times Q}$ is a linear mapping from $\mathbb{R}^{Q}$ to $\mathbb{R}^{N}$ . Then, the sampling operation is defined by

$$\mathbf{s_{Q}}=\mathbf{U}\mathbf{s},$$

and the interpolation operation is defined by

$$\mathbf{s}^{\prime}=\mathbf{T}\mathbf{s_{Q}}.$$

Analyzing the structure of the sampling operations, we have the following properties.

Theorem 2.

The hypergraph signal $\mathbf{s^{[M-1]}}$ shares the same sampling operator $\mathbf{U}\in\mathbb{R}^{Q\times N}$ and interpolation operator $\mathbf{T}\in\mathbb{R}^{N\times Q}$ with the original signal $\mathbf{s}$ .

Proof:

We first examine one of the orders in the $n$ -Mode product of the hypergraph signal, i.e., the $n$ th order of $\mathbf{s^{[M-1]}}$ , $1\leq n\leq M-1$ , as

[TABLE]

Since all elements in $\mathbf{s_{Q}^{[M-1]}}$ should also be the elements of $\mathbf{s^{[M-1]}}$ after sampling, only one $U_{ji_{n}}=1$ exists for each $j$ according to Eq. (50), i.e., only one term in the summation exists for each $j$ on the right-hand side of Eq. (50). Moreover, since $\mathbf{U}$ samples over all the orders, $U_{pi_{n}}=1$ and $U_{ji_{n}}=1$ cannot exist at the same time, so that all the entries in $\mathbf{s_{Q}^{[M-1]}}$ are also in $\mathbf{s^{[M-1]}}$ . Suppose $q=\{q_{1},q_{2},\cdots,q_{Q}\}$ are the positions of the non-zero $U_{jq_{j}}$ ’s; then we have

[TABLE]

As a result, we have $U_{ji}=\delta[i-q_{j}]$ , which is the same as the sampling operator for the original signal. For the interpolation operator, the proof is similar and hence omitted. ∎
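Theorem 2 can be illustrated numerically for $M=3$. The sketch below is only an illustration under assumptions: indices are 0-based (the text uses 1-based indices), and the helper names `sampling_operator` and `mode_product_all` are hypothetical. It builds $\mathbf{U}$ from $U_{ji}=\delta[i-q_{j}]$ , applies it along every order of the hypergraph signal, and compares the result with the hypergraph signal built from the sampled original signal.

```python
import numpy as np


def sampling_operator(q, N):
    """U[j, i] = 1 if i == q[j] and 0 otherwise (0-based indices)."""
    U = np.zeros((len(q), N))
    U[np.arange(len(q)), q] = 1.0
    return U


def mode_product_all(S, U):
    """Apply the n-mode product with the same matrix U along every mode of S."""
    for n in range(S.ndim):
        # tensordot contracts mode n of S with the columns of U and places the
        # new axis first; moveaxis puts it back at position n.
        S = np.moveaxis(np.tensordot(U, S, axes=(1, n)), 0, n)
    return S


rng = np.random.default_rng(0)
N, q = 6, [0, 2, 5]
s = rng.random(N)
U = sampling_operator(q, N)

S = np.multiply.outer(s, s)            # hypergraph signal for M = 3
S_Q = mode_product_all(S, U)           # sampled hypergraph signal, shape (Q, Q)
s_Q = U @ s                            # sampled original signal
```

Here `S_Q` coincides with the outer product of `s_Q` with itself, so sampling the hypergraph signal and sampling the original signal select the same vertices.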

Given Theorem 2, we only need to analyze the operations of the original signal in the sampling theory. Next, we discuss the conditions for perfect recovery. For the original signal, we have the following property.

Lemma 1.

Suppose that $\mathbf{s}\in\mathbb{R}^{N}$ is a $K$ -bandlimited signal. Then, we have

$$\mathbf{s}=\mathcal{F}^{\mathrm{T}}_{[K]}\mathbf{\tilde{s}}_{[K]},$$

where $\mathcal{F}^{\mathrm{T}}_{[K]}=[\mathbf{f}_{1},\cdots,\mathbf{f}_{K}]$ and $\mathbf{\tilde{s}}_{[K]}\in\mathbb{R}^{K}$ consists of the first $K$ elements of the original signal in the frequency domain, i.e., $\mathbf{\tilde{s}}$ .

Proof:

Since $\mathbf{s}$ is $K$ -bandlimited, $\mathbf{\tilde{s}}_{i}=\mathbf{f}_{i}^{\mathrm{T}}\mathbf{s}=0$ when $i>K$ . Then, according to Eq.(30), we have

[TABLE]

where $\mathbf{V}=[\mathbf{f}_{1},\cdots,\mathbf{f}_{N}]^{\mathrm{T}}$ . ∎

This lemma implies that the first $K$ frequency components carry all the information of the original signal. Since the hypergraph signal and the original signal share the same sampling operators, we can reach a similar conclusion for perfect recovery as [27, 28], given in the following theorem.

Theorem 3.

Define the sampling operator $\mathbf{U}\in\mathbb{R}^{Q\times N}$ according to $U_{ji}=\delta[i-q_{j}]$ where $1\leq q_{i}\leq N,\;i=1,\ \ldots,\ Q$ . By choosing $Q\geq K$ and the interpolation operator $\mathbf{T}=\mathcal{F}^{\mathrm{T}}_{[K]}\mathbf{Z}\in\mathbb{R}^{N\times Q}$ with $\mathbf{ZU}\mathcal{F}^{\mathrm{T}}_{[K]}=\mathbf{I_{K}}$ and $\mathcal{F}^{\mathrm{T}}_{[K]}=[\mathbf{f}_{1},\cdots,\mathbf{f}_{K}]$ , we can achieve a perfect recovery, i.e., $\mathbf{s=TUs}$ for all $K$ -bandlimited original signals $\mathbf{s}$ and the corresponding hypergraph signals $\mathbf{s}^{[M-1]}$ .

Proof:

To prove the theorem, we show that $\mathbf{TU}$ is a projection operator and $\mathbf{T}$ spans the space of the first $K$ eigenvectors. From Lemma 1 and $\mathbf{s=Ts_{Q}}$ , we have

[TABLE]

As a result, $rank(\mathbf{Zs_{Q}})=rank(\mathbf{\tilde{s}}_{[K]})=K.$ Hence, we conclude that $K\leq Q$ .

Next, we show that $\mathbf{TU}$ is a projection by proving that $\mathbf{TU\cdot TU=TU}.$ Since we have $Q\geq K$ and

[TABLE]

we have

[TABLE]

Hence, $\mathbf{TU}$ is a projection operator. For the spanning part, the proof is the same as that in [27]. ∎
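The recovery condition of Theorem 3 can be sanity-checked with a small numerical experiment. This is a sketch under assumptions, not the authors' code: a random orthonormal matrix stands in for the hypergraph Fourier basis, the sampled indices are chosen arbitrarily (0-based), and $\mathbf{Z}$ is taken to be the pseudo-inverse of $\mathbf{U}\mathcal{F}^{\mathrm{T}}_{[K]}$ , which is one valid choice satisfying $\mathbf{ZU}\mathcal{F}^{\mathrm{T}}_{[K]}=\mathbf{I_{K}}$ whenever $\mathbf{U}\mathcal{F}^{\mathrm{T}}_{[K]}$ has rank $K$ .

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3

# Stand-in orthonormal basis: column r plays the role of f_{r+1}.
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
F_K = basis[:, :K]                      # N x K matrix [f_1, ..., f_K]

s = F_K @ rng.standard_normal(K)        # a K-bandlimited original signal

q = [0, 3, 5, 6]                        # Q = 4 >= K sampled indices (arbitrary)
U = np.zeros((len(q), N))
U[np.arange(len(q)), q] = 1.0           # U[j, i] = delta[i - q_j]

Z = np.linalg.pinv(U @ F_K)             # K x Q, so that Z U F_K = I_K
T = F_K @ Z                             # N x Q interpolation operator

s_rec = T @ (U @ s)                     # recovered signal, expected to equal s
TU = T @ U                              # expected to be a projection: TU TU = TU
```

If the chosen indices make $\mathbf{U}\mathcal{F}^{\mathrm{T}}_{[K]}$ rank deficient, no such $\mathbf{Z}$ exists and a different index set is needed.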

Theorem 3 shows that a perfect recovery is possible for a bandlimited hypergraph signal. We now examine some interesting properties of the sampled signal.

From the previous discussion, we have $\mathbf{\tilde{s}}_{[K]}=\mathbf{Zs_{Q}}$ , which has a similar form to HGFT, where $\mathbf{Z}$ can be treated as the Fourier transform operator. Suppose that $Q=K$ and $\mathbf{Z}=[\mathbf{z}_{1}\quad\cdots\quad\mathbf{z}_{K}]^{\mathrm{T}}$ . We have the following first-order difference property.

Theorem 4.

Define a new hypergraph by $\mathbf{F_{K}}=\sum_{i=1}^{K}\lambda_{i}\cdot\mathbf{z}_{i}\circ\cdots\circ\mathbf{z}_{i}$ . Then, for all $K$ -bandlimited signals $\mathbf{s}^{[M-1]}\in\mathbb{R}^{\underbrace{\scriptstyle{N\times N\times...\times N}}_{{M-1}\ \rm{times}}}$ , it holds that

$$\mathbf{s}_{[K]}-\mathbf{F_{K}s}^{[M-1]}_{[K]}=\mathbf{U}(\mathbf{s-Fs}^{[M-1]}). \tag{57}$$

Proof:

Let the diagonal matrix $\Sigma_{[K]}$ consist of the first $K$ coefficients $\{\lambda_{1},\;\ldots,\;\lambda_{K}\}$ . Since $\mathbf{ZU}\mathcal{F}^{\mathrm{T}}_{[K]}=\mathbf{I_{K}}$ , we have

[TABLE]

Since $\mathbf{s}_{[K]}=\mathbf{Us}$ , it therefore holds that $\mathbf{s}_{[K]}-\mathbf{F_{K}s}^{[M-1]}_{[K]}=\mathbf{U}(\mathbf{s-Fs}^{[M-1]}).$ ∎

Theorem 4 shows that the sampled signals form a new hypergraph that preserves the information of the one-time shifting filter over the original hypergraph. For example, the left-hand side of Eq. (57) represents the difference between the sampled signal and its one-time shifted version in the new hypergraph. The right-hand side of Eq. (57) is the difference between a signal and its one-time shifted version in the original hypergraph, together with the sampling operator. That is, the sampled result of the one-time shifting differences in the original hypergraph is equal to the one-time shifting differences in the new sampled hypergraph.

V-B Filter Design

Filtering is an important tool in signal processing applications such as denoising, feature enhancement, smoothing, and classification. In GSP, the basic filtering is defined as $\mathbf{s^{\prime}=F_{M}s}$ where $\mathbf{F_{M}}$ is the representing matrix [2]. In HGSP, the basic hypergraph filtering is defined in Section III-C as $\mathbf{s}_{(1)}=\mathbf{Fs}^{[M-1]}$ , which is designed according to the tensor contraction. The HGSP filter is a multilinear mapping [72]. The high dimensionality of tensors provides more flexibility in designing the HGSP filter.

V-B1 Polynomial Filter based on Representing Tensor

Polynomial filter is one basic form of HGSP filters, with which signals are shifted several times over the hypergraph. An example of polynomial filter is given as Fig. 8 in Section III-B. A $k$ -time shifting filter is defined as

[TABLE]

More generally, a polynomial filter is designed as

[TABLE]

where $\{\alpha_{k}\}$ are the filter coefficients. Such HGSP filters are based on multilinear tensor contraction, which could be used for different signal processing tasks by selecting specific parameters $a$ and $\{\alpha_{k}\}$ .

In addition to the general polynomial filter based on hypergraph signals, we provide another specific form of polynomial filter based on the original signals. As mentioned in Section III-E, the supporting matrix $\mathbf{P_{s}}$ in Eq. (33) captures all the information of the frequency space. For example, the unnormalized supporting matrix $\mathbf{P}=\lambda_{max}\mathbf{P_{s}}$ is calculated as

$$\mathbf{P}=\sum_{r=1}^{N}\lambda_{r}\mathbf{f}_{r}\mathbf{f}_{r}^{\mathrm{T}}. \tag{61}$$

Obviously, the hypergraph spectrum pair $(\lambda_{r},\mathbf{f}_{r})$ is an eigenpair of the supporting matrix $\mathbf{P}$ . Moreover, Theorem 1 shows that the total variation of a frequency component equals a function of $\mathbf{P}$ , i.e.,

[TABLE]

From Eq. (62), $\mathbf{P}$ can be interpreted as a shifting matrix for the original signal. Accordingly, we can design a polynomial filter for the original signal based on the supporting matrix $\mathbf{P}$ whose $k$ th-order term is defined as

[TABLE]

The $a$ -th order polynomial filter is simply given as

[TABLE]

A polynomial filter over the original signal can be determined with specific choices of $a$ and $\alpha$ .

Let us consider some interesting properties of the polynomial filter for the original signal. First, given the $k$ th-order term, we have the following property as Lemma 2.

Lemma 2.

[TABLE]

Proof:

Let $\mathbf{V}^{\mathrm{T}}=[\mathbf{f}_{1},\cdots,\mathbf{f}_{N}]$ and $\Sigma=diag([\lambda_{1},\cdots,\lambda_{N}])$ . Since $\mathbf{V}^{\mathrm{T}}\mathbf{V}=\mathbf{I}$ , we have

[TABLE]

Therefore, the $k$ th-order term is given as

[TABLE]

∎

From Lemma 2, we obtain the following property of the polynomial filter for the original signal.

Theorem 5.

Let $h(\cdot)$ be a polynomial function. For the polynomial filter $\mathbf{H}=h(\mathbf{P})$ for the original signal, the filtered signal satisfies

[TABLE]

This theorem works as the invariance property of exponential in HGSP, similar to those in GSP and DSP [2]. Eq. (60) and Eq. (63) provide more choices for HGSP polynomial filters in hypergraph signal processing and data analysis. We will give specific examples of practical applications in Section VI.
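The invariance property can be checked numerically on the supporting matrix. The sketch below is illustrative only: a random orthonormal matrix and random real coefficients stand in for the spectrum pairs $(\lambda_{r},\mathbf{f}_{r})$ , the helper name `polynomial_filter` is hypothetical, and the polynomial is assumed to include a constant term (the index range of the sum in the paper's filter definition may differ).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Stand-in spectrum pairs: column r of `basis` plays the role of f_r.
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.sort(rng.random(N))[::-1]

# Unnormalized supporting matrix P = sum_r lam_r f_r f_r^T.
P = (basis * lam) @ basis.T


def polynomial_filter(P, alpha):
    """H = sum_k alpha[k] * P^k for k = 0 .. len(alpha) - 1 (assumed index range)."""
    H = np.zeros_like(P)
    Pk = np.eye(P.shape[0])
    for a_k in alpha:
        H += a_k * Pk
        Pk = Pk @ P
    return H


alpha = [0.5, 0.3, 0.2]                  # hypothetical filter coefficients
H = polynomial_filter(P, alpha)

# h(lam_r) for each r; np.polyval expects the highest power first.
h_lam = np.polyval(alpha[::-1], lam)
```

Each frequency component $\mathbf{f}_{r}$ is only rescaled by the filter, i.e., `H @ basis[:, r]` equals `h_lam[r] * basis[:, r]` up to rounding, because $(\lambda_{r},\mathbf{f}_{r})$ is an eigenpair of $\mathbf{P}$ .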

V-B2 General Filter Design based on Optimization

In GSP, some filters are designed via optimization formulations [2, 73, 74]. Similarly, general HGSP filters can also be designed via optimization approaches. Assume $\mathbf{y}$ is the observed signal before shifting and $\mathbf{s}=h(\mathbf{F,y})$ is the signal shifted by the HGSP filter $h(\cdot)$ designed for specific applications. Then, the filter design can be formulated as

[TABLE]

where $\mathbf{F}$ is the representing tensor of the hypergraph and $f(\cdot)$ is a penalty function designed for specific problems. For example, the total variation could be used as a penalty function for the purpose of smoothness. Other alternative penalty functions include the label rank, Laplacian regularization and spectrum. In Section VI, we shall provide some filter design examples.

VI Application Examples

In this section, we consider several application examples for our newly proposed HGSP framework. These examples illustrate the practical use of HGSP in some traditional tasks, such as filter design and efficient data representation. We also consider problems in data analysis, such as classification and clustering.

VI-A Data Compression

Efficient representation of signals is important in data analysis and signal processing. Among many applications, data compression attracts significant interest for efficient storage and transmission [75, 76, 77]. Projecting signals into a suitable orthonormal basis is a widely-used compression method [5]. Within the proposed HGSP framework, we propose a data compression method based on the hypergraph Fourier transform. We can represent $N$ signal points in the original domain with $C$ frequency coefficients in the hypergraph spectrum domain. More specifically, with the help of the sampling theory in Section V, we can compress a $K$ -bandlimited signal of $N$ signal points losslessly with $K$ spectrum coefficients.

To test the performance of our HGSP compression and demonstrate that hypergraphs may be a better representation of structured signals than normal graphs, we compare the results of image compression with those from GSP-based compression method [5]. We test over seven small size- $16\times 16$ icon images and three size- $256\times 256$ photo images, shown in Fig. 14.

The HGSP-based image compression method is described as follows. Given an image, we first model it as a hypergraph with the Image Adaptive Neighborhood Hypergraph (IANH) model [30]. To reduce complexity, we pick the three closest neighbors in each hyperedge to construct a third-order adjacency tensor. Next, we calculate the Fourier basis of the adjacency tensor as well as the bandwidth $K$ of the hypergraph signals. Finally, we represent the original images using $C$ spectrum coefficients with $C=K$ . For a large image, we may first cut it into smaller image blocks before applying HGSP compression to improve speed.

For the GSP-based method in [5], we represent the images as graphs with 1) the 4-connected neighbor model [31], and 2) the distance-based model in which an edge exists only if the spatial distance is below $\alpha$ and the pixel distance is below $\beta$ . The graph Fourier space and corresponding coefficients in the frequency domain are then calculated to represent the original image.

We use the compression ratio CR $=N/C$ to measure the efficiency of different compression methods. A large CR implies higher compression efficiency. The result is summarized in Table I, from which we can see that our HGSP-based compression method achieves higher efficiency than the GSP-based compression methods.
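The compression ratio can be illustrated with a generic orthonormal basis. The sketch below is an assumption-laden illustration rather than the paper's pipeline: a random orthonormal matrix stands in for the hypergraph Fourier basis (no IANH model or tensor decomposition is performed), the helper names `compress` and `decompress` and the tolerance `tol` are hypothetical, and a coefficient is kept whenever its magnitude exceeds `tol`.

```python
import numpy as np


def compress(s, basis, tol=1e-10):
    """Return (kept indices, kept coefficients, compression ratio N / C).

    Assumes s is not identically zero, so that at least one coefficient is kept.
    """
    coeffs = basis.T @ s                       # coefficient r is f_r^T s
    keep = np.flatnonzero(np.abs(coeffs) > tol)
    return keep, coeffs[keep], s.size / keep.size


def decompress(keep, kept_coeffs, basis):
    """Rebuild the signal from the retained spectrum coefficients."""
    return basis[:, keep] @ kept_coeffs


rng = np.random.default_rng(0)
N, K = 16, 4
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))   # stand-in Fourier basis
s = basis[:, :K] @ np.array([1.0, 2.0, 3.0, 4.0])       # a K-bandlimited signal

keep, kept, cr = compress(s, basis)
s_rec = decompress(keep, kept, basis)
```

For this $K$ -bandlimited example, $C=K=4$ coefficients suffice, giving CR $=16/4=4$ with exact reconstruction up to rounding.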

In addition to the image datasets, we also test the efficiency of HGSP spectrum compression over the MovieLens dataset [83], where each movie data point has rating scores and tags from viewers. Here, we treat scores of movies as signals and construct graph models based on the tag relationships. Similar to the game dataset shown in Fig. 2(b), two movies are connected in a normal graph if they have similar tags. For example, if movies are labeled with ‘love’ by users, they are connected by an edge. To model the dataset as a hypergraph, we include the movies into one hyperedge if they have similar tags. For convenience and to limit complexity, we set $m.c.e=3$ . With the graph and hypergraph models, we compress the signals using the sampling method discussed earlier. For lossless compression, our HGSP method is able to use only $11.5\%$ of the samples from the original signals to recover the original dataset by choosing a suitable additional basis (see Section III-F). On the other hand, the GSP method requires $98.6\%$ of the samples. We also test the error between the recovered and original signals based on varying numbers of samples. As shown in Fig. 15, the recovery error naturally decreases with more samples. Note that our HGSP method achieves a much better performance once it obtains a sufficient number of samples, while the GSP error drops slowly. This is because the first few key HGSP spectrum basis elements carry most of the original information, thereby leading to a more efficient representation for structured datasets.

Overall, hypergraphs and HGSP lead to more efficient descriptions of structured data in most applications. With a more suitable hypergraph model and more developed methods, the HGSP framework could be an important new tool in data compression.

VI-B Spectral Clustering

Clustering is widely used in a variety of applications, such as social network analysis, computer vision, and communication problems. Among many methods, spectral clustering is an efficient clustering method [37, 6]. By modeling the dataset as a normal graph before clustering the data spectrally, significant improvement is possible for structured data [91]. However, such standard spectral clustering methods only exploit pairwise interactions. For applications where the interactions involve more than two nodes, hypergraph spectral clustering should be a more natural choice.

In hypergraph spectral clustering, one of the most important issues is how to define a suitable spectral space. In [39, 40], the authors introduced the hypergraph similarity spectrum for spectral clustering. Before spectral clustering, they first modeled the hypergraph structure as a graph-like similarity matrix. They then defined the hypergraph spectrum based on the eigenspace of the similarity matrix. However, since modeling a hypergraph with a similarity matrix may result in a certain loss of the inherent information, a more efficient spectral space defined directly over the hypergraph is more desirable, as introduced in our HGSP framework. With HGSP, as the hypergraph Fourier space from the adjacency tensor has a similar form to the spectral space from the adjacency matrix in GSP, we could develop the spectral clustering method based on the hypergraph Fourier space as in Algorithm 1.

To test the performance of the HGSP spectral clustering, we compare the achieved results with those from the hypergraph similarity method (HSC) in [40], using the zoo dataset[34]. To measure the performance, we compute the intra-cluster variance and the average Silhouette of nodes [41]. Since we expect the data points in the same cluster to be closer to each other, the performance is considered better if the intra-cluster variance is smaller. On the other hand, the Silhouette value is a measure of how similar an object is to its own cluster versus other clusters. A higher Silhouette value means that the clustering configuration is more appropriate.
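The two evaluation measures can be sketched as follows. The definitions used here are common textbook forms and are assumptions about what was computed in the experiments: the intra-cluster variance is taken as the mean squared Euclidean distance to the cluster centroid, the Silhouette value of a point is $(b-a)/\max(a,b)$ with $a$ the mean distance to its own cluster and $b$ the smallest mean distance to another cluster, and the function names are hypothetical. At least two clusters are required.

```python
import numpy as np


def intra_cluster_variance(X, labels):
    """Mean squared distance of each point to the centroid of its own cluster."""
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total / len(X)


def mean_silhouette(X, labels):
    """Average Silhouette value; points in singleton clusters contribute 0."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        a = D[i, same].sum() / (n_same - 1)      # excludes the zero self-distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])     # groups nearby points together
bad = np.array([0, 1, 0, 1])      # mixes the two groups
```

On this toy data the `good` labeling gives a lower variance and a higher Silhouette value than the `bad` labeling, matching the interpretation of the two measures given above.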

The comparative results are shown in Fig. 16. From the test result, we can see that our HGSP method generates a lower variance and a higher Silhouette value. More intuitively, we plot the clusters of animals in Fig. 17. Cluster 2 covers small animals like bugs and snakes. Cluster 3 covers carnivores whereas cluster 7 groups herbivores. Cluster 4 covers birds and Cluster 6 covers fish. Cluster 5 contains the rodents such as mice. One interesting category is cluster 1: although dolphins, sea-lions, and seals live in the sea, they are mammals and are clustered separately from cluster 6. From these results, we see that the HGSP spectral clustering method could achieve better performance and our definition of hypergraph spectrum may be more appropriate for spectral clustering in practice.

VI-C Classification

Classification problems are important in data analysis. Traditionally, these problems are studied by learning methods [35]. Here, we propose an HGSP-based method to solve the $\{\pm 1\}$ classification problem, where a hypergraph filter serves as a classifier.

The basic idea adopted for the classification filter design is label propagation (LP), where the main steps are to first construct a transmission matrix and then propagate the labels based on the transmission matrix [36]. The labels will converge after a sufficient number of shifting steps. Let $\mathbf{W}$ be the propagation matrix. Then the label could be determined by the distribution $\mathbf{s^{\prime}=W}^{k}\mathbf{s}$ . We see that $\mathbf{s^{\prime}}$ is in the form of a filtered graph signal. Recall that in Section V-B, the supporting matrix $\mathbf{P}$ has been shown to capture the properties of hypergraph shifting and total variation. Here, we propose an HGSP classifier based on the supporting matrix $\mathbf{P}$ defined in Eq. (61) to generate the matrix

[TABLE]

Our HGSP classifier simply relies on $\mbox{sign}[\mathbf{Hs}].$ The main steps of the propagated LP-HGSP classification method are described in Algorithm 2.
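The propagate-then-threshold idea can be sketched in a few lines. This is a generic label-propagation illustration under assumptions, not a reproduction of Algorithm 2: the propagation matrix `W` below is a hypothetical row-normalized adjacency matrix with self-loops rather than the filter $\mathbf{H}$ built from the supporting matrix $\mathbf{P}$ , and unlabeled nodes are encoded as $0$ in the input signal. The default $k=15$ follows the value used in the experiments.

```python
import numpy as np


def propagate_labels(W, s, k=15):
    """Return sign(W^k s); s holds +1 / -1 for labeled nodes and 0 otherwise."""
    out = s.astype(float)
    for _ in range(k):
        out = W @ out          # one propagation (shifting) step
    return np.sign(out)


# Toy graph: two disconnected triangles, nodes 0-2 and nodes 3-5.
A = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            A[i, j] = 1.0      # includes self-loops
W = A / A.sum(axis=1, keepdims=True)   # row-normalized propagation matrix

s = np.array([1, 0, 0, -1, 0, 0])      # one labeled node per triangle
labels = propagate_labels(W, s)
```

Each unlabeled node inherits the sign of the labeled node in its own component, which is the behavior the classifier $\mbox{sign}[\mathbf{Hs}]$ relies on.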

To test the performance of the hypergraph-based classifier, we implement it on the zoo dataset. We determine whether the animals have hair based on other features, formulated as a $\{\pm 1\}$ classification problem. We randomly pick different percentages of training data and leave the remaining data as the test set among the total 101 data points. We smooth the curve by averaging over 1000 combinations of randomly picked training sets. We compare the HGSP-based method against the SVM method with the RBF kernel and the label propagation GSP (LP-GSP) method [2]. In the experiment, we model the dataset as a hypergraph or graph based on the distance between data points. The threshold for determining the existence of edges is designed to ensure the absence of isolated nodes in the graph. For the label propagation method, we set $k=15$ . The result is shown in Fig. 18(a). From the result, we see that the label propagation HGSP method (LP-HGSP) is moderately better than LP-GSP. The graph-based methods, i.e., LP-GSP and LP-HGSP, both perform better than SVM. The performance of SVM appears less satisfactory, likely because the dataset is rather small. Model-based graph and hypergraph methods are rather robust when applied to such small datasets. To illustrate this effect more clearly, we tested the SVM and hypergraph performance with new configurations by increasing the dataset size while fixing the ratio of training data in Fig. 18(b). In the experiment, we first randomly pick data subsets of different sizes from the original zoo dataset as the new datasets. Then, for each size of the new dataset, $40\%$ of the data points are randomly picked as the training data, and the remaining data points are used as the test data. We average the results of 10000 experiments to smooth the curve. We can see from Fig. 18(b) that the performance of SVM shows significant improvement as the dataset size grows larger. This comparison indicates that SVM may require more data to achieve better performance, as shown in the comparative results of Fig. 18(a). Generally, the HGSP-based method exhibits better overall performance and shows significant advantages with small datasets. Although GSP and HGSP classifiers are both model-based, hypergraph-based ones usually perform better than graph-based ones, since hypergraphs provide a better description of the structured data in most applications.

VI-D Denoising

Signals collected in the real world often contain noise. Signal denoising is thus an important application in signal processing. Here, we design a hypergraph filter to implement signal denoising.

As mentioned in Section III, the smoothness of a hypergraph signal, which describes the variance of hypergraph signals, could be measured by the total variation. Assume that the original signal is smooth. We formulate signal denoising as an optimization problem. Suppose that $\mathbf{y=s+n}$ is a noisy signal with noise $\mathbf{n}$ , and $\mathbf{s}^{\prime}=h(\mathbf{F,y})$ is the data denoised by the HGSP filter $h(\cdot)$ . The denoising problem could be formulated as an optimization problem:

[TABLE]

where the second term is the weighted quadratic total variation of the filtered signal $\mathbf{s^{\prime}}$ based on the supporting matrix.

The denoising problem of Eq. (71) aims to smooth the signal based on the original noisy data $\mathbf{y}$ . The first term keeps the denoised signal close to the original noisy signal, whereas the second term tries to smooth the recovered signal. Clearly, the optimized solution of filter design is

[TABLE]

where $\mathbf{P}_{s}=\sum_{i=1}^{N}\frac{\lambda_{i}}{\lambda_{\max}}\mathbf{f}_{i}\mathbf{f}_{i}^{\mathrm{T}}$ describes a hypergraph Fourier decomposition. From Eq. (72), we see that the solution is in the form of $\mathbf{s^{\prime}=Hy}$ for denoising, which adopts a hypergraph filter $h(\cdot)$ as

[TABLE]

The HGSP-based filter follows a similar idea to the GSP-based denoising filter [33]. However, different definitions of the total variation and signal shifting result in different designs of HGSP vs. GSP filters. To test the performance, we compare our method with the basic Wiener filter, Median filter, and GSP-based filter [33] using the image datasets of Fig. 14. We apply different types of noise. To quantify the filter performance, we use the mean square error (MSE) between each true signal and the corresponding signal after filtering. The results are given in Table II. From these results, we can see that, for each type of noise and with $\gamma$ optimized for all the methods, our HGSP-based filter outperforms the other filters.
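A closed-form denoising filter of this kind can be sketched as follows. The objective used here is an assumption consistent with the description above (a data-fidelity term plus a $\gamma$ -weighted quadratic total variation based on the supporting matrix), namely $\|\mathbf{s^{\prime}}-\mathbf{y}\|_{2}^{2}+\gamma\|\mathbf{s^{\prime}}-\mathbf{P}_{s}\mathbf{s^{\prime}}\|_{2}^{2}$ ; the exact form in Eq. (71) may differ. The function name `denoise` is hypothetical, and a random orthonormal basis with random coefficients stands in for the hypergraph spectrum.

```python
import numpy as np


def denoise(y, P_s, gamma):
    """Minimize ||s - y||^2 + gamma * ||s - P_s s||^2 in closed form (assumed objective)."""
    N = len(y)
    R = np.eye(N) - P_s
    # Setting the gradient to zero gives (I + gamma * R^T R) s = y.
    return np.linalg.solve(np.eye(N) + gamma * R.T @ R, y)


def objective(s, y, P_s, gamma):
    return np.sum((s - y) ** 2) + gamma * np.sum((s - P_s @ s) ** 2)


rng = np.random.default_rng(0)
N = 6
basis, _ = np.linalg.qr(rng.standard_normal((N, N)))   # stand-in for f_1 .. f_N
lam = np.sort(rng.random(N))[::-1]                      # stand-in coefficients
P_s = (basis * (lam / lam.max())) @ basis.T             # normalized supporting matrix

y = rng.standard_normal(N)                              # noisy observation
gamma = 2.0
s_hat = denoise(y, P_s, gamma)
```

Larger $\gamma$ puts more weight on smoothness, while $\gamma=0$ returns the noisy signal unchanged.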

VI-E Other Potential Applications

In addition to the application algorithms discussed above, there could be many other potential applications for HGSP. In this subsection, we suggest several potential applicable datasets and systems for HGSP.

• IoT: With the development of IoT techniques, the system structures become increasingly complex, which makes traditional graph-based tools inefficient in handling the high-dimensional interactions. On the other hand, the hypergraph-based HGSP is powerful in dealing with high-dimensional analysis in the IoT system: for example, data intelligence over sensor networks, where hypergraph-based analysis has already attracted significant attention [92], and HGSP could be used to handle tasks like clustering, classification, and sampling.

• Social Network: Another promising application is the analysis of social network datasets. As discussed earlier, a hyperedge is an efficient representation for the multi-lateral relationship in social networks [80, 15]; HGSP can then be effective in analyzing multi-lateral node interactions.

• Natural Language Processing: Natural language processing is another area that can benefit from HGSP. With sentences and languages modeled as hypergraphs [81, 82], HGSP can serve as a tool for language classification and clustering tasks.

Overall, due to its systematic and structural approach, HGSP is expected to become an important tool for high-dimensional signal processing tasks that are traditionally addressed by DSP- or GSP-based methods.

VII Conclusions

In this work, we proposed a novel tensor-based framework of Hypergraph Signal Processing (HGSP) that generalizes traditional GSP to high-dimensional hypergraphs. Our work provided important definitions in HGSP, including hypergraph signals, hypergraph shifting, HGSP filters, frequency, and bandlimited signals. We presented basic HGSP tools such as the sampling theory and filter design. We showed that the hypergraph can serve as an efficient model for many complex datasets. We also illustrated multiple practical applications of HGSP in signal processing and data analysis, and provided numerical results that validate the advantages and practicality of the proposed framework. Together, these features make HGSP a powerful tool for future IoT applications.

Future Directions: With the development of tensor algebra and hypergraph spectra, more opportunities are emerging to explore HGSP and its applications. One interesting topic is how to construct the hypergraph efficiently; distance-based and model-based methods have achieved significant success in specific areas such as image processing [93] and natural language processing [94]. Another promising direction is to apply HGSP to the analysis and optimization of multi-layer networks. As discussed in the introduction, the hypergraph is an alternative model for representing multi-layer networks [13], so HGSP is a useful tool for dealing with multi-layer structures. Other future directions include the development of fast operations, such as a fast hypergraph Fourier transform, and applications to high-dimensional datasets [95].

Bibliography

[1] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[2] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, Apr. 2018.
[3] M. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 1, pp. 2566–2572, Feb. 2002.
[4] S. Barbarossa and M. Tsitsvero, “An introduction to hypergraph signal processing,” in Proc. of Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, Mar. 2016, pp. 6425–6429.
[5] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, Apr. 2014.
[6] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, Aug. 2000.
[7] R. Wagner, V. Delouille, and R. G. Baraniuk, “Distributed wavelet denoising for sensor networks,” in Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, USA, Dec. 2006, pp. 373–379.
[8] S. K. Narang and A. Ortega, “Local two-channel critically sampled filter-banks on graphs,” in Proc. of 17th IEEE International Conference on Image Processing (ICIP), Hong Kong, China, Sept. 2010, pp. 333–336.
