Vector-valued Reproducing Kernel Banach Spaces with Group Lasso Norms

Liangzhi Chen; Haizhang Zhang; Jun Zhang

arXiv:1903.00819·math.FA·August 5, 2025

TL;DR

This paper develops a mathematical framework for vector-valued reproducing kernel Banach spaces with group lasso norms, enabling sparse multi-task learning with theoretical guarantees and new reproducing kernels.

Contribution

It introduces RKBSs with $\ell_{p,1}$-norms that support the linear representer theorem, and proposes admissible reproducing kernels for sparse multi-task learning.

Findings

01

Established a theoretical foundation for RKBSs with group lasso norms.

02

Proved that the linear representer theorem holds in this setting.

03

Designed reproducing kernels suitable for sparse multi-task learning.

Abstract

Focusing on establishing a mathematical basis for kernel methods in sparse multi-task learning, we explore the theory of vector-valued reproducing kernel Banach spaces (RKBSs) endowed with $\ell_{p,1}$-norms ($1\leq p\leq+\infty$), encompassing both the sparse learning case $p=1$ and the group lasso case $p=2$. We develop RKBSs equipped with these group lasso norms that support the linear representer theorem for regularized learning frameworks. Additionally, we introduce reproducing kernels admissible for this construction. Such reproducing kernels are applicable to sparse multi-task learning with group lasso norms.

Equations

$$\min_{f\in{\cal F}}\big\{L(f({\bf x}),{\bf y})+\lambda\Omega(f)\big\},$$

$$\|A\|_{{\cal L}({\cal E}_{1},{\cal E}_{2})}:=\sup_{\alpha\in{\cal E}_{1},\,\alpha\neq{\bf 0}}\frac{\|A\alpha\|_{{\cal E}_{2}}}{\|\alpha\|_{{\cal E}_{1}}}.$$

$$l_{p,1}(\Omega):=\Big\{{\rm C}=(c_{t})_{t\in\Omega}\in{\cal B}_{p}^{\Omega}:\|{\rm C}\|_{p,1}=\sum_{t\in\Omega}\|c_{t}\|_{p}<+\infty\Big\}.$$

$${\bf K}[{\bf x}]:=\big[{\bf K}(x_{i},x_{j}):i,j\in\mathbb{N}_{m}\big]$$

$${\bf K}^{\bf x}(x):=\big({\bf K}(x_{i},x):i\in\mathbb{N}_{m}\big)\quad\text{for every }x\in X$$

$${\bf K}_{\bf x}(x):=\big({\bf K}(x,x_{i}):i\in\mathbb{N}_{m}\big)^{T}\quad\text{for every }x\in X.$$

$${\bf K}(x,\cdot)c\in{\cal B},\quad{\bf K}(\cdot,x)c\in{\cal B}^{\#}\quad\text{for all }x\in X,\ c\in{\cal B}_{p};$$

$$(f,{\bf K}(\cdot,x)c)_{\bf K}=\langle f(x),c\rangle_{q},\quad({\bf K}(x,\cdot)c,g)_{\bf K}=\langle c,g(x)\rangle_{p}$$

$${\bf K}[{\bf x}]:=\big[{\bf K}(x_{i},x_{j}):i,j\in\mathbb{N}_{m}\big]\in{\cal L}({\cal B}_{p},{\cal B}_{q})^{m\times m}$$

$${\bf K}[{\bf x}]{\bf K}^{\prime}[{\bf x}]={\rm diag}(\mathbb{I}_{q},\dots,\mathbb{I}_{q})_{m}$$

$${\bf K}^{\prime}[{\bf x}]{\bf K}[{\bf x}]={\rm diag}(\mathbb{I}_{p},\dots,\mathbb{I}_{p})_{m}$$

$$\|{\bf K}(x,x^{\prime})\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}\leq\kappa$$

$$\|{\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})\|_{p,1}:=\sup_{c\in{\cal B}_{p},\,c\neq{\bf 0}}\frac{\|{\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})c\|_{p,1}}{\|c\|_{p}}$$

$$\|B{\bf c}\|_{p,1}=\Big\|\sum_{k=1}^{n}B_{k}c_{k}\Big\|_{p,1}\leq\sum_{k=1}^{n}\|B_{k}c_{k}\|_{p,1}\leq\sum_{k=1}^{n}\|B_{k}\|_{p,1}\|c_{k}\|_{p}\leq\max_{k\in\mathbb{N}_{n}}\big(\|B_{k}\|_{p,1}\big)\|{\bf c}\|_{p,1},$$

$$\left[\begin{array}{cc}A&B\\ C&D\end{array}\right]^{-1}=\left[\begin{array}{ll}A^{-1}+A^{-1}BMCA^{-1}&-A^{-1}BM\\ -MCA^{-1}&M\end{array}\right]$$

$${\mathcal{B}}_{\bf K}:=\Big\{\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(x,\cdot)c_{x}:{\rm C}=(c_{x})_{x\in X}\in l_{p,1}(X)\Big\}$$

$$\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(x,\cdot)c_{x}\Big\|_{{\mathcal{B}}_{\bf K}}:=\sum_{x\in{\rm supp}\,{\rm C}}\|c_{x}\|_{p}.$$

$${\mathcal{B}}_{\bf K}^{\#}:=\Big\{\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(\cdot,x)c_{x}:{\rm C}=(c_{x})_{x\in X}\in l_{p,1}(X)\Big\}$$

$$\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(\cdot,x)c_{x}\Big\|_{{\mathcal{B}}_{\bf K}^{\#}}:=\sup_{y\in X}\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(y,x)c_{x}\Big\|_{q}.$$

$${\mathcal{B}}_{\bf K}^{0}:=\Big\{\sum_{i=1}^{m}{\bf K}(x_{i},\cdot)c_{i}:x_{i}\in X,\ c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m},\ m\in\mathbb{N}\Big\}$$

$$\Big\|\sum_{i=1}^{n}{\bf K}(x_{i},\cdot)c_{i}\Big\|_{{\mathcal{B}}_{\bf K}^{0}}:=\sum_{i=1}^{n}\|c_{i}\|_{p},$$

$${\mathcal{B}}_{\bf K}^{0,\#}:=\Big\{\sum_{i=1}^{m}{\bf K}(\cdot,x_{i})c_{i}:x_{i}\in X,\ c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m},\ m\in\mathbb{N}\Big\}.$$

$$\Big(\sum_{i=1}^{m}{\bf K}(x_{i},\cdot)a_{i},\sum_{j=1}^{m^{\prime}}{\bf K}(\cdot,x_{j}^{\prime})b_{j}\Big)_{\bf K}=\sum_{i=1}^{m}\sum_{j=1}^{m^{\prime}}\langle a_{i},{\bf K}(x_{i},x_{j}^{\prime})b_{j}\rangle_{p},$$

$$\delta_{x}(f)=f(x),\quad\text{where }f\in{\mathcal{B}}_{\bf K}^{0}\text{ or }{\mathcal{B}}_{\bf K}^{0,\#},$$

$$\|\delta_{x}(f)\|_{q}\leq\kappa\|f\|_{{\mathcal{B}}_{\bf K}^{0}},\quad\text{for }f\in{\mathcal{B}}_{\bf K}^{0},$$

$$\begin{aligned}\|f(x)\|_{q}&=\sup_{\|c\|_{p}\leq 1}|\langle f(x),c\rangle_{q}|=\sup_{\|c\|_{p}\leq 1}\Big|\Big(\sum_{i=1}^{n}{\bf K}(z_{i},\cdot)a_{i},{\bf K}(\cdot,x)c\Big)_{\bf K}\Big|=\sup_{\|c\|_{p}\leq 1}\Big|\sum_{i=1}^{n}\langle a_{i},{\bf K}(z_{i},x)c\rangle_{p}\Big|\\&\leq\sup_{\|c\|_{p}\leq 1}\sum_{i=1}^{n}\|a_{i}\|_{p}\sup_{j\in\mathbb{N}_{n}}\|{\bf K}(z_{j},x)c\|_{q}\\&\leq\|f\|_{{\mathcal{B}}_{\bf K}^{0}}\sup_{\|c\|_{p}\leq 1}\sup_{j\in\mathbb{N}_{n}}\|{\bf K}(z_{j},x)\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}\|c\|_{p}\leq\kappa\|f\|_{{\mathcal{B}}_{\bf K}^{0}}.\end{aligned}$$

$$\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}:=\sup_{f\in{\mathcal{B}}_{\bf K}^{0},\,f\neq{\bf 0}}\frac{|(f,g)_{\bf K}|}{\|f\|_{{\mathcal{B}}_{\bf K}^{0}}}.$$

$$\|g(x)\|_{q}=\sup_{\|c\|_{p}\leq 1}|\langle g(x),c\rangle_{q}|=\sup_{\|c\|_{p}\leq 1}|({\bf K}(x,\cdot)c,g)_{\bf K}|\leq\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}$$

$$\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}=\sup_{x\in X}\|g(x)\|_{q}.$$

$$f(x)=\sum_{i\in\mathbb{N}_{n}}{\bf K}(x_{i},x)c_{i},\quad x\in X.$$


Taxonomy

Topics: Numerical methods in inverse problems · Mathematical Analysis and Transform Methods · Statistical Methods and Inference

Full text

Vector-valued Reproducing Kernel Banach Spaces with Group Lasso Norms*

*Supported by Natural Science Foundation of China under grants 11571377 and 11222103, and by DARPA/ARO under grant #W911NF-16-1-0383.

Liangzhi Chen

School of Data and Computer Science

Sun Yat-sen University

Guangzhou, China

[email protected]

Haizhang Zhang

School of Data and Computer Science

Sun Yat-sen University

Guangzhou, China

[email protected]

Jun Zhang

Department of Psychology,

and Department of Mathematics

University of Michigan

Ann Arbor, MI 48109, USA

[email protected]

Abstract

Aiming at a mathematical foundation for kernel methods in coefficient regularization for multi-task learning, we investigate the theory of vector-valued reproducing kernel Banach spaces (RKBSs) with $L_{p,1}$-norms, which include the sparse learning scheme ($p=1$) and the group lasso ($p=2$). We construct RKBSs that are equipped with such group lasso norms and admit the linear representer theorem for regularized learning schemes. Kernels that are admissible for this construction are also discussed.

Index Terms:

vector-valued spaces, reproducing kernel Banach spaces, multi-task learning, the representer theorem

I Introduction

Learning theory focuses on finding well-performing predictors from limited data. Such problems, however, are often ill-posed [37, 27]. Regularization is a widely used method to deal with this phenomenon. It is formulated as an optimization problem that involves an error term and a regularizer. Consider the following optimization problem

$$\min_{f\in{\cal F}}\big\{L(f({\bf x}),{\bf y})+\lambda\Omega(f)\big\},\tag{1}$$

where ${\cal F}$ is a space of functions on some data set $X$ , $({\bf x},{\bf y})$ is a set of input/output data, $\lambda>0$ is a regularization parameter, $L$ is an error function and $\Omega$ is called the regularizer function.

Classical cases of (1) are regularized by Euclidean norms or, more generally, Hilbertian norms. These have been thoroughly studied in the literature [11, 30, 4]. Learning in reproducing kernel Hilbert spaces (RKHSs) has received considerable attention over the past few decades in machine learning [3, 30, 31], statistical learning [40, 4] and stochastic processes [28], among other areas. Several reasons account for the success of learning methods in RKHSs. Firstly, kernels can be used to measure the similarity between input points via the “kernel trick”. Secondly, an RKHS is a Hilbert space of functions on $X$ for which point evaluations are continuous linear functionals. Sample data available for learning are usually modeled by point evaluations of the unknown target function. Finally, by the Riesz representation theorem, the point evaluation functionals on an RKHS can be represented by its reproducing kernel. These facts lead to the celebrated representer theorem [19, 2], which is desirable for learning in high-dimensional or infinite-dimensional spaces.

However, it is difficult to enhance the performance of learning approaches in an RKHS because of its simple geometric structure. Recently, learning in scalar-valued RKBSs [24, 46, 38, 43, 32, 35, 48, 34] and in multi-task learning settings [5, 20, 25, 7, 8, 1, 47, 49] has been studied systematically. The work on $L_{1}$-norm RKBSs [34] has attracted much attention, because $L_{1}$-norm regularization [36] in single-task learning problems often results in sparse solutions [6, 13, 39], which are desirable in machine learning. Sparsity is essential for extracting relatively low-dimensional features from sample data that usually live in high-dimensional spaces.

Multi-task learning problems arise frequently in applications. Methods based on single-task learning techniques unnaturally assume that tasks are independent of each other, and they tend to perform poorly on small data sets. By contrast, multi-task learning uses correlated information to improve the performance of the whole learning process. Many multi-task learning approaches have been proposed to boost the efficiency of the lasso in coping with such problems, such as the smoothly clipped absolute deviation [15, 41], the adaptive lasso [50], the relaxed lasso [23], the group lasso [45] and the sparse group lasso [16, 33]. Numerical experiments in [9, 14, 23, 25, 16] show that multi-task learning tends to provide better results than single-task learning.

The main task of this paper is to develop the learning theory for vector-valued RKBSs with $L_{p,1}$-norms. When $p=1$, this reduces to the $\ell^{1}$-norm vector-valued RKBS recently studied in [20]. Our approach is more general and includes the important group lasso case $p=2$. Our first objective is to construct an $L_{p,1}$-norm vector-valued RKBS based on admissible kernels, and then to derive the representer theorem for regularized learning schemes. These are the main contents of Sections III and IV. Our second objective concerns the admissible kernels. In Section V, we give a family of new admissible kernels, and then discuss kernel functions whose Lebesgue constants are bounded above by $1$.

Before entering the subject of the paper, we list previous research on RKBSs:

1. Scalar-valued RKBSs [46, 48] and vector-valued RKBSs [49] built on uniformly convex and uniformly smooth Banach spaces via semi-inner products [22].
2. Scalar-valued RKBSs with the $L_{1}$-norm [34, 35].
3. The $s$-norm scalar-valued RKBSs [44] developed via dual bilinear forms and generalized Mercer kernels.
4. Vector-valued RKBSs with the $L_{1}$-norm [20].
5. Generic definitions and a unified framework for the construction of scalar-valued RKBSs [21].

II Preliminaries and Notations

Throughout this paper, $p$ always denotes a number in the extended interval $[1,+\infty]$, and $q$ is its conjugate exponent, so that $1/p+1/q=1$ (if $p=1$ then $q=+\infty$, and if $p=+\infty$ then $q=1$). The notation $\mathbb{N}$ denotes the set of all positive integers, and $\mathbb{N}_{k}:=\{1,2,\dots,k\}$ for every $k\in\mathbb{N}$. Let $\mathbb{C},\mathbb{R}$ and $\mathbb{R}_{+}$ be the sets of complex numbers, real numbers and nonnegative real numbers, respectively. For any Banach space, denote by ${\bf 0}$ its zero element.

For a Banach space ${\cal B}$, denote its dual Banach space by ${\cal B}^{*}$. When $p=2$, let ${\cal B}_{2}:=\ell^{2}$ be the classical countably infinite-dimensional Hilbert space. When $p\neq 2$, ${\cal B}_{p}$ is assumed to be a finite-dimensional complex Euclidean space with the $\ell^{p}$-norm. Note that ${\cal B}_{p}^{*}={\cal B}_{q}$ and ${\cal B}_{p}={\cal B}_{p}^{**}$, since ${\cal B}_{p}$ is assumed to be finite-dimensional for $p\neq 2$ and ${\cal B}_{2}=\ell^{2}$ is a Hilbert space. Denote the bilinear form on ${\cal B}_{p}\times{\cal B}_{q}$ by $\langle\cdot,\cdot\rangle_{p}$. Thus, for elements $x\in{\cal B}_{p},y\in{\cal B}_{q}$, $\langle x,y\rangle_{p}:=y(x)=x(y)=:\langle y,x\rangle_{q}$ and $|\langle x,y\rangle_{p}|\leq\|x\|_{p}\|y\|_{q}$.
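A quick numerical check of the Hölder-type bound $|\langle x,y\rangle_{p}|\leq\|x\|_{p}\|y\|_{q}$ can help pin down the conventions, including the endpoint cases $p=1$ and $p=+\infty$. The sketch below assumes real vectors in a finite-dimensional ${\cal B}_{p}$ (the paper allows complex scalars and ${\cal B}_{2}=\ell^{2}$). The helper names `lp_norm`, `conjugate` and `pairing` are illustrative only.

```python
import math

def lp_norm(v, p):
    """l^p norm of a finite real or complex vector; p may be math.inf."""
    if p == math.inf:
        return max((abs(t) for t in v), default=0.0)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1 (1 -> inf, inf -> 1)."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)

def pairing(x, y):
    """Bilinear pairing <x, y>_p = sum_k x_k * y_k (no complex conjugation)."""
    return sum(a * b for a, b in zip(x, y))

x = [1.0, -2.0, 0.5]
y = [0.3, 0.1, -4.0]
for p in (1, 1.5, 2, 3, math.inf):
    q = conjugate(p)
    # Hoelder: |<x, y>_p| <= ||x||_p * ||y||_q
    assert abs(pairing(x, y)) <= lp_norm(x, p) * lp_norm(y, q) + 1e-12
```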

Let ${\cal E}_{1},{\cal E}_{2}$ be two Banach spaces, and let ${\cal L}({\cal E}_{1},{\cal E}_{2})$ denote the space of all bounded linear operators from ${\cal E}_{1}$ to ${\cal E}_{2}$, which is also a Banach space. For any $A\in{\cal L}({\cal E}_{1},{\cal E}_{2})$, its operator norm is defined by

$$\|A\|_{{\cal L}({\cal E}_{1},{\cal E}_{2})}:=\sup_{\alpha\in{\cal E}_{1},\,\alpha\neq{\bf 0}}\frac{\|A\alpha\|_{{\cal E}_{2}}}{\|\alpha\|_{{\cal E}_{1}}}.$$

For any nonempty set $\Omega$ , we introduce

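$$l_{p,1}(\Omega):=\Big\{{\rm C}=(c_{t})_{t\in\Omega}\in{\cal B}_{p}^{\Omega}:\|{\rm C}\|_{p,1}=\sum_{t\in\Omega}\|c_{t}\|_{p}<+\infty\Big\}.$$

Here, the set $\Omega$ might be uncountable, but this causes no trouble, as any element of $l_{p,1}(\Omega)$ has at most countably many nonzero coordinates.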
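For a finitely supported family, $\|{\rm C}\|_{p,1}$ is simply a sum of group norms. The sketch below uses a hypothetical helper `lp1_norm` and assumes real, finite-dimensional coefficient vectors. It shows that $p=1$ charges every entry separately (a lasso-type penalty), while $p=2$ charges each nonzero group by its Euclidean length, which is the group lasso penalty.

```python
import math

def lp_norm(v, p):
    """l^p norm of a finite vector; p may be math.inf."""
    if p == math.inf:
        return max((abs(t) for t in v), default=0.0)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def lp1_norm(C, p):
    """||C||_{p,1} = sum over t of ||c_t||_p, for a finitely supported
    family C given as a dict {index t: coefficient vector c_t}."""
    return sum(lp_norm(c, p) for c in C.values())

C = {"t1": [3.0, 4.0], "t2": [0.0, 0.0], "t3": [1.0, 0.0]}
print(lp1_norm(C, 1))   # 7.0 + 0.0 + 1.0 = 8.0  (lasso-type: every entry counts)
print(lp1_norm(C, 2))   # 5.0 + 0.0 + 1.0 = 6.0  (group lasso: one norm per group)
```

For every $p$, zero groups contribute nothing, which is why these norms favour solutions in which whole groups vanish.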

We denote a set of $m$ sampling points in an input space $X$ by ${\bf x}=\{x_{i}\in X:i\in\mathbb{N}_{m}\}$, and the corresponding observations by ${\bf y}=\{y_{i}\in{\cal B}_{q}:i\in\mathbb{N}_{m}\}$. For later convenience, we introduce the following notation. Denote by

$${\bf K}[{\bf x}]:=\big[{\bf K}(x_{i},x_{j}):i,j\in\mathbb{N}_{m}\big]$$

an $m\times m$ matrix with entries in ${\cal L}({\cal B}_{p},{\cal B}_{q})$ . Its associated vectors are denoted by

$${\bf K}^{\bf x}(x):=\big({\bf K}(x_{i},x):i\in\mathbb{N}_{m}\big)\quad\text{for every }x\in X$$

and

$${\bf K}_{\bf x}(x):=\big({\bf K}(x,x_{i}):i\in\mathbb{N}_{m}\big)^{T}\quad\text{for every }x\in X.$$
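To make the block notation concrete, the sketch below assembles ${\bf K}[{\bf x}]$, ${\bf K}^{\bf x}(x)$ and ${\bf K}_{\bf x}(x)$ as nested lists of $d\times d$ blocks. It assumes a hypothetical separable kernel ${\bf K}(x,y)=e^{-|x-y|}\,\mathbb{I}_{d}$ on $X=\mathbb{R}$. This kernel is chosen for illustration only and is not taken from the paper.

```python
import math

def k(x, y):
    """Hypothetical scalar kernel k(x, y) = exp(-|x - y|) on X = R."""
    return math.exp(-abs(x - y))

def K(x, y, d):
    """Hypothetical separable kernel K(x, y) = k(x, y) * I_d as a d x d matrix."""
    kxy = k(x, y)
    return [[kxy if r == s else 0.0 for s in range(d)] for r in range(d)]

def kernel_matrix(xs, d):
    """K[x] = [K(x_i, x_j) : i, j in N_m]: an m x m array of d x d blocks."""
    return [[K(xi, xj, d) for xj in xs] for xi in xs]

def K_upper(xs, x, d):
    """K^x(x) = (K(x_i, x) : i in N_m), a row of m blocks."""
    return [K(xi, x, d) for xi in xs]

def K_lower(xs, x, d):
    """K_x(x) = (K(x, x_i) : i in N_m)^T, a column of m blocks."""
    return [K(x, xi, d) for xi in xs]

xs = [0.0, 1.0, 2.5]
Kx = kernel_matrix(xs, d=2)
print(len(Kx), len(Kx[0]))   # 3 3
print(Kx[0][0])              # [[1.0, 0.0], [0.0, 1.0]]
```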

II-A Reproducing kernel Banach spaces of vector-valued functions

Before giving a formal definition of RKBSs of vector-valued functions, we recall some terminology.

Definition II.1.

[46] A space ${\cal B}$ is called a Banach space of vector-valued functions if the point evaluation functionals are consistent with the norm on ${\cal B}$, in the sense that for all $f\in{\cal B}$, $\|f\|_{\cal B}=0$ if and only if $f(x)={\bf 0}$ for every $x\in X$. A Banach space ${\cal B}$ of vector-valued functions on $X$ is said to be a pre-RKBS on $X$ if point evaluations are continuous linear functionals on ${\cal B}$.

To accommodate the main purpose of this paper, we present a definition of RKBSs of vector-valued functions that differs slightly from the one in [49]. Denote a space ${\cal B}$ with the norm $\|\cdot\|_{\cal B}$ by $({\cal B},\|\cdot\|_{\cal B})$.

Definition II.2.

The Banach spaces $({\cal B},\|\cdot\|_{\cal B})$ and $({\cal B}^{\#},\|\cdot\|_{{\cal B}^{\#}})$ are RKBSs of vector-valued functions from $X$ to ${\cal B}_{q}$ provided that

(i) ${\cal B}$ and ${\cal B}^{\#}$ are pre-RKBSs of vector-valued functions;

(ii) there exists a kernel function ${\bf K}:X\times X\to{\cal L}({\cal B}_{p},{\cal B}_{q})$ such that

$${\bf K}(x,\cdot)c\in{\cal B},\quad{\bf K}(\cdot,x)c\in{\cal B}^{\#}\quad\text{for all }x\in X,\ c\in{\cal B}_{p};$$

(iii) in addition, the reproducing properties hold true in the sense that

$$(f,{\bf K}(\cdot,x)c)_{\bf K}=\langle f(x),c\rangle_{q},\quad({\bf K}(x,\cdot)c,g)_{\bf K}=\langle c,g(x)\rangle_{p}$$

for all $x\in X,\ c\in{\cal B}_{p},\ f\in{\cal B},\ g\in{\cal B}^{\#}$.

Under these assumptions, ${\bf K}$ is called the reproducing kernel of ${\cal B}$ and ${\cal B}^{\#}$.

II-B Admissible kernels

The requirements of a kernel function that can be used to construct a vector-valued RKBS with the $L_{p,1}$ -norm are formulated as follows.

Definition II.3 (Admissible Kernels).

A kernel ${\bf K}:X\times X\to{\cal L}({\cal B}_{p},{\cal B}_{q})$ is admissible for the construction of RKBS of vector-valued functions from $X$ to ${\cal B}_{q}$ endowed with the $L_{p,1}$ -norm if the following assumptions are satisfied.

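(A1) For any $m$ pairwise distinct sampling points ${\bf x}=\{x_{i}:i\in\mathbb{N}_{m}\}\subseteq X$, the matrix

$${\bf K}[{\bf x}]:=\big[{\bf K}(x_{i},x_{j}):i,j\in\mathbb{N}_{m}\big]\in{\cal L}({\cal B}_{p},{\cal B}_{q})^{m\times m}$$

is invertible in the sense that there exists a ${\bf K}^{\prime}[{\bf x}]\in{\cal L}({\cal B}_{q},{\cal B}_{p})^{m\times m}$ such that

$${\bf K}[{\bf x}]{\bf K}^{\prime}[{\bf x}]={\rm diag}(\mathbb{I}_{q},\dots,\mathbb{I}_{q})_{m}$$

and

$${\bf K}^{\prime}[{\bf x}]{\bf K}[{\bf x}]={\rm diag}(\mathbb{I}_{p},\dots,\mathbb{I}_{p})_{m},$$

where $\mathbb{I}_{p}\in{\cal L}({\cal B}_{p},{\cal B}_{p})$ is the identity operator on ${\cal B}_{p}$ (and $\mathbb{I}_{q}$ is defined analogously on ${\cal B}_{q}$), and ${\rm diag}(\mathbb{I}_{p},\dots,\mathbb{I}_{p})_{m}$ is an $m\times m$ matrix with diagonal entries $\mathbb{I}_{p}$ and the zero operator $\mathbb{O}$ elsewhere. We simply denote ${\bf K}^{\prime}[{\bf x}]$ by ${\bf K}[{\bf x}]^{-1}$ if no confusion is caused.

(A2) The kernel ${\bf K}$ is bounded. That is, there exists $\kappa>0$ such that the operator norm

$$\|{\bf K}(x,x^{\prime})\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}\leq\kappa$$

for all $x,x^{\prime}\in X$.

(A3) For any pairwise distinct points $x_{i}\in X,i\in\mathbb{N}$ and $(c_{i})_{i\in\mathbb{N}}\in l_{p,1}(\mathbb{N})$, if $\sum\limits_{i\in\mathbb{N}}{\bf K}(x_{i},x)c_{i}={\bf 0}$ for all $x\in X$, then $c_{i}={\bf 0}$ for all $i\in\mathbb{N}$.

(A4) For any pairwise distinct $x_{1},x_{2},\dots,x_{m},x_{m+1}\in X$, with ${\bf x}=\{x_{1},\dots,x_{m}\}$, the operator norm

$$\|{\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})\|_{p,1}:=\sup_{c\in{\cal B}_{p},\,c\neq{\bf 0}}\frac{\|{\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})c\|_{p,1}}{\|c\|_{p}}$$

is bounded above by $1$, where ${\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})$ is a linear operator from ${\cal B}_{p}$ to ${\cal B}_{p}^{m}$.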

We denote the corresponding assumptions for the scalar case in [34] by (A1′)–(A4′).
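Consider, as an assumption made here for illustration, a separable kernel ${\bf K}(x,y)=k(x,y)\mathbb{I}$ with a symmetric scalar kernel $k$. Then ${\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x_{m+1})c=(a_{i}c)_{i\in\mathbb{N}_{m}}$ with $a=k[{\bf x}]^{-1}k_{\bf x}(x_{m+1})$. The quantity in (A4) therefore equals $\|a\|_{1}$ for every $p$, and (A4) reduces to its scalar counterpart (A4′).

The sketch below evaluates this constant for the hypothetical choice $k(x,y)=e^{-|x-y|}$ on $\mathbb{R}$. A direct computation gives two cases:

- If $x_{m+1}$ lies between neighbouring points at distances $\alpha$ and $\beta$, then $\|a\|_{1}=(\sinh\alpha+\sinh\beta)/\sinh(\alpha+\beta)\leq 1$.
- Otherwise, $\|a\|_{1}=e^{-{\rm dist}(x_{m+1},{\bf x})}$.

```python
import math
import random

def k(x, y):
    """Hypothetical scalar kernel exp(-|x - y|); K(x, y) = k(x, y) * I is assumed."""
    return math.exp(-abs(x - y))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def a4_constant(xs, x_new):
    """||K[x]^{-1} K_x(x_new)||_{p,1} for K = k * I, i.e. sum_i |a_i| with
    a = k[x]^{-1} k_x(x_new); the value does not depend on p."""
    gram = [[k(xi, xj) for xj in xs] for xi in xs]
    rhs = [k(x_new, xi) for xi in xs]
    return sum(abs(t) for t in solve(gram, rhs))

random.seed(0)
worst = 0.0
for _ in range(200):
    pts = random.sample(range(-50, 50), 6)   # six distinct integers
    xs = [t / 10 for t in pts[:5]]
    worst = max(worst, a4_constant(xs, pts[5] / 10))
print(worst <= 1.0)   # True: (A4) holds in every trial
```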

We now make some remarks on assumption (A1) in Definition II.3. Note that for $p<q$, we have $\ell^{p}\subseteq\ell^{q}$, and there do exist two linear operators $A:\ell^{p}\to\ell^{q}$, $B:\ell^{q}\to\ell^{p}$ such that $BA=\mathbb{I}_{p}$ and $\|AB\|_{{\cal L}(\ell^{q},\ell^{q})}=1$. If both $A$ and $B$ were bounded, then most of the theoretical work in this paper would hold for ${\cal B}_{p}=\ell^{p}$. Unfortunately, for $1\leq s\neq t\leq+\infty$, there do not exist two bounded linear operators $A:\ell^{s}\to\ell^{t}$, $B:\ell^{t}\to\ell^{s}$ such that $AB=\mathbb{I}_{t}$ or $BA=\mathbb{I}_{s}$. This is the main reason why we assume ${\cal B}_{p}$ $(p\neq 2)$ to be a finite-dimensional subspace of $\ell^{p}$.

II-C Further preliminaries on matrix theory

We discuss some useful facts about the operator norm $\|\cdot\|_{p,1}$ defined in (A4). For an $m\times n$ operator matrix $B\in{\cal L}({\cal B}_{p},{\cal B}_{p})^{m\times n}$ and a vector ${\bf c}=(c_{1},c_{2},\dots,c_{n})\in{\cal B}_{p}^{n}$ , we have the following compatible inequality for $\|\cdot\|_{p,1}$ ,

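$$\|B{\bf c}\|_{p,1}=\Big\|\sum_{k=1}^{n}B_{k}c_{k}\Big\|_{p,1}\leq\sum_{k=1}^{n}\|B_{k}c_{k}\|_{p,1}\leq\sum_{k=1}^{n}\|B_{k}\|_{p,1}\|c_{k}\|_{p}\leq\max_{k\in\mathbb{N}_{n}}\big(\|B_{k}\|_{p,1}\big)\|{\bf c}\|_{p,1},$$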

where $B_{k}$ denotes the $k$ -th column of $B$ . When the entries of ${\bf c}$ are scalar-valued, $\|{\bf c}\|_{p,1}=\|{\bf c}\|_{1}$ .
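The compatibility inequality can be checked numerically in the simplest case $p=1$, where ${\cal B}_{1}$ is taken to be $\mathbb{R}^{d}$ with the $\ell^{1}$-norm (an assumption for this sketch). The $\|\cdot\|_{1,1}$-norm of a stacked vector is then its plain $\ell^{1}$-norm. Consequently, $\|B_{k}\|_{1,1}$ is the largest absolute column sum of the column $B_{k}$ written as an $md\times d$ matrix. The helper names are illustrative.

```python
import random

def stacked_col_norm(Bk):
    """||B_k||_{1,1} for p = 1: B_k is a column of m blocks (d x d lists) mapping
    l^1_d into (l^1_d)^m; the norm is the largest absolute column sum of the
    stacked (m*d) x d matrix."""
    d = len(Bk[0][0])
    return max(sum(abs(block[r][s]) for block in Bk for r in range(d))
               for s in range(d))

def apply(B, c):
    """(B c)_i = sum_k B_ik c_k for an m x n block matrix B and c in (R^d)^n."""
    m, n, d = len(B), len(B[0]), len(c[0])
    return [[sum(B[i][k][r][s] * c[k][s] for k in range(n) for s in range(d))
             for r in range(d)] for i in range(m)]

def norm11(vecs):
    """||c||_{1,1} = sum of the l^1 norms of the blocks."""
    return sum(abs(t) for v in vecs for t in v)

def rnd():
    return random.uniform(-1.0, 1.0)

random.seed(1)
m, n, d = 3, 4, 2
B = [[[[rnd() for _ in range(d)] for _ in range(d)] for _ in range(n)] for _ in range(m)]
c = [[rnd() for _ in range(d)] for _ in range(n)]
columns = [[B[i][k] for i in range(m)] for k in range(n)]   # B_k, k = 1..n
lhs = norm11(apply(B, c))
rhs = max(stacked_col_norm(col) for col in columns) * norm11(c)
print(lhs <= rhs)   # True
```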

Also, the following inversion of a $2\times 2$ blockwise matrix will be used many times in this paper:

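$$\left[\begin{array}{cc}A&B\\ C&D\end{array}\right]^{-1}=\left[\begin{array}{ll}A^{-1}+A^{-1}BMCA^{-1}&-A^{-1}BM\\ -MCA^{-1}&M\end{array}\right],$$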

where $M=(D-CA^{-1}B)^{-1}$ .
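Because the block-inverse formula is used repeatedly, an exact check may be reassuring. The sketch below uses $2\times 2$ rational blocks with arbitrary illustrative values, chosen so that $A$ and $D-CA^{-1}B$ are invertible. It uses Python's `fractions` module to verify that the right-hand side is a two-sided inverse, with no rounding error.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y, sign=1):
    """X + sign * Y, entrywise."""
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg(X):
    return [[-a for a in row] for row in X]

def inv2(X):
    """Exact inverse of a 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def hstack(X, Y):
    return [rx + ry for rx, ry in zip(X, Y)]

# Illustrative 2 x 2 blocks (A and D - C A^{-1} B are invertible).
A = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
B = [[Fr(1), Fr(0)], [Fr(2), Fr(1)]]
C = [[Fr(0), Fr(1)], [Fr(1), Fr(1)]]
D = [[Fr(4), Fr(1)], [Fr(0), Fr(5)]]

Ai = inv2(A)
M = inv2(add(D, matmul(matmul(C, Ai), B), sign=-1))   # M = (D - C A^{-1} B)^{-1}
AiB = matmul(Ai, B)
top = hstack(add(Ai, matmul(matmul(AiB, M), matmul(C, Ai))), neg(matmul(AiB, M)))
bottom = hstack(neg(matmul(matmul(M, C), Ai)), M)
inverse = top + bottom
full = hstack(A, B) + hstack(C, D)
identity = [[Fr(int(i == j)) for j in range(4)] for i in range(4)]
print(matmul(full, inverse) == identity)   # True
```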

III Construction

To begin with, we follow a method similar to that in [34] to construct vector-valued Banach spaces with the norm $\|\cdot\|_{p,1}$, based on a kernel satisfying (A2) and (A3) in Definition II.3.

Let $X$ be a given input space whose cardinality is infinite. We shall construct the following two RKBSs of vector-valued functions from $X$ to ${\cal B}_{q}$ . The first one is

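$${\mathcal{B}}_{\bf K}:=\Big\{\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(x,\cdot)c_{x}:{\rm C}=(c_{x})_{x\in X}\in l_{p,1}(X)\Big\}\tag{4}$$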

with the norm

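$$\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(x,\cdot)c_{x}\Big\|_{{\mathcal{B}}_{\bf K}}:=\sum_{x\in{\rm supp}\,{\rm C}}\|c_{x}\|_{p}.\tag{5}$$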

And the second one is

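$${\mathcal{B}}_{\bf K}^{\#}:=\Big\{\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(\cdot,x)c_{x}:{\rm C}=(c_{x})_{x\in X}\in l_{p,1}(X)\Big\}\tag{6}$$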

with the norm

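$$\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(\cdot,x)c_{x}\Big\|_{{\mathcal{B}}_{\bf K}^{\#}}:=\sup_{y\in X}\Big\|\sum_{x\in{\rm supp}\,{\rm C}}{\bf K}(y,x)c_{x}\Big\|_{q}.\tag{7}$$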
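In computations, an element of ${\mathcal{B}}_{\bf K}$ with finitely many nonzero coefficients can be stored as a map from centres $x$ to coefficients $c_{x}$. The sketch below assumes a hypothetical separable kernel $e^{-|x-y|}\mathbb{I}_{d}$ and real coefficients. It merges repeated centres before computing the norm (5); without this merge, one function could be assigned different "norms". This is why the centres must be pairwise distinct, and (A3) then makes the coefficients unique.

```python
import math

def lp_norm(v, p):
    """l^p norm of a finite vector; p may be math.inf."""
    if p == math.inf:
        return max((abs(t) for t in v), default=0.0)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def k(x, y):
    """Hypothetical scalar kernel; K(x, y) = k(x, y) * I_d is assumed."""
    return math.exp(-abs(x - y))

def make_expansion(terms):
    """Collect (x, c) pairs into {x: c_x}, summing the coefficients of repeated
    centres so that the centres become pairwise distinct."""
    C = {}
    for x, c in terms:
        C[x] = [a + b for a, b in zip(C[x], c)] if x in C else list(c)
    return C

def evaluate(C, y):
    """f(y) = sum_x K(x, y) c_x = sum_x k(x, y) c_x."""
    d = len(next(iter(C.values())))
    return [sum(k(x, y) * c[r] for x, c in C.items()) for r in range(d)]

def norm_BK(C, p):
    """Norm (5): sum_x ||c_x||_p over pairwise distinct centres."""
    return sum(lp_norm(c, p) for c in C.values())

terms = [(0.0, [1.0, -1.0]), (2.0, [0.5, 0.0]), (0.0, [-1.0, 1.0])]
f = make_expansion(terms)
print(f)                  # {0.0: [0.0, 0.0], 2.0: [0.5, 0.0]}
print(norm_BK(f, 2))      # 0.5
print(evaluate(f, 2.0))   # [0.5, 0.0]
```

Summing $\|c\|_{2}$ over the raw terms would give $2\sqrt{2}+0.5$ for a function whose norm (5) is $0.5$.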

III-A The bilinear form and point evaluations

Denote

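$${\mathcal{B}}_{\bf K}^{0}:=\Big\{\sum_{i=1}^{m}{\bf K}(x_{i},\cdot)c_{i}:x_{i}\in X,\ c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m},\ m\in\mathbb{N}\Big\}$$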

with the norm

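$$\Big\|\sum_{i=1}^{n}{\bf K}(x_{i},\cdot)c_{i}\Big\|_{{\mathcal{B}}_{\bf K}^{0}}:=\sum_{i=1}^{n}\|c_{i}\|_{p},\quad x_{i}\text{ pairwise distinct},\tag{8}$$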

and a linear space

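$${\mathcal{B}}_{\bf K}^{0,\#}:=\Big\{\sum_{i=1}^{m}{\bf K}(\cdot,x_{i})c_{i}:x_{i}\in X,\ c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m},\ m\in\mathbb{N}\Big\}.$$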

The above two linear spaces both consist of functions from $X$ to ${\cal B}_{q}$ .

We then define a bilinear form $(\cdot,\cdot)_{{\bf K}}$ on ${\mathcal{B}}_{\bf K}^{0}\times{\mathcal{B}}_{\bf K}^{0,\#}$ by

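$$\Big(\sum_{i=1}^{m}{\bf K}(x_{i},\cdot)a_{i},\sum_{j=1}^{m^{\prime}}{\bf K}(\cdot,x_{j}^{\prime})b_{j}\Big)_{\bf K}=\sum_{i=1}^{m}\sum_{j=1}^{m^{\prime}}\langle a_{i},{\bf K}(x_{i},x_{j}^{\prime})b_{j}\rangle_{p},\tag{9}$$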

where $m,m^{\prime}\in\mathbb{N}$ and $x_{i},x^{\prime}_{j}\in X,~{}a_{i},b_{j}\in{\cal B}_{p}$ for $i\in\mathbb{N}_{m},j\in\mathbb{N}_{m^{\prime}}$ .

By (A3), we know that the norm in (8) and the above bilinear form in (9) are well-defined on their underlying spaces.
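The bilinear form (9) is easy to evaluate for finite expansions. The sketch below assumes a separable kernel ${\bf K}=k\,\mathbb{I}_{d}$ with the symmetric scalar kernel $k(x,y)=e^{-|x-y|}$. For this kernel, $\langle a,{\bf K}(x,x^{\prime})b\rangle_{p}=\langle b,{\bf K}(x,x^{\prime})a\rangle_{p}$, and the sketch checks both reproducing identities of Definition II.2 numerically. Because the check relies on this symmetry, it is specific to the assumed kernel.

```python
import math

def k(x, y):
    """Hypothetical symmetric scalar kernel; K(x, y) = k(x, y) * I_d is assumed."""
    return math.exp(-abs(x - y))

def pair(u, v):
    """Bilinear pairing <u, v> = sum_r u_r * v_r."""
    return sum(a * b for a, b in zip(u, v))

def bilinear(f_terms, g_terms):
    """Form (9): f = sum_i K(x_i, .) a_i, g = sum_j K(., x'_j) b_j and
    (f, g)_K = sum_i sum_j <a_i, K(x_i, x'_j) b_j>."""
    return sum(pair(a, [k(x, xp) * t for t in b])
               for x, a in f_terms for xp, b in g_terms)

def eval_f(f_terms, y):
    """f(y) = sum_i K(x_i, y) a_i."""
    d = len(f_terms[0][1])
    return [sum(k(x, y) * a[r] for x, a in f_terms) for r in range(d)]

def eval_g(g_terms, y):
    """g(y) = sum_j K(y, x'_j) b_j."""
    d = len(g_terms[0][1])
    return [sum(k(y, xp) * b[r] for xp, b in g_terms) for r in range(d)]

f_terms = [(0.0, [1.0, 2.0]), (1.5, [-0.5, 0.3])]
g_terms = [(0.7, [0.2, -1.0]), (-1.0, [1.0, 1.0])]
x, c = 0.4, [0.6, -0.2]
err1 = abs(bilinear(f_terms, [(x, c)]) - pair(eval_f(f_terms, x), c))   # (f, K(., x)c)_K vs <f(x), c>
err2 = abs(bilinear([(x, c)], g_terms) - pair(c, eval_g(g_terms, x)))   # (K(x, .)c, g)_K vs <c, g(x)>
print(err1 < 1e-12, err2 < 1e-12)   # True True
```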

To proceed, we have to show that the point evaluation operators $\delta_{x}:{\mathcal{B}}_{\bf K}^{0}\to{\cal B}_{q},~{}x\in X$ or $\delta_{x}:{\mathcal{B}}_{\bf K}^{0,\#}\to{\cal B}_{q},~{}x\in X$ defined as follows

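$$\delta_{x}(f)=f(x),\quad\text{where }f\in{\mathcal{B}}_{\bf K}^{0}\text{ or }{\mathcal{B}}_{\bf K}^{0,\#},$$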

are continuous operators.

Proposition III.1.

The point evaluation operators are continuous on ${\mathcal{B}}_{\bf K}^{0}$ in the sense that

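$$\|\delta_{x}(f)\|_{q}\leq\kappa\|f\|_{{\mathcal{B}}_{\bf K}^{0}},\quad\text{for }f\in{\mathcal{B}}_{\bf K}^{0},$$

where $\kappa>0$ is the constant in (A2).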

Proof.

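Let $f=\sum_{i=1}^{n}{\bf K}(z_{i},\cdot)a_{i}\in{\mathcal{B}}_{\bf K}^{0}$ with pairwise distinct $z_{i}\in X$ and $a_{i}\in{\cal B}_{p}$, $i\in\mathbb{N}_{n}$, so that $\|f\|_{{\mathcal{B}}_{\bf K}^{0}}=\sum_{i=1}^{n}\|a_{i}\|_{p}$. Then we have

$$\begin{aligned}\|f(x)\|_{q}&=\sup_{\|c\|_{p}\leq 1}|\langle f(x),c\rangle_{q}|=\sup_{\|c\|_{p}\leq 1}\Big|\Big(\sum_{i=1}^{n}{\bf K}(z_{i},\cdot)a_{i},{\bf K}(\cdot,x)c\Big)_{\bf K}\Big|=\sup_{\|c\|_{p}\leq 1}\Big|\sum_{i=1}^{n}\langle a_{i},{\bf K}(z_{i},x)c\rangle_{p}\Big|\\&\leq\sup_{\|c\|_{p}\leq 1}\sum_{i=1}^{n}\|a_{i}\|_{p}\sup_{j\in\mathbb{N}_{n}}\|{\bf K}(z_{j},x)c\|_{q}\\&\leq\|f\|_{{\mathcal{B}}_{\bf K}^{0}}\sup_{\|c\|_{p}\leq 1}\sup_{j\in\mathbb{N}_{n}}\|{\bf K}(z_{j},x)\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}\|c\|_{p}\leq\kappa\|f\|_{{\mathcal{B}}_{\bf K}^{0}}.\end{aligned}$$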

This shows that the point evaluation operators are continuous on ${\mathcal{B}}_{\bf K}^{0}$ .
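Proposition III.1 can also be checked on random examples. The sketch below takes $p=1$ (so $q=+\infty$), ${\cal B}_{1}=\mathbb{R}^{3}$, and the hypothetical kernel ${\bf K}(x,y)=e^{-|x-y|}\mathbb{I}_{3}$, all of which are assumptions. Since $\|k\,\mathbb{I}\|_{{\cal L}(\ell^{1},\ell^{\infty})}=|k|\leq 1$, (A2) holds with $\kappa=1$. The ratio $\|f(x)\|_{\infty}/\|f\|_{{\mathcal{B}}_{\bf K}^{0}}$ should therefore never exceed $1$.

```python
import math
import random

def k(x, y):
    """Hypothetical scalar kernel with 0 < k <= 1; K(x, y) = k(x, y) * I_3 is assumed."""
    return math.exp(-abs(x - y))

def worst_ratio(trials=500, seed=2):
    """Largest observed ||f(y)||_inf / ||f||_{B_K^0} over random f and y (p = 1)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        n = rng.randint(1, 6)
        centres = rng.sample(range(-20, 20), n)                 # pairwise distinct
        coeffs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
        norm_f = sum(sum(abs(t) for t in a) for a in coeffs)    # sum_i ||a_i||_1
        y = rng.uniform(-25, 25)
        fy = [sum(k(z, y) * a[r] for z, a in zip(centres, coeffs)) for r in range(3)]
        worst = max(worst, max(abs(t) for t in fy) / norm_f)
    return worst

kappa = 1.0
print(worst_ratio() <= kappa)   # True, as Proposition III.1 predicts
```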

By [34, Proposition 2.4], we know that the norm defined as follows

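$$\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}:=\sup_{f\in{\mathcal{B}}_{\bf K}^{0},\,f\neq{\bf 0}}\frac{|(f,g)_{\bf K}|}{\|f\|_{{\mathcal{B}}_{\bf K}^{0}}}\tag{10}$$

is well-defined. Moreover, by reasoning similar to that in Proposition III.1, we can show that the point evaluation operators on ${\mathcal{B}}_{\bf K}^{0,\#}$ are continuous and

$$\|g(x)\|_{q}=\sup_{\|c\|_{p}\leq 1}|\langle g(x),c\rangle_{q}|=\sup_{\|c\|_{p}\leq 1}|({\bf K}(x,\cdot)c,g)_{\bf K}|\leq\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}\tag{11}$$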

for every $g\in{\mathcal{B}}_{\bf K}^{0,\#}$ .

The norm defined as in (10) has another equivalent but simpler form.

Proposition III.2.

For any $g\in{\mathcal{B}}_{\bf K}^{0,\#}$ , it holds that

$$\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}=\sup_{x\in X}\|g(x)\|_{q}.$$

Proof.

By (11), we have $\sup\limits_{x\in X}\|g(x)\|_{q}\leq\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}$. We shall prove the opposite inequality. For any $f\in{\mathcal{B}}_{\bf K}^{0}$, there exist pairwise distinct points $x_{i}\in X$ and coefficients $c_{i}\in{\cal B}_{p}$, $i\in\mathbb{N}_{n}$, such that

$$f(x)=\sum_{i\in\mathbb{N}_{n}}{\bf K}(x_{i},x)c_{i},\quad x\in X.$$

Then, we have for every $g\in{\mathcal{B}}_{\bf K}^{0,\#}$ ,

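$$|(f,g)_{\bf K}|=\Big|\sum_{i\in\mathbb{N}_{n}}({\bf K}(x_{i},\cdot)c_{i},g)_{\bf K}\Big|=\Big|\sum_{i\in\mathbb{N}_{n}}\langle c_{i},g(x_{i})\rangle_{p}\Big|\leq\sum_{i\in\mathbb{N}_{n}}\|c_{i}\|_{p}\|g(x_{i})\|_{q}\leq\|f\|_{{\mathcal{B}}_{\bf K}^{0}}\sup_{x\in X}\|g(x)\|_{q}.$$

It follows from (10) that $\|g\|_{{\mathcal{B}}_{\bf K}^{0,\#}}\leq\sup\limits_{x\in X}\|g(x)\|_{q}$, which completes the proof.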

So far, we have defined two normed vector spaces $({\mathcal{B}}_{\bf K}^{0},\|\cdot\|_{{\mathcal{B}}_{\bf K}^{0}})$ and $({\mathcal{B}}_{\bf K}^{0,\#},\|\cdot\|_{{\mathcal{B}}_{\bf K}^{0,\#}})$ whose point evaluation functionals are continuous, together with the bilinear form (9) on ${\mathcal{B}}_{\bf K}^{0}\times{\mathcal{B}}_{\bf K}^{0,\#}$.

III-B Completion of ${\mathcal{B}}_{\bf K}^{0}$ and ${\mathcal{B}}_{\bf K}^{0,\#}$

With the previous preparations, we are now ready to complete ${\mathcal{B}}_{\bf K}^{0}$ and ${\mathcal{B}}_{\bf K}^{0,\#}$. As in the classical completion process, we add elements to ${\mathcal{B}}_{\bf K}^{0}$ and ${\mathcal{B}}_{\bf K}^{0,\#}$ to make them Banach spaces of functions. For convenience, we use the notation ${\cal N}_{0}$ to represent either ${\mathcal{B}}_{\bf K}^{0}$ or ${\mathcal{B}}_{\bf K}^{0,\#}$. Let $\{f_{n}:n\in\mathbb{N}\}$ be a Cauchy sequence in ${\cal N}_{0}$. Then, by Proposition III.1 (respectively, by (11)) and the fact that ${\cal B}_{q}$ is a Banach space, for any $x\in X$ the sequence $\{f_{n}(x):n\in\mathbb{N}\}$ converges to some point in ${\cal B}_{q}$. We denote this limit by $f(x)$, which defines a vector-valued function $f:X\to{\cal B}_{q}$. It is easy to see that $f$ is well-defined. We then let ${\cal N}$ be the set consisting of all such limit functions, with the norm $\|f\|_{{\cal N}}=\lim\limits_{n\to\infty}\|f_{n}\|_{{\cal N}_{0}}$. Here, ${\cal N}$ denotes either ${\mathcal{B}}_{\bf K}$ or ${\mathcal{B}}_{\bf K}^{\#}$.

Since the rest of the completion process is the same as in [34], we only review it briefly and state the following results without proof.

By Proposition III.1 and [34, Proposition 2.3 and 3.1], we have

[TABLE]

By Proposition III.2 and [34, Proposition 2.5 and Lemma 3.3], we have

[TABLE]

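Moreover, the bilinear form can be extended uniquely to ${\mathcal{B}}_{\bf K}\times{\mathcal{B}}_{\bf K}^{\#}$ such that the reproducing property in Definition II.2 holds true. That is,

$$(f,{\bf K}(\cdot,x)c)_{\bf K}=\langle f(x),c\rangle_{q},\quad({\bf K}(x,\cdot)c,g)_{\bf K}=\langle c,g(x)\rangle_{p}\tag{12}$$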

for every $x\in X,c\in{\cal B}_{p},f\in{\mathcal{B}}_{\bf K},g\in{\mathcal{B}}_{\bf K}^{\#}$ .

We conclude the above discussion as follows.

Theorem III.3.

Let ${\bf K}:X\times X\to{\cal L}({\cal B}_{p},{\cal B}_{q})$ be a kernel function satisfying (A2) and (A3). Then the spaces ${\mathcal{B}}_{\bf K}$ and ${\mathcal{B}}_{\bf K}^{\#}$ , which are defined in (4) and (6) with their norm as in (5) and (7), respectively, satisfy

(i) they are both RKBSs of vector-valued functions from $X$ to ${\cal B}_{q}$ with ${\bf K}$ being their reproducing kernel;

(ii) the bilinear form (9) can be extended to ${\mathcal{B}}_{\bf K}\times{\mathcal{B}}_{\bf K}^{\#}$ so that it satisfies the reproducing property (12) and

[TABLE]

for every $f\in{\mathcal{B}}_{\bf K},g\in{\mathcal{B}}_{\bf K}^{\#}$ .

IV The Representer Theorem

The linear representer theorem plays an important role in regularized learning schemes in machine learning. It enables us to transform an optimization problem over an infinite-dimensional space into an equivalent one over a finite-dimensional subspace. The representer theorems for regularized learning schemes on RKBSs and for minimal norm interpolation are often related [24, 2, 34].

In this section, we use assumptions (A1), (A2) and (A4) in Definition II.3 to deduce a corresponding representer theorem for the constructed vector-valued RKBSs ${\mathcal{B}}_{\bf K}$ and ${\mathcal{B}}_{\bf K}^{\#}$.

Recall that a linear operator $F:{\cal N}_{1}\to{\cal N}_{2}$ between normed vector spaces is said to be completely continuous [10] on ${\cal N}_{1}$ if, for any sequence $\{z_{k}\}\subseteq{\cal N}_{1}$ weakly convergent to $z_{0}\in{\cal N}_{1}$, $F(z_{k})$ converges to $F(z_{0})$ strongly. Every compact linear operator is completely continuous. For example, a bounded projection $P$ from an infinite-dimensional Banach space onto a finite-dimensional subspace is completely continuous. We borrow this terminology for general, not necessarily linear, functionals on Banach spaces.

Definition IV.1 (Acceptable Regularized Learning Schemes).

Let ${\bf x}=\{x_{i}:i\in\mathbb{N}_{m}\}\subseteq X$ be the set of pairwise distinct sampling points. For $f\in{\mathcal{B}}_{\bf K}$ , denote $f({\bf x})=(f(x_{i}):i\in\mathbb{N}_{m})^{T}\in{\cal B}_{q}^{m}$ . Let $L:{\cal B}_{q}^{m}\times{\cal B}_{q}^{m}\to\mathbb{R}_{+}$ satisfy $L({\bf y},{\bf y})=0$ for any ${\bf y}\in{\cal B}_{q}^{m}$ . Let $\lambda>0$ and $\phi:\mathbb{R}_{+}\to\mathbb{R}_{+}$ be a nondecreasing function. A regularized learning scheme

$$\min_{f\in{\mathcal{B}}_{\bf K}}L(f({\bf x}),{\bf y})+\lambda\phi(\|f\|_{{\mathcal{B}}_{\bf K}})\tag{14}$$

is said to be acceptable in ${\mathcal{B}}_{\bf K}$ if $L$ is completely continuous on ${\cal B}_{q}^{m}\times{\cal B}_{q}^{m}$, $\phi$ is continuous and $\lim\limits_{t\to\infty}\phi(t)=+\infty$.

Note that if ${\cal B}_{q}$ is finite-dimensional or is the classical $\ell^{1}$, then complete continuity is equivalent to continuity, because in these spaces weakly convergent sequences converge strongly.

Definition IV.2.

The space ${\mathcal{B}}_{\bf K}$ is said to satisfy the linear representer theorem for the acceptable regularized learning if every acceptable regularized learning scheme (14) has a minimizer of the form

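$$f=\sum_{i\in\mathbb{N}_{m}}{\bf K}(x_{i},\cdot)c_{i},\quad c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m}.\tag{15}$$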

Denote

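$${\cal S}^{\bf x}:=\Big\{\sum_{i\in\mathbb{N}_{m}}{\bf K}(x_{i},\cdot)c_{i}:c_{i}\in{\cal B}_{p},\ i\in\mathbb{N}_{m}\Big\}.$$

Although ${\cal S}^{\bf x}$ is the “span” of $\{{\bf K}(x_{i},\cdot):i\in\mathbb{N}_{m}\}$ with coefficients in ${\cal B}_{p}$, it need not be a finite-dimensional subspace of ${\mathcal{B}}_{\bf K}$ (for example, when $p=2$, the coefficients range over the infinite-dimensional space ${\cal B}_{2}=\ell^{2}$). That is why we impose complete continuity on $L$.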
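When the representer theorem applies, an acceptable scheme can be solved over ${\cal S}^{\bf x}$, that is, over the coefficients $c_{1},\dots,c_{m}$. The sketch below does this for one concrete, hypothetical instance, with these assumptions:

- $p=2$, with ${\cal B}_{2}$ truncated to $\mathbb{R}^{d}$;
- the separable kernel $e^{-|x-y|}\mathbb{I}_{d}$;
- the squared loss $L({\bf f},{\bf y})=\tfrac12\sum_{j}\|f_{j}-y_{j}\|_{2}^{2}$;
- $\phi(t)=t$.

For distinct $x_{i}$, $\|f\|_{{\mathcal{B}}_{\bf K}}=\sum_{i}\|c_{i}\|_{2}$ on ${\cal S}^{\bf x}$, so the problem becomes a group lasso. Here it is solved by proximal gradient (ISTA) with group soft-thresholding, a standard solver used for illustration and not taken from the paper.

```python
import math

def k(x, y):
    """Hypothetical scalar kernel; K(x, y) = k(x, y) * I_d is assumed."""
    return math.exp(-abs(x - y))

def group_lasso_fit(xs, ys, lam, iters=500):
    """Proximal gradient (ISTA) for
        min_c 0.5 * sum_j ||f(x_j) - y_j||_2^2 + lam * sum_i ||c_i||_2,
    where f = sum_i K(x_i, .) c_i.  Returns (coefficients, objective trace)."""
    m, d = len(xs), len(ys[0])
    G = [[k(a, b) for b in xs] for a in xs]          # symmetric Gram matrix
    step = 1.0 / max(sum(row) for row in G) ** 2     # 1 / (Gershgorin bound)^2

    def predict(c):                                  # f(x_j), j = 1..m
        return [[sum(G[i][j] * c[i][r] for i in range(m)) for r in range(d)]
                for j in range(m)]

    def objective(c):
        fx = predict(c)
        fit = 0.5 * sum((fx[j][r] - ys[j][r]) ** 2 for j in range(m) for r in range(d))
        return fit + lam * sum(math.sqrt(sum(t * t for t in ci)) for ci in c)

    c = [[0.0] * d for _ in range(m)]
    trace = [objective(c)]
    for _ in range(iters):
        fx = predict(c)
        res = [[fx[j][r] - ys[j][r] for r in range(d)] for j in range(m)]
        grad = [[sum(G[i][j] * res[j][r] for j in range(m)) for r in range(d)]
                for i in range(m)]
        new_c = []
        for i in range(m):
            v = [c[i][r] - step * grad[i][r] for r in range(d)]
            nv = math.sqrt(sum(t * t for t in v))
            # group soft-thresholding: prox of step * lam * ||.||_2
            scale = max(0.0, 1.0 - step * lam / nv) if nv > 0 else 0.0
            new_c.append([scale * t for t in v])
        c = new_c
        trace.append(objective(c))
    return c, trace

xs = [0.0, 0.5, 1.3, 2.0]
ys = [[1.0, 0.0], [0.8, 0.1], [0.0, 0.0], [-0.5, 0.2]]
coeffs, trace = group_lasso_fit(xs, ys, lam=0.1, iters=300)
print(trace[0] >= trace[-1])   # True: the objective does not increase
```

The step size $1/(\max_{i}\sum_{j}k(x_{i},x_{j}))^{2}$ is a conservative bound, via Gershgorin's theorem, on the inverse Lipschitz constant of the smooth part, and it keeps the objective non-increasing. For $\lambda\geq\max_{i}\|\sum_{j}k(x_{i},x_{j})y_{j}\|_{2}$, the zero function is a minimizer, and every group stays at zero.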

A minimal norm interpolant in ${\mathcal{B}}_{\bf K}$ with respect to $({\bf x},{\bf y})=\{(x_{i},y_{i}):i\in\mathbb{N}_{m}\}$ is a function $f_{\min}$ satisfying

$$f_{\min}\in\arg\min\big\{\|f\|_{{\mathcal{B}}_{\bf K}}:f\in{\cal I}_{\bf x}({\bf y})\big\},\tag{16}$$

where ${\cal I}_{\bf x}({\bf y}):=\{f\in{\mathcal{B}}_{\bf K}:f({\bf x})={\bf y}\}$. Unless stated otherwise, we assume that $f_{\min}$ always exists.

Definition IV.3.

The space ${\mathcal{B}}_{\bf K}$ is said to satisfy the linear representer theorem for minimal norm interpolation if, for an arbitrary choice of training data $\{(x_{i},y_{i}):i\in\mathbb{N}_{m}\}$, there is a minimal norm interpolant $f_{\min}$, obtained as in (16), that lies in ${\cal S}^{\bf x}$.

Ideas and techniques similar to those in [34, Lemmas 4.4 and 4.5] lead to the following theorem.

Theorem IV.4.

The space ${\mathcal{B}}_{\bf K}$ satisfies the linear representer theorem for acceptable regularized learning if and only if ${\mathcal{B}}_{\bf K}$ satisfies the linear representer theorem for minimal norm interpolation.

Hence, studying the connection between assumption (A4) and acceptable regularized learning schemes is equivalent to studying the connection between (A4) and the minimal norm interpolation problem. The advantage of this equivalence is that the minimal norm interpolation problem is much easier to deal with, as the following lemma confirms.

Lemma IV.5.

Let ${\bf x}=\{x_{1},x_{2},\dots,x_{m}\}$ consist of pairwise distinct elements in $X$ , $x_{m+1}\in X\setminus{\bf x}$ , and set $\overline{\bf x}={\bf x}\cup\{x_{m+1}\}$ . Then

[TABLE]

for every ${\bf y}\in{\cal B}_{q}^{m}$ if and only if ${\bf K}$ satisfies (A4).

Theorem IV.6.

Every minimal norm interpolant of (16) in ${\mathcal{B}}_{\bf K}$ satisfies the linear representer theorem if and only if (A4) holds true.

Proof.

We begin with the necessity. Note that the minimal norm interpolant of (16) satisfies the linear representer theorem if and only if

[TABLE]

Therefore, if the above equation holds true, then by the fact that ${\cal I}_{\bf x}({\bf y})\cap{\cal S}^{\bf x}\subseteq{\cal I}_{\bf x}({\bf y})\cap{\cal S}^{\overline{\bf x}}\subseteq{\cal I}_{\bf x}({\bf y})$ , we obtain (17) and by Lemma IV.5, the assumption (A4) holds true for every $x_{m+1}\in X\setminus{\bf x}$ .

Turning to sufficiency, we notice that

[TABLE]

To finish the proof we have to show that the reverse of the aforementioned inequality also holds true.

To this end, for any $g\in{\cal I}_{{\bf x}}({\bf y})\cap{\cal B}_{0}$, we can express $g$ as $g=\sum\limits_{i=1}^{n}{\bf K}(x_{i},\cdot)c_{i}$ for some $n\geq m$, pairwise distinct $x_{i}\in X$ and $c_{i}\in{\cal B}_{p}$, $i\in\mathbb{N}_{n}$, where $x_{1},\dots,x_{m}$ are the sampling points in ${\bf x}$. This is possible since we can always add extra sampling points from $X\setminus{\bf x}$ with zero coefficients, relabelling if necessary. Let $y_{j}=g(x_{j})$ for $m+1\leq j\leq n$ and

[TABLE]

Note that ${\bf x}={\bf x}_{m}$, ${\bf y}={\bf y}_{m}$, and $g\in{\cal I}_{{\bf x}_{n}}({\bf y}_{n})\cap{\cal S}^{{\bf x}_{n}}$. Therefore,

[TABLE]

Also, by Lemma IV.5 and the fact that ${\cal I}_{{\bf x}_{n}}({\bf y}_{n})\subseteq{\cal I}_{{\bf x}_{n-1}}({\bf y}_{n-1})$,

[TABLE]

Thus, we have

[TABLE]

Repeating this argument $n-m$ times, we obtain (18) for every $g\in{\cal I}_{{\bf x}}({\bf y})\cap{\cal B}_{0}$.

For a general $g\in{\cal I}_{{\bf x}}({\bf y})$, a limiting argument completes the proof. Indeed, since ${\cal B}_{0}$ is dense in ${\mathcal{B}}_{\bf K}$, there is a sequence $\{g_{k}\in{\cal B}_{0}:k\in\mathbb{N}\}$ converging to $g$ in ${\mathcal{B}}_{\bf K}$. Take $f,f_{k}\in{\cal S}^{\bf x}$ as follows:

[TABLE]

Since $\|g_{k}-g\|_{{\mathcal{B}}_{\bf K}}\to 0$ as $k\to\infty$ and the point evaluation functionals are continuous on ${\mathcal{B}}_{\bf K}$, we have $g_{k}(x_{i})\to g(x_{i})$ for $i\in\mathbb{N}_{m}$ as $k\to\infty$. As a consequence,

[TABLE]

Since we already know that $\|g_{k}\|_{{\mathcal{B}}_{\bf K}}\geq\|f_{k}\|_{{\mathcal{B}}_{\bf K}}$ for all $k\in\mathbb{N}$, the inequality

[TABLE]

follows by taking the limit. The proof is complete.

Combining Theorem III.3 with Theorems IV.4 and IV.6, we obtain the following corollary for any $p\in[1,+\infty]$.

Corollary IV.1.

Let ${\bf K}:X\times X\to{\cal L}({\cal B}_{p},{\cal B}_{q})$ satisfy (A1)-(A3) as in Definition II.3. Then it induces an RKBS ${\mathcal{B}}_{\bf K}$ and the following three statements are equivalent:

(a) The kernel ${\bf K}$ satisfies assumption (A4).

(b) Every acceptable regularized learning scheme in ${\mathcal{B}}_{\bf K}$ of the form (14) has a minimizer of the form (15).

(c) Every minimal norm interpolant (16) in ${\mathcal{B}}_{\bf K}$ satisfies the linear representer theorem.

We remark that if ${\bf K}$ satisfies (A4), then ${\mathcal{B}}_{\bf K}^{\#}$ also satisfies the linear representer theorem for acceptable regularized learning. For details, see [34, Theorem 4.12 and Proposition 4.13].

We conclude this section with the following theorem.

Theorem IV.7.

If ${\bf K}$ is an admissible kernel on $X\times X$, then ${\mathcal{B}}_{\bf K}$ and ${\mathcal{B}}_{\bf K}^{\#}$ as defined in Section III, with norms given by (5) and (7) respectively, are both vector-valued RKBSs on $X$. Moreover, the bilinear form $(\cdot,\cdot)_{\bf K}$ satisfies (12) and the Cauchy inequality (13).

Furthermore, every acceptable regularized learning scheme as in Definition IV.1 has a minimizer $f_{0}$ of the form

[TABLE]

for some $c_{i}\in{\cal B}_{p}$, $i\in\mathbb{N}_{m}$.

*The converse is also true: for the constructed spaces ${\mathcal{B}}_{\bf K}$ and ${\mathcal{B}}_{\bf K}^{\#}$ to enjoy the above properties, ${\bf K}$ must be an admissible kernel on $X\times X$.*

V Admissible Kernels

We have seen that admissible kernels are fundamental to our construction. We give examples of admissible kernels in this section.

Recall the quantity $\|{\bf K}[{\bf x}]^{-1}{\bf K}_{\bf x}(x)\|_{p,1}$ in (A4). It is closely related to the Lebesgue constant [18] of the kernel ${\bf K}$, which measures the stability of kernel-based interpolation.

Define

[TABLE]

to be the Lebesgue constant of a kernel ${\bf G}:X\times X\to\mathbb{C}$, where ${\bf w}$ is a finite subset of $X$ and $\|\cdot\|_{s}$ is a specified norm. For example, $s=2$ corresponds to the classical Hilbert norm and $s=1$ to the $L_{1}$-norm. We seek kernels ${\bf G}$ such that

[TABLE]
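For a scalar kernel and $s=1$, the Lebesgue function $x\mapsto\|{\bf G}[{\bf w}]^{-1}{\bf G}_{\bf w}(x)\|_{1}$ can be evaluated directly. The following Python sketch does this for the Brownian bridge and exponential kernels discussed next, assuming their standard forms $\min\{x,y\}-xy$ and $e^{-|x-y|}$ on $(0,1)$ and assuming ${\bf G}_{\bf w}(x)=(G(w_{j},x))_{j}$. Taking the maximum over a finite grid gives only a lower estimate of the supremum, so this is a numerical sanity check rather than a proof; the node set and function names are hypothetical.

```python
import math


def brownian_bridge(x, y):
    """Assumed Brownian bridge kernel on (0, 1)."""
    return min(x, y) - x * y


def exponential(x, y):
    """Assumed exponential kernel exp(-|x - y|)."""
    return math.exp(-abs(x - y))


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def lebesgue_function(kernel, nodes, x):
    """l1 norm of G[w]^{-1} G_w(x) for a scalar kernel G and node set w."""
    gram = [[kernel(a, b) for b in nodes] for a in nodes]
    weights = solve(gram, [kernel(a, x) for a in nodes])
    return sum(abs(v) for v in weights)


def lebesgue_estimate(kernel, nodes, samples):
    """Max of the Lebesgue function over sample points (lower estimate of the sup)."""
    return max(lebesgue_function(kernel, nodes, x) for x in samples)


nodes = [0.1, 0.25, 0.4, 0.6, 0.85]
samples = [k / 200 for k in range(1, 200)]  # grid in (0, 1)
estimates = {
    "brownian_bridge": lebesgue_estimate(brownian_bridge, nodes, samples),
    "exponential": lebesgue_estimate(exponential, nodes, samples),
}
for name, value in estimates.items():
    print(f"{name}: {value:.6f}")
```

For the Brownian bridge kernel the interpolation weights are values of piecewise linear hat functions, which is why its estimate sits at $1$.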

It is shown in [34] that both the Brownian bridge kernel

[TABLE]

and the exponential kernel

[TABLE]

are admissible scalar-valued kernels. Here we present a new family of admissible scalar-valued kernels. These scalar-valued kernels can then be used to construct admissible operator-valued kernels of the form considered in [1, 7]:

[TABLE]

where $G:X\times X\to\mathbb{C}$ is a single-task kernel and $\mathbb{A}$ denotes a positive-definite matrix.

V-A A new family of admissible scalar-valued kernels

The new family is

[TABLE]

It reduces to the Brownian bridge kernel $K_{\min}$ when $t=1$, and to the covariance $\min\{x,y\}$ of Brownian motion when $t=0$.

Proposition V.1.

*The functions $K_{t}$ in (20), $t\in[-1,1]$, are admissible kernels.*

Proof.

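Let $m\in\mathbb{N}$ and $0<x_{1}<x_{2}<\cdots<x_{m}<1$. A direct computation shows that the determinant of the kernel matrix $K_{t}[{\bf x}]$ is $x_{1}(1-tx_{m})(x_{2}-x_{1})(x_{3}-x_{2})\cdots(x_{m}-x_{m-1})$, which is positive for $-1\leq t\leq 1$ because $x_{1}>0$, $tx_{m}<1$, and the points are increasing. Since the same formula applies to $\{x_{1},\dots,x_{k}\}$ for each $k\leq m$, all leading principal minors of $K_{t}[{\bf x}]$ are positive, so $K_{t}[{\bf x}]$ is positive definite by Sylvester's criterion. Hence $K_{t}$ is strictly positive definite for any $-1\leq t\leq 1$ and therefore satisfies assumption (A1′). The function $K_{t}$ is clearly uniformly bounded by $2$ for $t\in[-1,1]$. Also, by the same reasoning as in [34, Proposition 5.1], we can verify that $K_{t}$ satisfies (A3′) and (A4′) for $t\in[-1,1]$.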
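The determinant computation can be organized with the matrix determinant lemma, assuming that (20) reads $K_{t}(x,y)=\min\{x,y\}-txy$ on $(0,1)$, which is consistent with the cases $t=0,1$ above and with the stated determinant. With $M=(\min\{x_{i},x_{j}\})_{i,j}$ and $v=(x_{1},\dots,x_{m})^{T}$, one has $K_{t}[{\bf x}]=M-tvv^{T}$; the last column of $M$ equals $v$, so $M^{-1}v=e_{m}$ and $v^{T}M^{-1}v=x_{m}$, giving $\det K_{t}[{\bf x}]=\det(M)(1-tx_{m})$ with $\det M=x_{1}(x_{2}-x_{1})\cdots(x_{m}-x_{m-1})$. The Python sketch below checks the closed form and the positivity of all leading principal minors in exact rational arithmetic; the points, the values of $t$, and the function names are illustrative.

```python
from fractions import Fraction


def k_t(t, x, y):
    """Assumed form of (20): K_t(x, y) = min{x, y} - t*x*y."""
    return min(x, y) - t * x * y


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def closed_form(t, xs):
    """x_1 (1 - t x_m) (x_2 - x_1) ... (x_m - x_{m-1}) for increasing xs."""
    value = xs[0] * (1 - t * xs[-1])
    for left, right in zip(xs, xs[1:]):
        value *= right - left
    return value


xs = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(4, 5)]
for t in [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1)]:
    gram = [[k_t(t, a, b) for b in xs] for a in xs]
    assert det(gram) == closed_form(t, xs)
    # Leading principal minors are the determinants on {x_1, ..., x_k}.
    minors = [det([row[:k] for row in gram[:k]]) for k in range(1, len(xs) + 1)]
    print(f"t = {t}: det = {det(gram)}, leading minors positive: {all(m > 0 for m in minors)}")
```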

V-B Admissible kernel for multi-task learning

We now show that the multi-task kernel defined in (19) is admissible whenever $G$ is. Let $G$ be a scalar-valued kernel and let $\mathbb{A}\in{\cal L}({\cal B}_{p},{\cal B}_{q})$ be an invertible operator as in (A2). Then we have the following lemma.

Lemma V.2.

*Let ${\bf G}:X\times X\to{\cal L}({\cal B}_{p},{\cal B}_{q})$ be a multi-task kernel given as in (19), and let ${\bf x}$ be a set of $m$ pairwise distinct points. If the Lebesgue constant $\Lambda_{\bf x}^{p,1}(G)$ is bounded by $\alpha_{m}>0$, then so is $\Lambda_{\bf x}^{p,1}({\bf G})$.*

Proof.

We compute

[TABLE]

Then we have

[TABLE]

which completes the proof.
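The computation behind Lemma V.2 rests on the block structure of the separable kernel (19). As a finite-dimensional sketch, suppose ${\cal B}_{p}$ and ${\cal B}_{q}$ are identified with $\mathbb{R}^{d}$ and $\mathbb{A}$ is an invertible $d\times d$ matrix. Then ${\bf G}[{\bf x}]$ is the block matrix with blocks $G(x_{i},x_{j})\mathbb{A}$, and with $u=G[{\bf x}]^{-1}G_{\bf x}(x)$ the solution of ${\bf G}[{\bf x}]Z={\bf G}_{\bf x}(x)$ is the block column with blocks $u_{i}I$, because $\sum_{j}G(x_{i},x_{j})\mathbb{A}(u_{j}I)=\big(\sum_{j}G(x_{i},x_{j})u_{j}\big)\mathbb{A}=G(x_{i},x)\mathbb{A}$. Each block has operator norm $|u_{i}|$, so, assuming the $(p,1)$-norm of a block column of operators is the sum of the blocks' operator norms, the Lebesgue function of ${\bf G}$ equals that of $G$. The Python sketch checks the block identity numerically; the choice of $G$ (the Brownian bridge kernel), of $\mathbb{A}$, and all names are hypothetical.

```python
def brownian_bridge(x, y):
    """Hypothetical choice of scalar kernel G."""
    return min(x, y) - x * y


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def block_gram(g, nodes, a):
    """(m*d) x (m*d) matrix whose (i, j) block is g(x_i, x_j) * A."""
    d = len(a)
    return [[g(nodes[i], nodes[j]) * a[r][c]
             for j in range(len(nodes)) for c in range(d)]
            for i in range(len(nodes)) for r in range(d)]


nodes = [0.15, 0.45, 0.8]
x = 0.6
a_matrix = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical invertible, positive-definite A
d = len(a_matrix)
big = block_gram(brownian_bridge, nodes, a_matrix)

# Scalar interpolation weights u = G[x]^{-1} G_x(x).
u = solve([[brownian_bridge(p, q) for q in nodes] for p in nodes],
          [brownian_bridge(p, x) for p in nodes])

# Solve the block system column by column: big @ Z = block column G_x(x).
columns = []
for c in range(d):
    rhs = [brownian_bridge(nodes[i], x) * a_matrix[r][c]
           for i in range(len(nodes)) for r in range(d)]
    columns.append(solve(big, rhs))

# Compare block i of Z (rows i*d .. i*d + d - 1) with u_i * I.
deviation = max(
    abs(columns[c][i * d + r] - (u[i] if r == c else 0.0))
    for i in range(len(nodes)) for r in range(d) for c in range(d)
)
print("u =", [round(v, 6) for v in u], "max deviation from u_i * I:", deviation)
```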

The following corollary follows directly from Lemma V.2.

Corollary V.1.

*Let ${\bf G}$ be defined as in Lemma V.2. If the Lebesgue constant $\Lambda_{\bf w}^{p,1}(G)$ is uniformly bounded by $\alpha>0$ for all sets of pairwise distinct points ${\bf w}=\{w_{1},w_{2},\dots,w_{k}\}$, $k\in\mathbb{N}$, then $\Lambda^{p,1}({\bf G})\leq\alpha$.*

The connection between assumptions (A3′) and (A3) is stated below.

Lemma V.3.

*Let $K:X\times X\to\mathbb{C}$ be a kernel function satisfying (A3′). Then for any invertible operator $\mathbb{A}\in{\cal L}({\cal B}_{p},{\cal B}_{q})$, ${\bf K}=K\mathbb{A}$ satisfies (A3).*

Proof.

Let $\{x_{i}:i\in\mathbb{N}\}$ be pairwise distinct points in $X$ and $c_{i}\in{\cal B}_{p}$, $i\in\mathbb{N}$. Suppose that $\sum\limits_{i\in\mathbb{N}}{\bf K}(x_{i},x)c_{i}={\bf 0}$ for every $x\in X$. Then the sequence $\left\{\sum\limits_{i=1}^{k}{\bf K}(x_{i},x)c_{i}:k\in\mathbb{N}\right\}$ converges. Since $\sum\limits_{i=1}^{k}{\bf K}(x_{i},x)c_{i}=\mathbb{A}\sum\limits_{i=1}^{k}{K}(x_{i},x)c_{i}$, we have

[TABLE]

As a consequence, $\sum\limits_{i\in\mathbb{N}}{K}(x_{i},x)(c_{i})_{k}=0$ coordinatewise, for every $k\in\mathbb{N}$ and every $x\in X$. Since $K$ satisfies (A3′), it follows that $(c_{i})_{k}=0$ for all $i,k\in\mathbb{N}$; that is, $c_{i}=0$ for $i\in\mathbb{N}$.
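The passage from the vector-valued identity to the scalar one can be spelled out as follows. This is a sketch assuming that $\mathbb{A}^{-1}$ is bounded (automatic for a bounded bijection between Banach spaces, by the bounded inverse theorem) and that each coordinate map $c\mapsto(c)_{k}$ is continuous on ${\cal B}_{p}$; the partial-sum index is written $N$ to avoid a clash with the coordinate index $k$.

```latex
\begin{aligned}
\sum_{i=1}^{N}{\bf K}(x_{i},x)c_{i}
  &= \sum_{i=1}^{N}K(x_{i},x)\,\mathbb{A}c_{i}
   = \mathbb{A}\Big(\sum_{i=1}^{N}K(x_{i},x)c_{i}\Big), \\
\sum_{i\in\mathbb{N}}K(x_{i},x)c_{i}
  &= \lim_{N\to\infty}\mathbb{A}^{-1}\Big(\sum_{i=1}^{N}{\bf K}(x_{i},x)c_{i}\Big)
   = \mathbb{A}^{-1}\Big(\sum_{i\in\mathbb{N}}{\bf K}(x_{i},x)c_{i}\Big)
   = \mathbb{A}^{-1}{\bf 0} = {\bf 0}, \\
\sum_{i\in\mathbb{N}}K(x_{i},x)(c_{i})_{k}
  &= \Big(\sum_{i\in\mathbb{N}}K(x_{i},x)c_{i}\Big)_{k} = 0,
  \qquad k\in\mathbb{N},\ x\in X.
\end{aligned}
```

Applying (A3′) to each scalar sequence $((c_{i})_{k})_{i\in\mathbb{N}}$ then gives $(c_{i})_{k}=0$.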

It follows from Lemma V.3 that (A3) is automatically satisfied by kernels of the form $K(x,x^{\prime})\mathbb{A}$ whenever $K$ satisfies (A3′). We are now ready to present the following proposition.

Proposition V.4.

*Let $K$ be an admissible scalar-valued kernel, and $\mathbb{A}$ an invertible operator in ${\cal L}({\cal B}_{p},{\cal B}_{q})$. Then ${\bf K}=K\mathbb{A}$ is also an admissible kernel.*

Proof.

Assumption (A1) follows from the invertibility of $\mathbb{A}$ and the strict positive definiteness of $K$, and (A2) follows from $\|{\bf K}(x,x^{\prime})\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}\leq|K(x,x^{\prime})|\cdot\|\mathbb{A}\|_{{\cal L}({\cal B}_{p},{\cal B}_{q})}$. Since $K$ is admissible, it satisfies (A3′), so (A3) holds by Lemma V.3. Finally, (A4) follows from (A4′) and Lemma V.2.

Consequently, for any invertible operator $\mathbb{A}\in{\cal L}({\cal B}_{p},{\cal B}_{q})$, the kernels

[TABLE]

are all admissible multi-task kernels.

V-C More admissible kernels

Wendland's kernel functions [42] have desirable properties, such as compact support, and are widely used in interpolation and kernel-based learning. We consider the following restricted form of Wendland's function

[TABLE]

We show below that $K_{w}$ satisfies (A4′) and that positive linear combinations of $K_{t}$ and $K_{w}$ also have Lebesgue constants bounded above by $1$. We begin with the result for $K_{w}$.

Proposition V.5.

*The Wendland kernel $K_{w}$ in (21) satisfies (A4′).*

Next, we turn to positive linear combinations of $K_{t}$ and $K_{w}$. Denote

[TABLE]

Proposition V.6.

The following class of kernel functions

[TABLE]

satisfies (A4′).

The proof of the above proposition is technically involved and can be found in the full version of this paper on arXiv.

VI Conclusion

We have established a theory of multi-task learning in vector-valued RKBSs with $\ell_{p,1}$-norms. These norms include the classical $\ell_{1}$-norm and the group lasso norm. We construct these vector-valued RKBSs explicitly from admissible kernel functions, and we prove that the linear representer theorem for acceptable learning schemes, the linear representer theorem for minimal norm interpolation, and the admissibility assumption (A4) are all equivalent. As for admissible kernels, we present a new family of admissible scalar-valued kernels, based on which we construct admissible kernels for multi-task learning.

Bibliography

[1] M. A. Álvarez, L. Rosasco, and N. D. Lawrence, Kernels for vector-valued functions: A review, Found. Trends Mach. Learn. 4 (2012), 195-266.
[2] A. Argyriou, C. A. Micchelli, and M. Pontil, When is there a representer theorem? Vector versus matrix regularizers, J. Mach. Learn. Res. 10 (2009), 2507-2529.
[3] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), 337-404.
[4] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, Kluwer Academic Publishers, Boston, MA, 2004.
[5] J. Burbea and P. Masani, Banach and Hilbert Spaces of Vector-valued Functions, Research Notes in Mathematics 90, Pitman Publishers, Boston, MA, 1984.
[6] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52 (2006), 489-509.
[7] A. Caponnetto, C. A. Micchelli, M. Pontil, and Y. Ying, Universal multi-task kernels, J. Mach. Learn. Res. 9 (2008), 1615-1646.
[8] C. Carmeli, E. De Vito, A. Toigo, and V. Umanità, Vector-valued reproducing kernel Hilbert spaces and universality, Anal. Appl. (Singap.) 8 (2010), 19-61.
