Bochner integrals and neural networks
Paul C. Kainen, A. Vogt

Appeared in Handbook on Neural Information Processing, Monica Bianchini, Marco Maggini, Lakhmi C. Jain, Eds., Springer, ISRL Vol. 49, 2013, Chap. 6, pp. 183–214.
Paul C. Kainen
Andrew Vogt (1943–2021)
Abstract
A Bochner integral formula $f = \mathcal{B}\text{-}\int_Y w(y)\Phi(y)\,d\mu(y)$ is derived that presents a function $f$ in terms of weights $w$ and a parametrized family of functions $\Phi(y)$, $y$ in $Y$. Comparison is made to pointwise formulations, norm inequalities relating pointwise and Bochner integrals are established, $G$-variation and tensor products are studied, and examples are presented.
Keywords: Variational norm, essentially bounded, strongly measurable, Bochner integration, tensor product, $L^p$ spaces, integral formula.
1 Introduction
A neural network utilizes data to find a function consistent with the data and with further “conceptual” data such as desired smoothness, boundedness, or integrability. The weights for a neural net and the functions embodied in the hidden units can be thought of as determining a finite sum that approximates some function. This finite sum is a kind of quadrature for an integral formula that would represent the function exactly.
This chapter uses abstract analysis to investigate neural networks. Our approach is one of enrichment: not only is summation replaced by integration, but also numbers are replaced by real-valued functions on an input set $\Omega$, the functions lying in a function space $\mathcal{X}$. The functions, in turn, are replaced by $\mathcal{X}$-valued measurable functions $\Phi$ on a measure space $Y$ of parameters. The goal is to understand approximation of functions by neural networks so that one can make effective choices of the parameters to produce a good approximation.
To achieve this, we utilize Bochner integration. The idea of applying this tool to neural nets is in Girosi and Anzellotti [14] and we developed it further in Kainen and Kůrková [23]. Bochner integrals are now being used in the theory of support vector machines and reproducing kernel Hilbert spaces; see the recent book by Steinwart and Christmann [42], which has an appendix of more than 80 pages of material on operator theory and Banach-space-valued integrals. Bochner integrals are also widely used in probability theory in connection with stochastic processes of martingale-type; see, e.g., [8, 39]. The corresponding functional analytic theory may help to bridge the gap between probabilistic questions and deterministic ones, and may be well-suited for issues that arise in approximation via neural nets.
Training to replicate given numerical data does not give a useful neural network for the same reason that parrots make poor conversationalists. The phenomenon of overfitting shows that achieving fidelity to data at all costs is not desirable; see, e.g., the discussion on interpolation in our other chapter in this book (Kainen, Kůrková, and Sanguineti [45]). In approximation, we try to find a function close to the data that achieves desired criteria such as sufficient smoothness, decay at infinity, etc. Thus, a method of integration which produces functions in toto rather than numbers could be quite useful.
Enrichment has lately been utilized by applied mathematicians to perform image analysis and even to deduce global properties of sensor networks from local information. For instance, the Euler characteristic, ordinarily thought of as a discrete invariant, can be made into a variable of integration [7]. In the case of sensor networks, such an analysis can lead to effective computations in which theory determines a minimal set of sensors [40].
By modifying the traditional neural net focus on training sets of data so that we get to families of functions in a natural way, we aim to achieve methodological insight. Such a framework may lead to artificial neural networks capable of performing more sophisticated tasks.
The main result of this chapter is Theorem 5.1, which characterizes functions to be approximated in terms of pointwise integrals and Bochner integrals, and provides inequalities that relate corresponding norms. The relationship between integral formulas and neural networks has long been noted; e.g., [20, 6, 37, 13, 34, 29]. We examine integral formulas in depth and extend their significance to a broader context.
An earlier version of the Main Theorem, including the bounds on variational norm by the $L^1$-norm of the weight function in a corresponding integral formula, was given in [23] and it also utilized functional (i.e., Bochner) integration. However, the version here is more general and further shows that if $\phi$ is a real-valued function on $\Omega \times Y$ (the cartesian product of input and parameter spaces), then the associated map $\Phi$ which maps the measure space $Y$ to the Banach space $\mathcal{X}$ defined by $\Phi(y)(x) = \phi(x,y)$ is measurable; cf. [42, Lemma 4.25, p. 125] where $\Phi$ is the “feature map.”
Other proof techniques are available for parts of the Main Theorem. In particular, Kůrková [28] gave a different argument for part (iv) of the theorem, using a characterization of variation via peak functionals [31] as well as the theorem of Mazur (Theorem 13.1) used in the proof of Lemma 3.4. But the Bochner integral approach reveals some unexpected aspects of functional approximation which may be relevant for neural network applications.
Furthermore, the treatment of analysis and topology utilizes a number of basic theorems from the literature and provides an introduction to functional analysis motivated by its applicability. This is a case where neural nets provide a fresh perspective on classical mathematics. Indeed, theoretical results proved here were obtained in an attempt to better understand neural networks.
An outline of the paper is as follows: In Section 2 we discuss variational norms; Sections 3 and 4 present needed material on Bochner integrals. The Main Theorem (Theorem 5.1) on integral formulas is given in Section 5. In Section 6 we show how to apply the Main Theorem to an integral formula for the Bessel potential function in terms of Gaussians. In Section 7 we show how this leads to an inequality involving Gamma functions and provide an alternative proof by classical means. Section 8 interprets and extends the Main Theorem in the language of tensor products. Using tensor products, we replace individual $\mathcal{X}$-valued $\Phi$'s by families of such functions. This allows more nuanced representation of the function to be approximated. In Section 9 we give a detailed example of concepts related to $G$-variation, while Section 10 considers the relationship between pointwise integrals and evaluation of the corresponding Bochner integrals. Remarks on future directions are in Section 11, and the chapter concludes with two appendices and references.
2 Variational norms and completeness
We assume that the reader has a reasonable acquaintance with functional analysis but have attempted to keep this chapter self-contained. Notations and basic definitions are given in Appendix I, while Appendix II has the precise statement of several important theorems from the literature which will be needed in our development.
Throughout this chapter, all linear spaces are over the reals . For any subset of a linear space , , and ,
[TABLE]
Also, we sometimes use the abbreviated notation
[TABLE]
the standard notations on the right are explained in sections 12 and 4, resp. The symbol “” stands for “such that.”
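A set $G$ in a normed linear space $\mathcal{X}$ is fundamental (with respect to $\mathcal{X}$) if $\mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G) = \mathcal{X}$, where closure depends only on the topology induced by the norm. We call $G$ bounded with respect to $\mathcal{X}$ if
$$ s_{G,\mathcal{X}} := \sup_{g \in G} \|g\|_{\mathcal{X}} < \infty. $$
We now review $G$-variation norms. These norms, which arise in connection with approximation of functions, were first considered by Barron [5], [6]. He treated a case where $G$ is a family of characteristic functions of sets satisfying a special condition. The general concept, formulated by Kůrková [24], has been developed in such papers as [30, 26, 27, 15, 16, 17].
Consider the set
$$ B_{G,\mathcal{X}} := \mathrm{cl}_{\mathcal{X}}\big(\mathrm{conv}(\pm G)\big), \qquad \text{where } \pm G := G \cup -G. $$
This is a symmetric, closed, convex subset of $\mathcal{X}$, with Minkowski functional
$$ \|f\|_{G,\mathcal{X}} := \inf\{\lambda > 0 : f/\lambda \in B_{G,\mathcal{X}}\}. $$
The subset of $\mathcal{X}$ on which this functional is finite is given by
$$ \mathcal{X}_G := \{ f \in \mathcal{X} : \exists\, \lambda > 0 \text{ such that } f/\lambda \in B_{G,\mathcal{X}} \}. $$
If $G$ is bounded, then $\|\cdot\|_{G,\mathcal{X}}$ is a norm on $\mathcal{X}_G$. In general $\mathcal{X}_G$ may be a proper subset of $\mathcal{X}$ even if $G$ is bounded and fundamental w.r.t. $\mathcal{X}$. See the example at the end of this section. The inclusion $\iota: \mathcal{X}_G \subseteq \mathcal{X}$ is linear and for every $f \in \mathcal{X}_G$
$$ \|f\|_{\mathcal{X}} \le \|f\|_{G,\mathcal{X}}\; s_{G,\mathcal{X}}. \tag{3} $$
Indeed, if $\|f\|_{G,\mathcal{X}} < \lambda$, then $f/\lambda$ lies in $B_{G,\mathcal{X}}$ and so is a limit of convex combinations of elements of $\mathcal{X}$-norm at most $s_{G,\mathcal{X}}$, so $\|f\|_{\mathcal{X}} \le \lambda\, s_{G,\mathcal{X}}$, establishing (3) by definition of variational norm. Hence, if $G$ is bounded in $\mathcal{X}$, the operator $\iota$ is bounded with operator norm not exceeding $s_{G,\mathcal{X}}$.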
Proposition 2.1
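*Let $G$ be a nonempty subset of a normed linear space $\mathcal{X}$. Then
(i) $\mathrm{span}\, G \subseteq \mathcal{X}_G \subseteq \mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G)$;
(ii) $G$ is fundamental if and only if $\mathcal{X}_G$ is dense in $\mathcal{X}$;
(iii) For $G$ bounded and $\mathcal{X}$ complete, $(\mathcal{X}_G, \|\cdot\|_{G,\mathcal{X}})$ is a Banach space.*
**Proof.** (i) Let $f \in \mathrm{span}\, G$, then $f = \sum_{i=1}^n a_i g_i$, for real numbers $a_i$ and $g_i \in G$. We assume the $a_i$ are not all zero since $0$ is in $\mathcal{X}_G$. Then $f = \lambda \sum_{i=1}^n (|a_i|/\lambda)(\pm g_i)$, where $\lambda = \sum_{i=1}^n |a_i|$. Thus, $f/\lambda$ is in $\mathrm{conv}(\pm G)$. So $\|f\|_{G,\mathcal{X}} \le \lambda$ and $f$ is in $\mathcal{X}_G$.
Likewise if $f$ is in $\mathcal{X}_G$, then for some $\lambda > 0$, $f/\lambda$ is in
$$ B_{G,\mathcal{X}} = \mathrm{cl}_{\mathcal{X}}\big(\mathrm{conv}(\pm G)\big) \subseteq \mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G), $$
so $f$ is in $\mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G)$.
(ii) Suppose $G$ is fundamental. Then $\mathcal{X} = \mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G) \subseteq \mathrm{cl}_{\mathcal{X}}(\mathcal{X}_G)$ by part (i). Conversely, if $\mathcal{X}_G$ is dense in $\mathcal{X}$, then $\mathcal{X} = \mathrm{cl}_{\mathcal{X}}(\mathcal{X}_G) \subseteq \mathrm{cl}_{\mathcal{X}}(\mathrm{span}\, G) \subseteq \mathcal{X}$, and $G$ is fundamental.
(iii) Let $\{f_n\}$ be a Cauchy sequence in $\mathcal{X}_G$. By (3) $\{f_n\}$ is a Cauchy sequence in $\mathcal{X}$ and has a limit $f$ in $\mathcal{X}$. The sequence $\{\|f_n\|_{G,\mathcal{X}}\}$ is bounded in $\mathbb{R}$, that is, there is a positive number $M$ such that $f_n/M \in B_{G,\mathcal{X}}$ for all $n$. Since $B_{G,\mathcal{X}}$ is closed in $\mathcal{X}$, $f/M$ is also in $B_{G,\mathcal{X}}$. Hence $\|f\|_{G,\mathcal{X}} \le M$ and $f$ is in $\mathcal{X}_G$. Now given $\epsilon > 0$ choose a positive integer $N$ such that $\|f_n - f_k\|_{G,\mathcal{X}} < \epsilon$ for $n, k \ge N$. In particular fix $n \ge N$, and consider a variable integer $k \ge N$. Then $\|f_k - f_n\|_{G,\mathcal{X}} < \epsilon$. So $(f_k - f_n)/\epsilon \in B_{G,\mathcal{X}}$, and $f_k \in f_n + \epsilon B_{G,\mathcal{X}}$ for all $k \ge N$. But $f_n + \epsilon B_{G,\mathcal{X}}$ is closed in $\mathcal{X}$. Hence $f \in f_n + \epsilon B_{G,\mathcal{X}}$, and $\|f - f_n\|_{G,\mathcal{X}} \le \epsilon$. So the sequence converges to $f$ in $\mathcal{X}_G$.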
The following example illustrates several of the above concepts. Take to be a real separable Hilbert space with orthonormal basis . Let . Then
[TABLE]
Now is of the form where , and if , then for all and suitable . The minimal can be obtained by taking when , and when . It then follows that . Hence when is isomorphic to , is isomorphic to . As is fundamental, by part(ii) above, the closure of in is . This provides an example where is not a closed subspace of and so, while it is a Banach space w.r.t. the variational norm, it is not complete in the ambient-space norm.
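The contrast between the two norms in this example can be made concrete with a small numerical sketch. It assumes that $G$ is the orthonormal basis $\{e_n\}$ itself, so that the $G$-variation of $f = \sum_n c_n e_n$ is the $\ell^1$ norm of the coefficients while the Hilbert-space norm is their $\ell^2$ norm; the function names and the choice $c_n = 1/n$ are illustrative and not taken from the text.

```python
import numpy as np

def hilbert_norm(coeffs):
    """Norm of f = sum_n c_n e_n in the Hilbert space (l2 norm of the coefficients)."""
    return float(np.sqrt(np.sum(np.square(coeffs))))

def basis_variation(coeffs):
    """G-variation of f when G is the orthonormal basis itself (l1 norm, assumed setting)."""
    return float(np.sum(np.abs(coeffs)))

# c_n = 1/n is square-summable but not absolutely summable, so the truncations
# stay bounded in the Hilbert norm while their variation grows without bound.
for n_terms in (10, 1_000, 100_000):
    c = 1.0 / np.arange(1, n_terms + 1)
    print(n_terms, hilbert_norm(c), basis_variation(c))
```

Since every basis element has norm one, the printed pairs also illustrate inequality (3): the Hilbert norm never exceeds the variation.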
3 Bochner integrals
The Bochner integral replaces numbers with functions and represents a broad-ranging extension, generalizing the Lebesgue integral from real-valued functions to functions with values in an arbitrary Banach space. Key definitions and theorems are summarized here for convenience, following the treatment in [44] (cf. [33]). Bochner integrals are used here (as in [23]) in order to prove a bound on variational norm.
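Let $(Y, \mu)$ be a measure space. Let $\mathcal{X}$ be a Banach space with norm $\|\cdot\|_{\mathcal{X}}$. A function $s: Y \to \mathcal{X}$ is simple if it has a finite set of nonzero values $b_i \in \mathcal{X}$, each on a measurable subset $P_i$ of $Y$ with $\mu(P_i) < \infty$, $1 \le i \le m$, and the $P_i$ are pairwise-disjoint. Equivalently, a function $s$ is simple if it can be written in the following form:
$$ s = \sum_{i=1}^{m} \kappa(b_i)\, \chi_{P_i}, \tag{4} $$
where $\kappa(b_i): Y \to \mathcal{X}$ denotes the constant function with value $b_i$ and $\chi_P$ denotes the characteristic function of a subset $P$ of $Y$. This decomposition is nonunique and we identify two functions if they agree $\mu$-almost everywhere, i.e., the subset of $Y$ on which they disagree has $\mu$-measure zero.
Define an $\mathcal{X}$-valued function $I$ on the simple functions by setting for $s$ of form (4)
$$ I(s) := \sum_{i=1}^{m} \mu(P_i)\, b_i \in \mathcal{X}. $$
This is independent of the decomposition of $s$ [44, pp. 130–132]. A function $h: Y \to \mathcal{X}$ is strongly measurable (w.r.t. $\mu$) if there exists a sequence $\{s_k\}$ of simple functions such that for $\mu$-a.e. $y \in Y$
$$ \lim_{k \to \infty} \|s_k(y) - h(y)\|_{\mathcal{X}} = 0. $$
A function $h: Y \to \mathcal{X}$ is Bochner integrable (with respect to $\mu$) if it is strongly measurable and there exists a sequence $\{s_k\}$ of simple functions such that
$$ \lim_{k \to \infty} \int_Y \|s_k(y) - h(y)\|_{\mathcal{X}}\, d\mu(y) = 0. \tag{5} $$
If $h$ is strongly measurable and (5) holds, then the sequence $\{I(s_k)\}$ is Cauchy and by completeness converges to an element in $\mathcal{X}$. This element, which is independent of the sequence of simple functions satisfying (5), is called the Bochner integral of $h$ (w.r.t. $\mu$) and denoted
$$ I(h) = \mathcal{B}\text{-}\int_Y h(y)\, d\mu(y). $$
Let $\mathcal{L}^1(Y, \mu; \mathcal{X})$ denote the linear space of all strongly measurable functions from $Y$ to $\mathcal{X}$ which are Bochner integrable w.r.t. $\mu$; let $L^1(Y, \mu; \mathcal{X})$ be the corresponding set of equivalence classes (modulo $\mu$-a.e. equality). It is easily shown that equivalent functions have the same Bochner integral. Then the following elegant characterization holds.
Theorem 3.1 (Bochner)
Let $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ be a Banach space and $(Y, \mu)$ a measure space. Let $h: Y \to \mathcal{X}$ be strongly measurable. Then
$$ h \in \mathcal{L}^1(Y, \mu; \mathcal{X}) \quad \text{if and only if} \quad \int_Y \|h(y)\|_{\mathcal{X}}\, d\mu(y) < \infty. $$
A consequence of this theorem is that $I: L^1(Y, \mu; \mathcal{X}) \to \mathcal{X}$ is a continuous linear operator and
$$ \|I(h)\|_{\mathcal{X}} = \Big\| \mathcal{B}\text{-}\int_Y h(y)\, d\mu(y) \Big\|_{\mathcal{X}} \le \|h\|_{L^1(Y, \mu; \mathcal{X})} := \int_Y \|h(y)\|_{\mathcal{X}}\, d\mu(y). $$
In particular, the Bochner norm of $s$, $\|s\|_{L^1(Y, \mu; \mathcal{X})}$, is $\sum_{i=1}^{m} \mu(P_i) \|b_i\|_{\mathcal{X}}$, where $s$ is a simple function satisfying (4).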
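For a simple function the definitions above reduce to finite sums, which the following sketch evaluates. It takes the Banach space to be $\mathbb{R}^3$ with the Euclidean norm purely for illustration; the values, the measures of the sets, and the function names are hypothetical.

```python
import numpy as np

def bochner_integral_simple(values, measures):
    """I(s) = sum_i mu(P_i) * b_i for a simple function with value b_i on the set P_i."""
    values = np.asarray(values, dtype=float)      # shape (m, dim): row i is b_i
    measures = np.asarray(measures, dtype=float)  # shape (m,): mu(P_i), finite and >= 0
    return measures @ values

def bochner_norm_simple(values, measures, p=2):
    """L^1(Y, mu; X) norm of the simple function: sum_i mu(P_i) * ||b_i||."""
    values = np.asarray(values, dtype=float)
    measures = np.asarray(measures, dtype=float)
    return float(measures @ np.linalg.norm(values, ord=p, axis=1))

b = [[1.0, 0.0, 2.0], [-1.0, 3.0, 0.0]]   # two values b_1, b_2 in R^3
mu = [0.5, 2.0]                           # measures of the disjoint sets P_1, P_2
integral = bochner_integral_simple(b, mu) # 0.5 * b_1 + 2.0 * b_2
print(integral, np.linalg.norm(integral), bochner_norm_simple(b, mu))
```

The norm of the integral is bounded by the Bochner norm of the integrand, which is the finite-sum case of the continuity estimate for $I$.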
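For $(Y, \mu)$ a measure space and $\mathcal{X}$ a Banach space, $h: Y \to \mathcal{X}$ is weakly measurable if for every continuous linear functional $F$ on $\mathcal{X}$ the composite real-valued function $F \circ h$ is measurable [43, pp. 130–134]. If $h$ is measurable, then it is weakly measurable since measurable followed by continuous is measurable: for $U$ open in $\mathbb{R}$, $(F \circ h)^{-1}(U) = h^{-1}(F^{-1}(U))$.
Recall that a topological space is separable if it has a countable dense subset. Let $\lambda^d$ denote Lebesgue measure on $\mathbb{R}^d$ and let $\Omega \subseteq \mathbb{R}^d$ be $\lambda^d$-measurable, $d \ge 1$. Then $L^q(\Omega, \lambda^d)$ is separable when $1 \le q < \infty$; e.g., [36, p. 208]. A function $h: Y \to \mathcal{X}$ is $\mu$-almost separably valued ($\mu$-a.s.v.) if there exists a $\mu$-measurable subset $Y_0 \subseteq Y$ with $\mu(Y_0) = 0$ and $h(Y \setminus Y_0)$ is a separable subset of $\mathcal{X}$.
Theorem 3.2 (Pettis)
Let $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ be a Banach space and $(Y, \mu)$ a measure space. Suppose $h: Y \to \mathcal{X}$. Then $h$ is strongly measurable if and only if $h$ is weakly measurable and $\mu$-a.s.v.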
The following basic result (see, e.g., [9]) was later extended by Hille to the more general class of closed operators. But we only need the result for bounded linear functionals, in which case the Bochner integral coincides with ordinary integration.
Theorem 3.3
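Let $(Y, \mu)$ be a measure space, let $\mathcal{X}$, $\mathcal{Z}$ be Banach spaces, and let $h \in \mathcal{L}^1(Y, \mu; \mathcal{X})$. If $T: \mathcal{X} \to \mathcal{Z}$ is a bounded linear operator, then $T \circ h \in \mathcal{L}^1(Y, \mu; \mathcal{Z})$ and
$$ T\Big( \mathcal{B}\text{-}\int_Y h(y)\, d\mu(y) \Big) = \mathcal{B}\text{-}\int_Y (T \circ h)(y)\, d\mu(y). $$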
There is a mean-value theorem for Bochner integrals (Diestel and Uhl [12, Lemma 8, p. 48]). We give their argument with a slightly clarified reference to the Hahn-Banach theorem.
Lemma 3.4
Let be a finite measure space, let be a Banach space, and let be Bochner integrable w.r.t. . Then
[TABLE]
**Proof. ** Without loss of generality, . Suppose . By a consequence of the Hahn-Banach theorem given as Theorem 13.1 in Appendix II below), there is a continuous linear functional on such that . Hence, by Theorem 3.3,
[TABLE]
which is absurd.
4 Spaces of Bochner integrable functions
In this section, we derive a few consequences of the results from the previous section which we shall need below.
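A measurable function $h$ from a measure space $(Y, \mu)$ to a normed linear space $\mathcal{X}$ is called essentially bounded (w.r.t. $\mu$) if there exists a $\mu$-null set $N$ for which
$$ \sup_{y \in Y \setminus N} \|h(y)\|_{\mathcal{X}} < \infty. $$
Let $\mathcal{L}^\infty(Y, \mu; \mathcal{X})$ denote the linear space of all strongly measurable, essentially bounded functions from $Y$ to $\mathcal{X}$. Let $L^\infty(Y, \mu; \mathcal{X})$ be its quotient space mod the relation of equality $\mu$-a.e. This is a Banach space with norm
$$ \|h\|_{L^\infty(Y, \mu; \mathcal{X})} := \inf\{ B \ge 0 : \exists\ \mu\text{-null } N \subseteq Y \text{ such that } \|h(y)\|_{\mathcal{X}} \le B \ \ \forall\, y \in Y \setminus N \}. $$
To simplify notation, we sometimes write $\|h\|_\infty$ for $\|h\|_{L^\infty(Y, \mu; \mathcal{X})}$. Note that if $h \in \mathcal{L}^\infty(Y, \mu; \mathcal{X})$, then $\|h(y)\|_{\mathcal{X}} \le \|h\|_\infty$ for $\mu$-a.e. $y$. Indeed, for positive integers $k$, $\|h(y)\|_{\mathcal{X}} \le \|h\|_\infty + 1/k$ for $y$ not in a set $N_k$ of measure zero, so $\|h(y)\|_{\mathcal{X}} \le \|h\|_\infty$ for $y$ not in the union $\bigcup_k N_k$, also a set of measure zero.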
We also have a useful fact whose proof is immediate.
Lemma 4.1
For every measure space and Banach space , the natural map associating to each element the constant function from to given by for all in is an isometric linear embedding.
Lemma 4.2
Let $\mathcal{X}$ be a separable Banach space, let $(Y, \mu)$ be a measure space, and let $w: Y \to \mathbb{R}$ and $\Phi: Y \to \mathcal{X}$ be $\mu$-measurable functions. Then $w\Phi$ is strongly measurable.
**Proof.** By definition, $w\Phi$ is the function from $Y$ to $\mathcal{X}$ defined by
$$ (w\Phi)(y) := w(y)\, \Phi(y), $$
where the multiplication is that of a Banach space element by a real number. Then $w\Phi$ is measurable because it is obtained from a pair of measurable functions by applying scalar multiplication, which is continuous. Hence, by separability, Pettis’ Theorem 3.2, and the fact that measurable implies weakly measurable, we have strong measurability for $w\Phi$ (cf. [33, Lemma 10.3]).
If is a finite measure space, is a Banach space, and is strongly measurable and essentially bounded, then is Bochner integrable by Theorem 3.1. The following lemma, which follows from Lemma 4.2, allows us to weaken the hypothesis on the function by further constraining the space .
Lemma 4.3
Let be a finite measure space, a separable Banach space, and be -measurable and essentially bounded w.r.t. . Then and
[TABLE]
Let , and let be defined for -measurable by . For , .
Theorem 4.4
Let be a measure space, a separable Banach space; let be nonzero -a.e., let be the measure defined above, and let be -measurable. If one of the Bochner integrals
[TABLE]
exists, then both exist and are equal.
**Proof. ** By Lemma 4.2, both and are strongly measurable. Hence, by Theorem 3.1, the respective Bochner integrals exist if and only if the -norms of the respective integrands have finite ordinary integral. But
[TABLE]
so the Bochner integral exists exactly when does. Further, the respective Bochner integrals are equal since for any continuous linear functional in , by Theorem 3.3
[TABLE]
Corollary 4.5
Let be a -finite measure space, a separable Banach space, be in and be in . Then is Bochner integrable w.r.t. .
**Proof. ** By Lemma 4.2, is strongly measurable, and Lemma 4.3 then implies that the Bochner integral exists since . So is Bochner integrable by Theorem 4.4.
5 Main theorem
In the next result, we show that certain types of integrands yield integral formulas for functions in a Banach space of -type both pointwise and at the level of Bochner integrals. Furthermore, the variational norm of is shown to be bounded by the -norm of the weight function from the integral formula. Equations (9) and (10) and part (iv) of this theorem were derived in a similar fashion by one of us with Kůrková in [23] under more stringent hypotheses; see also [13, eq. (12)].
Theorem 5.1
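*Let $(\Omega, \rho)$, $(Y, \mu)$ be $\sigma$-finite measure spaces, let $w$ be in $\mathcal{L}^1(Y, \mu)$, let $\mathcal{X} = L^q(\Omega, \rho)$, $q \in [1, \infty)$, be separable, let $\phi: \Omega \times Y \to \mathbb{R}$ be $\rho \times \mu$-measurable, let $\Phi: Y \to \mathcal{X}$ be defined for each $y$ in $Y$ by $\Phi(y)(x) := \phi(x, y)$ for $\rho$-a.e. $x \in \Omega$ and suppose that for some $M < \infty$, $\|\Phi(y)\|_{\mathcal{X}} \le M$ for $\mu$-a.e. $y$. Then the following hold:
(i) For $\rho$-a.e. $x \in \Omega$, the integral $\int_Y w(y)\phi(x, y)\, d\mu(y)$ exists and is finite.
(ii) The function $f$ defined by
$$ f(x) = \int_Y w(y)\, \phi(x, y)\, d\mu(y) \tag{8} $$
is in $\mathcal{L}^q(\Omega, \rho)$ and its equivalence class, also denoted by $f$, is in $L^q(\Omega, \rho) = \mathcal{X}$ and satisfies
$$ \|f\|_{\mathcal{X}} \le \|w\|_{L^1(Y, \mu)}\, M. \tag{9} $$
(iii) The function $\Phi$ is measurable and hence in $\mathcal{L}^\infty(Y, \mu; \mathcal{X})$, and $f$ is the Bochner integral of $w\Phi$ w.r.t. $\mu$, i.e.,
$$ f = \mathcal{B}\text{-}\int_Y (w\Phi)(y)\, d\mu(y). \tag{10} $$
(iv) For $G = \{\Phi(y) : \|\Phi(y)\|_{\mathcal{X}} \le \|\Phi\|_{L^\infty(Y, \mu; \mathcal{X})}\}$, $f$ is in $\mathcal{X}_G$, and
$$ \|f\|_{G,\mathcal{X}} \le \|w\|_{L^1(Y, \mu)} \tag{11} $$
and as in (1)
$$ \|f\|_{\mathcal{X}} \le \|f\|_{G,\mathcal{X}}\; s_{G,\mathcal{X}} \le \|w\|_{L^1(Y, \mu)}\, \|\Phi\|_{L^\infty(Y, \mu; \mathcal{X})}. \tag{12} $$*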
**Proof. ** (i) Consider the function . This is a well-defined -measurable function on . Furthermore its repeated integral
[TABLE]
exists and is bounded by since and for a. e. y. and . By Fubini’s Theorem 13.2 the function is in for a.e. x. But the inequality
[TABLE]
shows that the function is dominated by the sum of two integrable functions. Hence the integrand in the definition of is integrable for a. e. x, and is well-defined almost everywhere.
(ii) The function is a convex function for . Accordingly by Jensen’s inequality (Theorem 13.3 below),
[TABLE]
provided both integrals exist and is a probability measure on the measurable space . We take to be defined by the familiar formula:
[TABLE]
for -measurable sets A in Y, so that integration with respect to reduces to a scale factor times integration of . Since we have established that both and are integrable with respect to for a.e. x, we obtain:
[TABLE]
[TABLE]
for a.e. x. But we can now integrate both side with respect to over because of the integrability noted above in connection with Fubini’s Theorem. Thus and , again interchanging order.
(iii) First we show that of the open ball centered at of radius , , is a -measurable subset of for each in and . Note that
[TABLE]
for all in where and are -measurable functions representing the elements and belonging to . Since is -finite, we can find a strictly positive function in . (For example, let , where is a countable disjoint partition of into -measurable sets of finite measure.) Then is a -measurable function on , and
[TABLE]
By Fubini’s Theorem 13.2, is -measurable. Since is -measurable and strictly positive, is also -measurable and so is measurable. Hence, is measurable. Thus, is essentially bounded, with essential sup . (In (9), can be replaced by this essential sup.)
By Corollary 4.5, is Bochner integrable. To prove that is the Bochner integral, using Theorem 4.4, we show that for each bounded linear functional , . By the Riesz representation theorem [35, p. 316], for any such there exists a (unique) , , such that for all , . By Theorem 3.3,
[TABLE]
But for , , so
[TABLE]
Also, using (8),
[TABLE]
The integrand of the iterated integrals is measurable with respect to the product measure , so by Fubini’s Theorem the iterated integrals are equal provided that one of the corresponding absolute integrals is finite. Indeed,
[TABLE]
By Hölder’s inequality, for every ,
[TABLE]
using the fact that . Therefore, by the essential boundedness of w.r.t. , the integrals in (13) are at most
[TABLE]
Hence, is the Bochner integral of w.r.t. .
(iv) We again use Lemma 3.4. Let be a measurable subset of with and for , ; see the remark following the definition of essential supremum. But restricting and to , one has
[TABLE]
hence, . Thus, .
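The norm bound of part (ii), read as $\|f\| \le \|w\|_{L^1} M$ with $M$ a uniform bound on the norms of the units, can be illustrated on a grid, where the integral formula becomes the finite sum computed by a one-hidden-layer network. The sketch below is a discretized illustration under assumptions of our own choosing: intervals for the input and parameter sets, Gaussian units $\phi(x, y) = e^{-y x^2}$, an arbitrary signed weight function, and $q = 2$.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)   # grid on the input set (assumed interval)
y = np.linspace(0.5, 3.0, 400)     # grid on the parameter set (assumed interval)
dx, dy = x[1] - x[0], y[1] - y[0]

phi = np.exp(-np.outer(y, x ** 2)) # phi[j, i] = phi(x_i, y_j), a Gaussian unit
w = np.cos(3.0 * y) / (1.0 + y)    # an arbitrary signed weight function (assumed)

def lq_norm(values, q, h):
    """Riemann-sum approximation of the L^q norm along the last axis."""
    return (np.sum(np.abs(values) ** q, axis=-1) * h) ** (1.0 / q)

q = 2.0
f = (w * dy) @ phi                 # f(x_i) ~ sum_j w(y_j) phi(x_i, y_j) dy
norm_f = lq_norm(f, q, dx)         # ~ norm of f in L^q
norm_w = np.sum(np.abs(w)) * dy    # ~ L^1 norm of the weight function
M = np.max(lq_norm(phi, q, dx))    # ~ sup over y of the norm of the unit Phi(y)
print(norm_f, norm_w * M)
```

In this discrete setting the inequality `norm_f <= norm_w * M` follows from the triangle inequality for the quadrature sum, mirroring the role of Jensen's inequality and Fubini's theorem in the continuous argument.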
6 An example involving the Bessel potential
Here we review an example related to the Bessel functions which was considered in [21] for . In the following section, this Bessel-potential example is used to find an inequality related to the Gamma function.
Let denote the Fourier transform, given for and by
[TABLE]
where is Lebesgue measure and means . For , let
[TABLE]
Since the Fourier transform is an isometry of onto itself (Parseval’s identity), and is in for (which we now assume), there is a unique function , called the Bessel potential of order , having as its Fourier transform. See, e.g., [2, p. 252]. If and , then and
[TABLE]
Indeed, by radial symmetry, , where and is the area of the unit sphere in [11, p. 303]. Substituting and , and using [10, p. 60], we find that
[TABLE]
establishing (14).
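The radial computation can be sanity-checked in one dimension. The sketch below assumes the standard definition in which the Fourier transform of the Bessel potential of order $r$ is $(1 + \|s\|^2)^{-r/2}$, so that by Parseval the squared $L^2$ norm equals $\int_{\mathbb{R}^d} (1 + \|s\|^2)^{-r}\, ds$, and it assumes the closed form $\pi^{d/2}\,\Gamma(r - d/2)/\Gamma(r)$ obtained from the Beta integral; both function names are illustrative.

```python
import math
import numpy as np

def squared_l2_norm_closed_form(r, d=1):
    """pi^(d/2) * Gamma(r - d/2) / Gamma(r), valid for r > d/2 (assumed closed form)."""
    return math.pi ** (d / 2.0) * math.gamma(r - d / 2.0) / math.gamma(r)

def squared_l2_norm_numeric(r, n=200_000):
    """Integral of (1 + s^2)^(-r) over the real line (d = 1), for r >= 1.

    The substitution s = tan(theta) turns it into the integral of
    cos(theta)^(2r - 2) over (-pi/2, pi/2), evaluated by the midpoint rule.
    """
    h = math.pi / n
    theta = -math.pi / 2.0 + (np.arange(n) + 0.5) * h
    return float(np.sum(np.cos(theta) ** (2.0 * r - 2.0)) * h)

for r in (1.0, 1.5, 2.0, 3.25):
    print(r, squared_l2_norm_numeric(r), squared_l2_norm_closed_form(r))
```

Taking square roots gives the $L^2$ norm itself; the check covers only $d = 1$ and $r \ge 1$, where the transformed integrand is bounded.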
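For $b > 0$, let $\gamma_b: \mathbb{R}^d \to \mathbb{R}$ denote the scaled Gaussian $\gamma_b(x) = e^{-b\|x\|^2}$. A simple calculation gives the $L^q$-norm of $\gamma_b$:
$$ \|\gamma_b\|_{L^q(\mathbb{R}^d)} = \Big( \frac{\pi}{q b} \Big)^{d/(2q)}. \tag{15} $$
Indeed, using $\int_{-\infty}^{\infty} e^{-t^2}\, dt = \pi^{1/2}$, we obtain:
$$ \|\gamma_b\|_{L^q(\mathbb{R}^d)}^q = \int_{\mathbb{R}^d} e^{-q b \|x\|^2}\, dx = \Big( \int_{-\infty}^{\infty} e^{-q b t^2}\, dt \Big)^{d} = \Big( \frac{\pi}{q b} \Big)^{d/2}. $$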
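The Gaussian norm can be verified numerically in one dimension. The sketch assumes the scaled Gaussian is $\gamma_b(x) = e^{-b\|x\|^2}$, for which the Gaussian integral gives $\|\gamma_b\|_{L^q} = (\pi/(qb))^{d/(2q)}$; the function names, the truncation interval, and the grid size are illustrative choices.

```python
import math
import numpy as np

def gaussian_lq_norm_closed_form(b, q, d=1):
    """(pi / (q b))^(d / (2 q)) for the Gaussian exp(-b |x|^2) on R^d (assumed form)."""
    return (math.pi / (q * b)) ** (d / (2.0 * q))

def gaussian_lq_norm_numeric(b, q, half_width=20.0, n=400_001):
    """Riemann-sum estimate of the L^q norm of exp(-b x^2) on the real line."""
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    integrand = np.exp(-q * b * x ** 2)   # |gamma_b(x)|^q
    return float((np.sum(integrand) * h) ** (1.0 / q))

for b in (0.5, 1.0, 2.5):
    for q in (1.0, 2.0, 3.0):
        print(b, q, gaussian_lq_norm_numeric(b, q), gaussian_lq_norm_closed_form(b, q))
```

For $d > 1$ the integral factors into $d$ copies of the one-dimensional one, which is why the exponent scales linearly in $d$.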
We now express the Bessel potential as an integral combination of Gaussians. The Gaussians are normalized in and the corresponding weight function is explicitly given. The integral formula is similar to one in Stein [41]. By our main theorem, this is an example of (8) and can be interpreted either as a pointwise integral or as a Bochner integral.
Proposition 6.1
For a positive integer, , , and
[TABLE]
where
[TABLE]
and
[TABLE]
**Proof. ** Let
[TABLE]
Putting and , we obtain
[TABLE]
Using the norm of the Gaussian (15), we arrive at
[TABLE]
which is the result desired.
Now we apply Theorem 5.1 to bound the variational norm of the Bessel potential by the $L^1$-norm of the weight function.
Proposition 6.2
For a positive integer, , and ,
[TABLE]
where and .
**Proof. ** By (11) and Proposition 6.1, we have
[TABLE]
where , and by definition, the integral is .
7 Application: A Gamma function inequality
The inequalities among the variational norm , the Banach space norm , and the -norm of the weight function, established in the Main Theorem, allow us to derive other inequalities. The Bessel potential of order considered above provides an example.
Let be a positive integer, , and . By Proposition 6.2 and (14) of the last section, and by (12) of the Main Theorem, we have
[TABLE]
Hence, with and , this becomes
[TABLE]
In fact, (17) holds if satisfy (i) and (ii) for some and . As , . If for some , then , so there always exist , satisfying (ii); the smallest such is .
The inequality (17) suggests that the Main Theorem can be used to establish other inequalities of interest among classical functions. We now give a direct argument for the inequality. Its independent proof confirms our function-theoretic methods and provides additional generalization.
We begin by noting that in (17) it suffices to take . If the inequality is true in that case, it is true for all real numbers . Thus, we wish to establish that
[TABLE]
is a strictly increasing function of for and . (For this function is constant.)
Equivalently, we show that
[TABLE]
is a strictly increasing function of for and .
Differentiating with respect to , we obtain:
[TABLE]
where is the digamma function. It suffices to establish that for , . Note that . Now consider
[TABLE]
This derivative is positive if and only if for , .
It remains to show that for . Using the power series for [1, 6.4.10], we have for ,
[TABLE]
8 Tensor-product interpretation
The basic paradigm of feedforward neural nets is to select a single type of computational unit and then build a network based on this single type through a choice of controlling internal and external parameters so that the resulting network function approximates the target function; see [45]. However, a single type of hidden unit may not be as effective as one based on a plurality of hidden-unit types. Here we explore a tensor-product interpretation which may facilitate such a change in perspective.
Long ago Hille and Phillips [19, p. 86] observed that the Banach space of Bochner integrable functions from a measure space into a Banach space has a fundamental set consisting of two-valued functions, achieving a single non-zero value on a measurable set of finite measure. Indeed, every Bochner integrable function is a limit of simple functions, and each simple function (with a finite set of values achieved on disjoint parts of the partition) can be written as a sum of characteristic functions, weighted by members of the Banach space. If is such a simple function, then
[TABLE]
where the are the characteristic functions of the and the are in . (If, for example, is embedded in a finite-dimensional Euclidean space, the partition could consist of generalized rectangles.)
Hence, if is the Bochner integral of with respect to some measure , then can be approximated as closely as desired by elements in of the form
[TABLE]
where is a -measurable partition of .
Note that given a -finite measure space and a separable Banach space , every element in is (trivially) the Bochner integral of any integrand , where is a nonnegative function on with (see part (iii) of Theorem 12) and denotes the constant function on with value . In effect, is in when . When is chosen first (or more precisely as in our Main Theorem), then may or may not be in . According to the Main Theorem, is in when it is given by an integral formula involving and some weight function. In this case, where is the ball in of radius .
In general, the elements of the Banach space involved in some particular approximation for will be distinct functions of some general type obtained by varying the parameter . For instance, kernels, radial basis functions, perceptrons, or various other classes of computational units can be used, and when these computational-unit-classes determine fundamental sets, by Proposition 2.1, it is possible to obtain arbitrarily good approximations. However, Theorem 8.1 below suggests that having a finite set of distinct types may allow a smaller “cost” for approximation, if we regard
[TABLE]
as the cost of the approximation
[TABLE]
We give a brief sketch of the ideas, following Light and Cheney [33].
Let and be Banach spaces. Let denote the linear space of equivalence classes of formal expressions
[TABLE]
where these expressions are equivalent if for every
[TABLE]
that is, if the associated operators from are identical, where is the algebraic dual of . The resulting linear space is called the algebraic tensor product of and . We can extend to a Banach space by completing it with respect to a suitable norm. Consider the norm defined for ,
[TABLE]
and complete the algebraic tensor product with respect to this norm; the result is denoted .
In [33, Thm. 1.15, p. 11], Light and Cheney showed that for any measure space and any Banach space the linear map
[TABLE]
given by
[TABLE]
is well-defined and extends to a map
[TABLE]
which is an isometric isomorphism of the completed tensor product onto the space of Bochner-integrable functions.
The following theorem extends the function via the natural embedding of into the space of essentially bounded -valued functions defined in section 4.
Theorem 8.1
Let be a separable Banach space and let be a -finite measure space. Then there exists a continuous linear surjection
[TABLE]
Furthermore, makes the following diagram commutative:
[TABLE]
where the two horizontal arrows and are the isometric isomorphisms and ; the left-hand vertical arrow is induced by , while the right-hand vertical arrow is induced by post-composition with , i.e., for any in ,
[TABLE]
**Proof. ** The map
[TABLE]
defines a linear function ; indeed, it takes values in the Bochner integrable functions as by our Main Theorem each summand is in the class.
To see that extends to on the -completion,
[TABLE]
[TABLE]
[TABLE]
Hence, , so the map is continuous.
9 An example involving bounded variation on an interval
The following example, more elaborate than the one following Proposition 2.1, is treated in part by Barron [6] and Kůrková [25].
Let $\mathcal{X}$ be the set of equivalence classes of (essentially) bounded Lebesgue-measurable functions on $[a, b]$, $a < b$, i.e., $\mathcal{X} = L^\infty([a, b])$, with norm $\|f\|_{\mathcal{X}} := \inf\{M : |f(x)| \le M \text{ for almost every } x \in [a, b]\}$. Let $G$ be the set of equivalence classes of all characteristic functions of closed intervals of the forms $[a, b]$, or $[a, c]$ or $[c, b]$ with $a < c < b$. These functions are the restrictions of characteristic functions of closed half-lines to $[a, b]$. The equivalence relation is $f \sim g$ if and only if $f(x) = g(x)$ for almost every $x$ in $[a, b]$ (with respect to Lebesgue measure).
Let $BV$ be the set of all equivalence classes of functions on $[a, b]$ with bounded variation; that is, each equivalence class contains a function $f$ such that the total variation $V(f)$ is finite, where total variation is the largest possible total movement of a discrete point which makes a finite number of stops as $x$ varies from $a$ to $b$, maximized over all possible ways to choose a finite list of intermediate points, that is,
$$ V(f) := \sup\Big\{ \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| \;:\; n \ge 1,\ a \le x_0 < x_1 < \cdots < x_n \le b \Big\}. $$
In fact, each equivalence class contains exactly one function of bounded variation that satisfies the continuity conditions:
(i) is right-continuous at for , and
(ii) is left-continuous at .
Moreover, for all .
To see this, recall that every function of bounded variation is the difference of two nondecreasing functions , and are necessarily right-continuous except at a countable set. We can take , where is an arbitrary constant, and for . Now redefine both and at countable sets to form and which satisfy the continuity conditions and are still nondecreasing on . Then also satisfies the continuity conditions. It is easily shown that Since any equivalence class in can contain at most one function satisfying (i) and (ii) above, it follows that is unique and that minimizes the total variation for all functions in the equivalence class. Recall that denotes the characteristic function of the interval , etc.
Proposition 9.1
*Let and let be the subset of characteristic functions
(up to sets of Lebesgue-measure zero).
Then , and*
[TABLE]
where is the member of satisfying the continuity conditions (i) and (ii).
**Proof. ** Let be the set of equivalence classes of functions of the form
[TABLE]
where is a positive integer, , for , and
[TABLE]
All of the functions so exhibited have bounded variation and hence .
We will prove that a sequence in converges in -norm to a member of and this will establish that is a subset of and hence that is a subset of .
Let be a sequence in that is Cauchy in the -norm. Without loss of generality, we pass to the sequence , which is Cauchy in the sup-norm since satisfies the continuity conditions (i) and (ii). Thus, converges pointwise-uniformly and in the sup-norm to a function on also satisfying (i) and (ii) with finite sup-norm and whose equivalence class has finite -norm.
Let satisfy . Then
[TABLE]
for every , where par abus de notation denotes the member of satisfying (20). Letting tend to infinity and then varying and , we obtain and so .
It remains to show that everything in is actually in . Let be a nonnegative nondecreasing function on satisfying the continuity conditions (i) and (ii) above. Given a positive integer , there exists a positive integer and such that for . Indeed, for , let . (Moreover, it follows that the set of ’s include all points of left-discontinuity of such that the jump is greater than .) Let be defined as follows:
[TABLE]
[TABLE]
[TABLE]
Then belongs to , and a fortiori to as well as of , and . Moreover, . Therefore, since , is in and accordingly is in and
[TABLE]
Let be in and let , as defined above, for this purpose we take . This guarantees that both and are nonnegative. Accordingly, , and is in . Furthermore, . The last inequality follows from the fact that .
An argument similar to the above shows that is a Banach space under the norm (with or without the ). The identity map from (with this norm) to , is continuous (by Proposition 9.1) and it is also onto. Accordingly, by the Open Mapping Theorem (e.g., Yosida [43, p. 75]) the map is open, hence a homeomorphism, so the norms are equivalent. Thus, in this example, is a Banach space under these two equivalent norms.
Note however that the -norm restricted to does not give a Banach space structure; i.e., is not complete in the -norm. Indeed, with . Let be times the characteristic function of the disjoint union of closed intervals contained within the unit interval. Then but , some , since the is equivalent to the total-variation norm. While converges to zero in one norm, in the other it blows up. If were a Banach space under , it would be another Cauchy sequence, a contradiction.
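The gap between the two norms can be checked numerically. The sketch below assumes, purely for illustration, the unit interval, a multiplier of $1/n$ and $n^2$ disjoint closed intervals (these specific counts and all function names are assumptions, not taken from the text). It samples one representative step function at the midpoints of $2n^2+1$ equal cells, so the sup-norm is $1/n$ while the variation along the samples is $2n$.

```python
def step_samples(n):
    """Values of a step function at the midpoints of 2*n*n + 1 equal cells of [0, 1].

    Assumed example: the function equals 1/n on the n*n odd-numbered cells
    (disjoint closed intervals) and 0 on the even-numbered cells.
    """
    cells = 2 * n * n + 1
    return [1.0 / n if j % 2 == 1 else 0.0 for j in range(cells)]


def sup_norm(samples):
    """Largest absolute sampled value."""
    return max(abs(v) for v in samples)


def variation(samples):
    # Each sampled cell is a constant piece, so summing the jumps between
    # consecutive samples recovers the total variation of this representative.
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


for n in (1, 2, 4, 8):
    s = step_samples(n)
    print(n, sup_norm(s), variation(s))
```

Under these assumptions the first column of norms shrinks like $1/n$ and the second grows like $2n$, which is the behaviour the paragraph above describes.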
10 Pointwise-integrals vs. Bochner integrals
Evaluation of Bochner integrals
A natural conjecture is that the Bochner integral, evaluated pointwise, is the pointwise integral; that is, if , where is any Banach space of functions defined on a measure space , then
[TABLE]
for all . Usually, however, one is dealing with equivalence classes of functions and thus can expect the equation (21) to hold only for almost every in . Furthermore, to specify , it is necessary to take a particular function representing
The Main Theorem implies that (21) holds for -a.e. when , for , is separable provided that , where is a weight function with finite -norm and is essentially bounded, where for each for -a.e. and is -measurable. More generally, we can show the following.
Theorem 10.1
*Let , be -finite measure spaces, let , , and let so that for each in , for -a.e. , where is a -measurable real-valued function on . Then
(i) is integrable for -a.e. ,
(ii) the equivalence class of , and
(iii) for -a.e. *
[TABLE]
**Proof.** We first consider the case . Let be in , where . Then
[TABLE]
Here we have used Young’s inequality and Bochner’s theorem. By Fubini’s theorem, (i) follows. In addition, the map is a continuous linear functional on with . Since is for , then the function is in and has norm . The case is covered by taking , a member of , and noting that . Thus, (ii) holds for .
Also by Fubini’s theorem and Theorem 3.3, for all ,
[TABLE]
[TABLE]
Hence (iii) holds for all , including .
Now consider the case . For , the inequality (23) holds, and by [18, pp. 348–9], (i) and (ii) hold and . For ,
[TABLE]
[TABLE]
The two functions integrated against are in and agree, so the functions must be the same -a.e.
There are cases where consists of pointwise-defined functions and (21) can be taken literally.
If is a separable Banach space of pointwise-defined functions from to in which the evaluation functionals are bounded (and so in particular if is a reproducing kernel Hilbert space [4]), then (21) holds for all (not just -a.e.). Indeed, for each , the evaluation functional is bounded and linear, so by Theorem 3.3, commutes with the Bochner integral operator. As non-separable reproducing kernel Hilbert spaces exist [3, p.26], one still needs the hypothesis of separability.
In a special case involving Bochner integrals with values in Marcinkiewicz spaces, Nelson [38] showed that (21) holds. His result involves going from equivalence classes to functions, and uses a “measurable selection.” Reproducing kernel Hilbert spaces were studied by Le Page in [32] who showed that (21) holds when is a probability measure on under a Gaussian distribution assumption on variables in the dual space. Another special case of (21) is derived in Hille and Phillips [19, Theorem 3.3.4, p. 66], where the parameter space is an interval of the real line and the Banach space is a space of bounded linear transformations (i.e., the Bochner integrals are operator-valued).
Essential boundedness is needed for the Main Theorem
The following is an example of a function which is not Bochner integrable. Let with Lebesgue measure and so . Put . Then for all
[TABLE]
By l’Hospital’s rule
[TABLE]
Thus, the function is not essentially bounded on and Theorem 12 does not apply. Furthermore, for ,
[TABLE]
and
[TABLE]
Hence, by Theorem 3.1, is not Bochner integrable. Note however that
[TABLE]
for every . Thus has a pointwise integral for all , but is not in .
Connection with sup norm
In [22], we take to be the space of bounded measurable functions on , equal to the product with measure which is the (completion of the) product measure determined by the standard (unnormalized) measure on the sphere and ordinary Lebesgue measure on . We take , so is the characteristic function of the closed half-space .
We showed that if a function on decays, along with its partials of order , at a sufficient rate, then there is an integral formula expressing as an integral combination of the characteristic functions of closed half-spaces weighted by iterated Laplacians integrated over half-spaces. The characteristic functions all have sup-norm of 1 and the weight-function is in of , where and is the (completion of the) product measure determined by the standard (unnormalized) measure on the sphere of unit vectors in and ordinary Lebesgue measure on .
For example, when is odd,
[TABLE]
where
[TABLE]
with a scalar exponentially decreasing with . The integral is of the iterated directional derivative over the hyperplane with normal vector and offset ,
[TABLE]
For , the space of bounded Lebesgue-measurable functions on , which is a Banach space w.r.t. sup-norm, and the family consisting of the set of all characteristic functions for closed half-spaces in , it follows from Theorem 12 that .
Hence, from the Main Theorem,
[TABLE]
is a Bochner integral, where is given by
[TABLE]
Application of the Main Theorem requires only that be in , but [22] gives explicit formulas for (in both even and odd dimensions) provided that satisfies the decay conditions described above and in our paper; see also the other chapter in this book referenced earlier.
11 Some concluding remarks
Neural networks express a function in terms of a combination of members of a given family of functions. It is reasonable to expect that a function can be so represented if is in . The choice of thus dictates the ’s that can be represented (if we leave aside what combinations are permissible). Here we have focused on the case . The form is usually associated with a specific family such as Gaussians or Heavisides. The tensor-product interpretation suggests the possibility of using multiple families or multiple ’s to represent a larger class of ’s. Alternatively, one may replace by with a suitable extension of the measure.
The Bochner integral approach also permits to be an arbitrary Banach space (not necessarily an -space). For example, if is a space of bounded linear transformations and is a family of such transformations, we can approximate other members of this Banach space in a neural-network-like manner. Even more abstractly, we can approximate an evolving function , where is time, using weights that evolve over time and/or a family whose members evolve in a prescribed fashion. Such an approach would require some axiomatics about permissible evolutions of , perhaps similar to methods used in time-series analysis and stochastic calculus. See, e.g., [8].
Many of the restrictions we have imposed in earlier sections are not truly essential. For example, the separability constraints can be weakened. Moreover, $\sigma$-finiteness of the parameter measure space need not be required, since an integrable function must vanish outside a $\sigma$-finite subset. More drastically, the integrable weight function can be replaced by a distribution or a measure. Indeed, we believe that both finite combinations and integrals can be subsumed in generalized combinations derived from Choquet’s theorem. The abstract transformations of the concept of neural network discussed here provide an “enrichment” that may have practical consequences.
12 Appendix I: Some Banach space background
The following is a brief account of the machinery of functional analysis used in this chapter. See, e.g., [43]. For $G \subseteq X$, with $X$ any linear space, let
\[ \mathrm{span}_n G := \Big\{ \sum_{i=1}^{n} a_i g_i \;:\; a_i \in \mathbb{R},\ g_i \in G \Big\} \]
denote the set of all $n$-fold linear combinations from $G$. If the $a_i$ are non-negative with sum $1$, then the combination is called a convex combination; $\mathrm{conv}_n G$ denotes the set of all $n$-fold convex combinations from $G$. Let
[TABLE]
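To make the notation concrete, the following sketch builds an $n$-fold linear combination of functions and tests whether its coefficients form a convex combination. The half-line indicator units and all function names are illustrative assumptions, chosen because characteristic functions of half-lines appear earlier in the chapter.

```python
def linear_combination(coeffs, funcs):
    """Return x -> sum_i a_i * g_i(x) for coefficients a_i and functions g_i."""
    if len(coeffs) != len(funcs):
        raise ValueError("need one coefficient per function")
    return lambda x: sum(a * g(x) for a, g in zip(coeffs, funcs))


def is_convex_combination(coeffs, tol=1e-12):
    """True when all coefficients are non-negative and sum to 1 (up to tol)."""
    return all(a >= 0 for a in coeffs) and abs(sum(coeffs) - 1.0) <= tol


def half_line_indicator(c):
    """Characteristic function of the closed half-line [c, infinity)."""
    return lambda x: 1.0 if x >= c else 0.0


# A 2-fold combination of two hypothetical units with thresholds 0 and 1.
units = [half_line_indicator(0.0), half_line_indicator(1.0)]
f = linear_combination([0.25, 0.75], units)
print(f(-1.0), f(0.5), f(2.0), is_convex_combination([0.25, 0.75]))
```

The same helper covers both cases: any real coefficients give an element of the $n$-fold span, and the convexity check singles out the combinations with non-negative weights summing to one.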
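A norm on a linear space $X$ is a function which associates to each element $f$ of $X$ a real number $\|f\| \ge 0$ such that
(1) $\|f\| = 0 \iff f = 0$;
(2) $\|rf\| = |r|\,\|f\|$ for all $r \in \mathbb{R}$; and
(3) the triangle inequality holds: $\|f + g\| \le \|f\| + \|g\|$.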
A metric is defined by the norm, and both addition and scalar multiplication become continuous functions with respect to the topology induced by the norm-metric. A metric space is complete if every sequence in the space that satisfies the Cauchy criterion is convergent. In particular, if a normed linear space is complete in the metric induced by its norm, then it is called a Banach space.
Let $(Y,\mu)$ be a measure space; it is called $\sigma$-finite provided that there exists a countable family $Y_1, Y_2, \ldots$ of subsets of $Y$, pairwise disjoint and measurable with finite $\mu$-measure, such that $Y = \bigcup_i Y_i$. The condition of $\sigma$-finiteness is required for Fubini’s theorem. A set $N$ is called a $\mu$-null set if it is measurable with $\mu(N) = 0$. A function from a measure space to another measure space is called measurable if the pre-image of each measurable subset is measurable. When the range space is merely a topological space, then functions are measurable if the pre-image of each open set is measurable.
Let $(Y,\mu)$ be a measure space. If $p \in [1,\infty)$, we write $L^p(Y,\mu)$ for the Banach space consisting of all equivalence classes of $\mu$-measurable functions from $Y$ to $\mathbb{R}$ with absolutely integrable $p$-th powers, where $f$ and $g$ are equivalent if they agree $\mu$-almost everywhere ($\mu$-a.e.), that is, if the set of points where $f$ and $g$ disagree has $\mu$-measure zero, and $\|f\|_{L^p(Y,\mu)} := \big(\int_Y |f(y)|^p \, d\mu(y)\big)^{1/p}$, or $\|f\|_p$ for short.
13 Appendix II: Some key theorems
We include, for the reader’s convenience, the statements of some crucial theorems cited in the text.
The following consequence of the Hahn-Banach Theorem, due to Mazur, is given by Yosida [43, Theorem 3’, p. 109]. The hypotheses on are satisfied by any Banach space, but the theorem holds much more generally. See [43] for examples where is not a Banach space.
Theorem 13.1
Let be a real locally convex linear topological space, a closed convex subset, and . Then continuous linear functional
[TABLE]
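Fubini’s Theorem relates iterated integrals to product integrals. Let $Y, Z$ be sets and $\mathcal{M}$ be a $\sigma$-algebra of subsets of $Y$ and $\mathcal{N}$ a $\sigma$-algebra of subsets of $Z$. If $A \in \mathcal{M}$ and $B \in \mathcal{N}$, then $A \times B$ is called a measurable rectangle. We denote the smallest $\sigma$-algebra on $Y \times Z$ which contains all the measurable rectangles by $\mathcal{M} \times \mathcal{N}$. Now let $(Y,\mathcal{M},\mu)$ and $(Z,\mathcal{N},\nu)$ be $\sigma$-finite measure spaces, and for $E \in \mathcal{M} \times \mathcal{N}$, define
\[ (\mu \times \nu)(E) := \int_Y \nu(E_y) \, d\mu(y) = \int_Z \mu(E^z) \, d\nu(z), \]
where $E_y := \{z \in Z : (y,z) \in E\}$ and $E^z := \{y \in Y : (y,z) \in E\}$. Also, $\mu \times \nu$ is a $\sigma$-finite measure on $Y \times Z$ with $\mathcal{M} \times \mathcal{N}$ as the family of measurable sets. For the following, see Hewitt and Stromberg [18, p. 386].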
Theorem 13.2
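*Let $(Y,\mathcal{M},\mu)$ and $(Z,\mathcal{N},\nu)$ be $\sigma$-finite measure spaces. Let $f$ be a complex-valued $\mathcal{M} \times \mathcal{N}$-measurable function on $Y \times Z$, and suppose that at least one of the following three absolute integrals is finite: $\int_{Y \times Z} |f(y,z)| \, d(\mu \times \nu)(y,z)$, $\int_Z \int_Y |f(y,z)| \, d\mu(y) \, d\nu(z)$, $\int_Y \int_Z |f(y,z)| \, d\nu(z) \, d\mu(y)$. Then the following statements hold:
(i) $y \mapsto f(y,z)$ is in $L^1(Y,\mathcal{M},\mu)$ for $\nu$-a.e. $z \in Z$;
(ii) $z \mapsto f(y,z)$ is in $L^1(Z,\mathcal{N},\nu)$ for $\mu$-a.e. $y \in Y$;
(iii) $z \mapsto \int_Y f(y,z) \, d\mu(y)$ is in $L^1(Z,\mathcal{N},\nu)$;
(iv) $y \mapsto \int_Z f(y,z) \, d\nu(z)$ is in $L^1(Y,\mathcal{M},\mu)$;
(v) all three of the following integrals are equal:*
\[ \int_{Y \times Z} f(y,z) \, d(\mu \times \nu)(y,z), \]
\[ \int_Z \int_Y f(y,z) \, d\mu(y) \, d\nu(z), \]
\[ \int_Y \int_Z f(y,z) \, d\nu(z) \, d\mu(y). \]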
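A function $\varphi : I \to \mathbb{R}$, $I$ any subinterval of $\mathbb{R}$, is called convex if
\[ \varphi(tx + (1-t)y) \le t\,\varphi(x) + (1-t)\,\varphi(y) \quad \text{for all } x, y \in I \text{ and } 0 \le t \le 1. \]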
The following formulation is from Hewitt and Stromberg [18, p. 202].
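Theorem 13.3 (Jensen’s inequality)
Let $(Y,\mu)$ be a probability measure space. Let $\varphi$ be a convex function from an interval $(a,b)$ into $\mathbb{R}$ and let $f$ be in $L^1(Y,\mu)$ with $\mu(\{y : f(y) \notin (a,b)\}) = 0$ such that $\varphi \circ f$ is also in $L^1(Y,\mu)$. Then $\int_Y f \, d\mu$ is in $(a,b)$ and
\[ \varphi\Big( \int_Y f \, d\mu \Big) \le \int_Y \varphi \circ f \, d\mu. \]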
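Jensen’s inequality can be checked directly on a finite probability space, where the integrals reduce to weighted sums. The following sketch uses hypothetical weights and values and the convex function $\exp$; the helper name `jensen_sides` is an assumption made for this illustration.

```python
import math


def jensen_sides(weights, values, phi):
    """Return (phi(integral of f), integral of phi(f)) for a finite probability measure.

    weights[i] is the mass of the i-th point and values[i] is f at that point.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must form a probability measure")
    mean = sum(w * v for w, v in zip(weights, values))
    lhs = phi(mean)
    rhs = sum(w * phi(v) for w, v in zip(weights, values))
    return lhs, rhs


# Hypothetical three-point probability space.
lhs, rhs = jensen_sides([0.2, 0.3, 0.5], [-1.0, 0.0, 2.0], math.exp)
print(lhs <= rhs)
```

The check raises an error when the weights do not sum to one, which mirrors the hypothesis that the measure is a probability measure.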
Acknowledgements
We thank Victor Bogdan for helpful comments on earlier versions.
References
- [1] Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. National Bureau of Standards, Washington, DC (1972)
- [2] Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, Amsterdam (2003)
- [3] Alpay, D.: The Schur Algorithm, Reproducing Kernel Spaces, and System Theory. American Mathematical Society, Providence, RI (2001)
- [4] Aronszajn, N.: Theory of reproducing kernels. Trans. of AMS 68, 337–404 (1950)
- [5] Barron, A.R.: Neural net approximation. In: K. Narendra (ed.) Proc. 7th Yale Workshop on Adaptive and Learning Systems, pp. 69–72. Yale University Press (1992)
- [6] Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. on Information Theory 39, 930–945 (1993)
- [7] Baryshnikov, Y., Ghrist, R.: Target enumeration via Euler characteristic integrals. SIAM J. Appl. Math. 70, 825–844 (2009)
- [8] Bensoussan, A.: Stochastic control by functional analysis methods. N. Holland, Amsterdam (1982)
