Generalization Bounds for Set-to-Set Matching with Negative Sampling
Masanari Kimura

TL;DR
This paper provides a theoretical analysis of the generalization error in set-to-set matching tasks using neural networks, addressing a gap in understanding the behavior of such models.
Contribution
It introduces a novel generalization bound for set-to-set matching with neural networks, incorporating negative sampling techniques.
Findings
Derives a new generalization bound for set-to-set matching models.
Analyzes the impact of negative sampling on model generalization.
Provides insights into the theoretical behavior of neural set matching.
Abstract
The problem of matching two sets of multiple elements, namely set-to-set matching, has received a great deal of attention in recent years. In particular, it has been reported that good experimental results can be obtained by preparing a neural network as a matching function, especially in complex cases where, for example, each element of the set is an image. However, theoretical analysis of set-to-set matching with such black-box functions is lacking. This paper aims to perform a generalization error analysis in set-to-set matching to reveal the behavior of the model in that task.
Masanari Kimura (ORCID: 0000-0002-9953-3469)
ZOZO Research, Tokyo, Japan
Keywords: Set matching · Generalization bound · Neural networks
1 Introduction
The problem of matching two sets of multiple elements, namely set-to-set matching, has received a great deal of attention in recent years [3, 6, 7, 16]. The problem is formalized as a task that, given two distinct sets, evaluates how well they match. In particular, when the elements of the sets are high-dimensional, neural networks are used as the matching function [11]. Although these strategies have been reported to work well experimentally, there is a lack of research on their theoretical behavior. A mathematical understanding of these algorithms matters because the lack of theoretical work hinders the improvement of existing set-to-set matching algorithms.
We aim to perform a generalization error analysis of set-to-set matching algorithms in the context of statistical learning theory [15, 14]. In particular, existing deep learning-based set-to-set matching algorithms rely on negative sampling, a procedure in which negative examples are randomly generated during the learning process [11]. Therefore, we clarify the theoretical behavior of the set-to-set matching algorithm with negative sampling.
2 Preliminaries
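Let $x_n, y_m \in \mathfrak{X} = \mathbb{R}^d$ be $d$-dimensional feature vectors representing the features of each individual item. Let $\mathcal{X} = \{x_1, \dots, x_N\}$ and $\mathcal{Y} = \{y_1, \dots, y_M\}$ be sets of these feature vectors, where $N$ and $M$ are the sizes of the sets. The function $f : 2^{\mathfrak{X}} \times 2^{\mathfrak{X}} \to \mathbb{R}$ calculates a matching score between the two sets $\mathcal{X}$ and $\mathcal{Y}$. Guaranteeing the exchangeability of set-to-set matching requires that the matching function $f$ be symmetric and invariant under any permutation of items within each set, as follows.
Definition 1 (Permutation Invariance)
A set-input function $f$ is said to be permutation invariant if
$$f(\mathcal{X}, \mathcal{Y}) = f(\pi_x \mathcal{X}, \pi_y \mathcal{Y})$$
for permutations $\pi_x$ on $\{1, \dots, N\}$ and $\pi_y$ on $\{1, \dots, M\}$.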
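Definition 2 (Permutation Equivariance)
A map $f : \mathfrak{X}^N \times \mathfrak{X}^M \to \mathfrak{X}^N$ is said to be permutation equivariant if
$$f(\pi_x \mathcal{X}, \pi_y \mathcal{Y}) = \pi_x f(\mathcal{X}, \mathcal{Y})$$
for permutations $\pi_x$ and $\pi_y$, where $\pi_x$ and $\pi_y$ are on $\{1, \dots, N\}$ and $\{1, \dots, M\}$, respectively. Note that $f$ is permutation invariant for permutations within $\mathcal{Y}$.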
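Definition 3 (Symmetric Function)
A map $f : 2^{\mathfrak{X}} \times 2^{\mathfrak{X}} \to \mathbb{R}$ is said to be symmetric if
$$f(\mathcal{X}, \mathcal{Y}) = f(\mathcal{Y}, \mathcal{X}).$$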
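Definitions 1 and 3 can be checked numerically on a toy score. The sketch below is only an illustration: `match_score` (the inner product of the two set means) is a hypothetical matching function chosen for simplicity, not the model analyzed in the paper.

```python
import random

def mean_vector(items):
    """Average a non-empty list of equal-length feature vectors."""
    d = len(items[0])
    return [sum(v[k] for v in items) / len(items) for k in range(d)]

def match_score(xs, ys):
    """Hypothetical toy score: inner product of the two set means.

    Mean pooling ignores item order (permutation invariance) and the
    inner product is commutative (symmetry).
    """
    mx, my = mean_vector(xs), mean_vector(ys)
    return sum(a * b for a, b in zip(mx, my))

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # N = 3, d = 4
Y = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]  # M = 5, d = 4

X_perm = rng.sample(X, len(X))  # a random reordering of the items of X
Y_perm = rng.sample(Y, len(Y))  # a random reordering of the items of Y

# Compare with a tolerance because reordering changes floating-point rounding.
invariant = abs(match_score(X, Y) - match_score(X_perm, Y_perm)) < 1e-9
symmetric = abs(match_score(X, Y) - match_score(Y, X)) < 1e-9
print(invariant, symmetric)
```

Any score built from an order-independent pooling step followed by a commutative comparison passes both checks. A score that reads items by position would generally fail the first one.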
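Definition 4 (Two-Set-Permutation Equivariance)
Given $\mathcal{X}^{(1)} \in \mathfrak{X}^N$ and $\mathcal{X}^{(2)} \in \mathfrak{X}^M$, a map $f : \mathfrak{X}^* \times \mathfrak{X}^* \to \mathfrak{X}^* \times \mathfrak{X}^*$ is said to be two-set-permutation equivariant if
$$p f(\mathcal{X}^{(1)}, \mathcal{X}^{(2)}) = f(\mathcal{X}^{(p(1))}, \mathcal{X}^{(p(2))})$$
for any permutation operator $p$ exchanging the two sets, where $\mathfrak{X}^*$ indicates a sequence of arbitrary length such as $\mathfrak{X}^N$ or $\mathfrak{X}^M$.
We consider tasks where the matching function $f$ is used per pair of sets [18] to select a correct matching. Given candidate pairs of sets $(\mathcal{X}, \mathcal{Y}^{(k)})$, where $\mathcal{X}, \mathcal{Y}^{(k)} \in 2^{\mathfrak{X}}$ and $k \in \{1, \dots, K\}$, we choose $\mathcal{Y}^{(k^*)}$ as the correct one so that $f(\mathcal{X}, \mathcal{Y}^{(k^*)})$ achieves the maximum score among the $K$ candidates.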
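The selection rule described above amounts to an argmax over candidate sets. A minimal sketch follows, assuming sets of scalars and a hypothetical `toy_score` that prefers sets with close means. Both the score and the helper names are illustrative and do not come from the paper.

```python
def set_mean(values):
    """Mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def toy_score(xs, ys):
    """Hypothetical matching score: closer set means give a higher score."""
    return -abs(set_mean(xs) - set_mean(ys))

def select_match(score, query, candidates):
    """Return (index, score) of the candidate with the highest matching score."""
    scores = [score(query, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]

query = [1.0, 2.0, 3.0]                          # the fixed set
candidates = [[10.0, 11.0], [2.0, 2.5], [-5.0]]  # K = 3 candidate sets
best, best_score = select_match(toy_score, query, candidates)
print(best, best_score)
```

The candidate sets may have different sizes, which is why the score has to be defined on sets rather than on fixed-length vectors.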
2.1 Set-to-set matching with negative sampling
In real-world set-to-set matching problems, it is often the case that only positive example set pairs can be obtained. Then, we consider training a model for set-to-set matching with negative sampling. The learner is given positive examples . Then, negative examples are generated by randomly combining set pairs from the given sets. We assume that positive and negative examples are drawn according to the underlying distribution and , respectively. Given training sample set , the goal of set-to-set matching with negative sampling is to learn a real-valued score function that ranks future positive pair higher than negative pair . Let be the loss function, which is defined as
[TABLE]
where , and is a convex function. Typical choices of include the logistic loss
$$z \mapsto \log\left(1 + \exp(-z)\right).$$
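The training setup of this subsection can be sketched in a few lines: negatives are built by recombining sets from different positive pairs, and a convex surrogate is applied to the score difference between a positive and a negative pair. The code below is a sketch under stated assumptions: `toy_score` is a hypothetical score, sampling is restricted to mismatched indices, and the loss is averaged over all positive/negative combinations. The exact empirical loss used in the paper is not visible in this text.

```python
import math
import random

def logistic(z):
    """Logistic surrogate log(1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def toy_score(xs, ys):
    """Hypothetical matching score: closer set means give a higher score."""
    return -abs(sum(xs) / len(xs) - sum(ys) / len(ys))

def sample_negatives(positives, num_negatives, rng):
    """Build negative pairs by recombining X_i with Y_j for i != j.

    Requires at least two positive pairs.
    """
    negatives = []
    while len(negatives) < num_negatives:
        i = rng.randrange(len(positives))
        j = rng.randrange(len(positives))
        if i != j:
            negatives.append((positives[i][0], positives[j][1]))
    return negatives

def pairwise_loss(score, positives, negatives):
    """Average surrogate loss over all (positive, negative) combinations."""
    total = 0.0
    for xp, yp in positives:
        for xn, yn in negatives:
            total += logistic(score(xp, yp) - score(xn, yn))
    return total / (len(positives) * len(negatives))

# Toy positives: each X_i and Y_i are clustered around the same center c.
positives = [([c, c + 1.0], [c, c + 0.5]) for c in (0.0, 10.0, 20.0)]
negatives = sample_negatives(positives, 6, random.Random(0))
loss = pairwise_loss(toy_score, positives, negatives)
print(loss)
```

A score that is constant on all pairs yields a loss of exactly log 2, so any value below that indicates that positives are ranked above negatives on average.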
Definition 5 (Expected set-to-set matching loss)
Expected set-to-set matching loss is defined as
[TABLE]
Definition 6 (Empirical set-to-set matching loss)
Empirical set-to-set matching loss is defined as
[TABLE]
Here, we assume that has the Lipschitz property with respect to , i.e.,
[TABLE]
where and is a Lipschitz constant.
3 Margin bound for set-to-set matching
Our first result is based on the Rademacher complexity.
Definition 7 (Empirical Rademacher complexity)
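Let $\mathcal{F}$ be a family of matching score functions. Then, the empirical Rademacher complexity of $\mathcal{F}$ with respect to the sample $S = ((\mathcal{X}_1, \mathcal{Y}_1), \dots, (\mathcal{X}_m, \mathcal{Y}_m))$ is defined as
$$\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(\mathcal{X}_i, \mathcal{Y}_i)\right],$$
where $\sigma = (\sigma_1, \dots, \sigma_m)$ are independent random variables taking values in $\{-1, +1\}$ with equal probability.
Definition 8 (Rademacher complexity)
Let $\mathcal{D}$ denote the distribution according to which samples are drawn. For any integer $m \geq 1$, the Rademacher complexity of $\mathcal{F}$ is the expectation of the empirical Rademacher complexity over all samples of size $m$ drawn according to $\mathcal{D}$:
$$\mathfrak{R}_m(\mathcal{F}) = \mathbb{E}_{S \sim \mathcal{D}^m}\left[\hat{\mathfrak{R}}_S(\mathcal{F})\right].$$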
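To make Definition 7 concrete, the empirical Rademacher complexity can be computed exactly for a small sample and a simple function class. The sketch below assumes that each set pair has already been embedded into a fixed vector z_i and that the class is the norm-bounded linear class {z -> <w, z> : ||w|| <= radius}. Both assumptions are illustrative and are not taken from the paper. For this class the supremum has the closed form (radius / m) * ||sum_i sigma_i z_i||.

```python
import itertools
import math

def empirical_rademacher_linear(points, radius):
    """Exact empirical Rademacher complexity of {z -> <w, z> : ||w|| <= radius}.

    Averages (radius / m) * ||sum_i sigma_i z_i|| over all 2^m sign vectors,
    which is feasible only for small m.
    """
    m, d = len(points), len(points[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=m):
        s = [sum(sg * p[k] for sg, p in zip(signs, points)) for k in range(d)]
        total += math.sqrt(sum(c * c for c in s))
    return radius * total / (m * 2 ** m)

# Hypothetical 2-dimensional embeddings of m = 8 set pairs.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0],
          [0.5, -0.5], [2.0, 0.0], [0.0, -1.0], [1.5, 1.0]]
value = empirical_rademacher_linear(points, radius=1.0)

# Jensen's inequality gives the upper bound radius * sqrt(sum_i ||z_i||^2) / m.
bound = math.sqrt(sum(c * c for p in points for c in p)) / len(points)
print(value, bound)
```

For larger samples the enumeration over sign vectors would be replaced by a Monte Carlo average over randomly drawn signs.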
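Let $\mathcal{D}_1$ denote the marginal distribution of the first element of the pairs, and $\mathcal{D}_2$ the marginal distribution with respect to the second element of the pairs. Similarly, let $S_1$ and $S_2$ denote the samples obtained from $S$ by keeping only the first and only the second element of each pair, respectively. We denote by $\mathfrak{R}_m^{\mathcal{D}_1}(\mathcal{F})$ the Rademacher complexity of $\mathcal{F}$ with respect to the marginal distribution $\mathcal{D}_1$, that is $\mathfrak{R}_m^{\mathcal{D}_1}(\mathcal{F}) = \mathbb{E}\left[\hat{\mathfrak{R}}_{S_1}(\mathcal{F})\right]$, and similarly $\mathfrak{R}_m^{\mathcal{D}_2}(\mathcal{F}) = \mathbb{E}\left[\hat{\mathfrak{R}}_{S_2}(\mathcal{F})\right]$.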
Here, we assume that the loss function is the following margin loss.
Definition 9
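For any $\rho > 0$, the $\rho$-margin loss is the function $L_\rho : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ defined for all $y, y' \in \mathbb{R}$ by $L_\rho(y, y') = \Phi_\rho(y y')$ with
$$\Phi_\rho(u) = \min\left(1, \max\left(0, 1 - \frac{u}{\rho}\right)\right) = \begin{cases} 1 & \text{if } u \leq 0, \\ 1 - u/\rho & \text{if } 0 \leq u \leq \rho, \\ 0 & \text{if } u \geq \rho. \end{cases}$$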
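The margin loss of Definition 9 is short to write down in code. The sketch below assumes the standard piecewise-linear form (1 for non-positive arguments, linear decay on [0, rho], 0 beyond rho). The function name `margin_loss` is illustrative.

```python
def margin_loss(u, rho):
    """rho-margin loss: 1 for u <= 0, 1 - u/rho on [0, rho], 0 for u >= rho."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return min(1.0, max(0.0, 1.0 - u / rho))

def zero_one_loss(u):
    """Indicator of a ranking mistake: 1 if u <= 0, else 0."""
    return 1.0 if u <= 0 else 0.0

rho = 0.5
grid = [k / 10.0 for k in range(-20, 21)]  # values of u from -2.0 to 2.0

# The margin loss upper-bounds the 0-1 loss everywhere on the grid ...
dominates = all(margin_loss(u, rho) >= zero_one_loss(u) for u in grid)
# ... and its slope never exceeds 1/rho in absolute value.
max_slope = max(abs(margin_loss(a, rho) - margin_loss(b, rho)) / (b - a)
                for a, b in zip(grid, grid[1:]))
print(dominates, max_slope)
```

These two properties, domination of the 0-1 loss and Lipschitz continuity with constant 1/rho, are the ones a margin-based argument typically relies on.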
Lemma 1
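Let $\mathcal{Z}$ be any input space, and $\mathcal{G}$ be a family of functions mapping from $\mathcal{Z}$ to $[0, 1]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over an i.i.d. sample $S = (z_1, \dots, z_m)$, each of the following holds for all $g \in \mathcal{G}$:
$$\mathbb{E}[g(z)] \leq \frac{1}{m} \sum_{i=1}^{m} g(z_i) + 2 \mathfrak{R}_m(\mathcal{G}) + \sqrt{\frac{\log(1/\delta)}{2m}},$$
$$\mathbb{E}[g(z)] \leq \frac{1}{m} \sum_{i=1}^{m} g(z_i) + 2 \hat{\mathfrak{R}}_S(\mathcal{G}) + 3 \sqrt{\frac{\log(2/\delta)}{2m}}.$$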
Proof
Let . Then, for two samples and , we have
[TABLE]
where and . Then, by McDiarmid’s inequality, for any , with probability at least , the following holds.
[TABLE]
We next bound the expectation of the right-hand side as follows.
[TABLE]
Here, using again McDiarmid’s inequality, with probability at least , the following holds.
[TABLE]
Finally, we use the union bound which yields with probability at least :
[TABLE]
Theorem 3.1 (Margin bound for set-to-set matching)
Let $\mathcal{F}$ be a set of matching score functions. Fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over the choice of a sample $S$ of size $m$, each of the following holds for all $f \in \mathcal{F}$:
[TABLE]
Proof
Let be the family of functions mapping to defined by , where . Consider the family of functions derived from which are taking values in . By Lemma 1, for any with probability at least , for all ,
[TABLE]
Since for all , the generalization error is a lower bound on left-hand side, , and we can write
[TABLE]
Here, we can show that using the -Lipschitzness of . Then, can be upper bounded as follows:
[TABLE]
4 RKHS bound for set-to-set matching
In this section, we consider more precise bounds that depend on the size of the negative sample produced by negative sampling. Let be a finite sample sequence, and be the positive sample size. If the positive proportion , then sample sequence also can be denoted by .
Let be the reproducing kernel Hilbert space (RKHS) associated with the kernel , and is defined as
[TABLE]
for .
Theorem 4.1 (RKHS bound for set-to-set matching)
Suppose to be any sample sequence of size . Then, for any and ,
[TABLE]
where .
Proof
Denote and
[TABLE]
First, for each such that , let be replaced by , and we denote by as this sample. Then,
[TABLE]
Next, for each such that , let be replaced by and we denote by as this sample. Similarly, we have
[TABLE]
Finally, for each such that , let be replaced by , and we denote by as this sample. Then, we have
[TABLE]
where and . Since and , we have
[TABLE]
Combining them and applying McDiarmid’s inequality, we have the proof.
Remark 1
For a fixed sample size, the bound is tightest when the positive proportion equals 1/2. This means that the number of positive samples should ideally equal the number of negative samples (see Figure 1).
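One simple way to see why a balanced split is natural, independently of the exact constants in Theorem 4.1, is to count the positive/negative pairs that a pairwise loss can compare: with a fixed total of m examples, the product of the two class sizes is largest at an even split. The snippet below only illustrates this counting argument and is not the bound derived in the paper. `num_pairs` is a hypothetical helper.

```python
def num_pairs(m, m_pos):
    """Number of (positive, negative) pairs when m_pos of m examples are positive."""
    return m_pos * (m - m_pos)

m = 100
# Search over all splits that keep at least one example in each class.
best = max(range(1, m), key=lambda k: num_pairs(m, k))
print(best, num_pairs(m, best))
```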
Remark 2
For any , with probability at least , we have
[TABLE]
Remark 3
For Remark 2, Let and fix . Then, we have the optimal negative sample size as .
5 Conclusion and Discussion
In this paper, we performed a generalization error analysis in set-to-set matching to reveal the behavior of the model in that task. Our analysis reveals that the convergence rate of set-matching algorithms depends on the size of the negative sample. Future studies may include the following:
- Derivation of tighter bounds. There are many mathematical tools for generalization error analysis of machine learning algorithms, and the tightness of the resulting bounds is known to depend on which tool is used. Tools not addressed in this paper may yield tighter bounds [1, 9, 8, 10, 2].
- Induction of novel set matching algorithms. The generalization error analysis is expected to lead to novel algorithms.
- The effect of data augmentation on the generalization error of set-to-set matching. Many data augmentation methods have been proposed to stabilize neural network learning, and a theoretical analysis of their use would be valuable [5, 4, 13, 17, 12].
References
- [1] Bartlett, P.L., Bousquet, O., Mendelson, S.: Local Rademacher complexities. The Annals of Statistics 33(4), 1497–1537 (2005)
- [2] Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. pp. 429–438. IEEE (2013)
- [3] Iwata, T., Lloyd, J.R., Ghahramani, Z.: Unsupervised many-to-many object matching for relational data. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(3), 607–617 (2015)
- [4] Kimura, M.: Understanding test-time augmentation. In: International Conference on Neural Information Processing. pp. 558–569. Springer (2021)
- [5] Kimura, M.: Why mixup improves the model performance. In: International Conference on Artificial Neural Networks. pp. 275–286. Springer (2021)
- [6] Kimura, M., Nakamura, T., Saito, Y.: SHIFT15M: Multiobjective large-scale fashion dataset with distributional shifts. arXiv preprint arXiv:2108.12992 (2021)
- [7] Lisanti, G., Martinel, N., Del Bimbo, A., Luca Foresti, G.: Group re-identification via unsupervised transfer of sparse features encoding. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2449–2458 (2017)
- [8] McAllester, D.A.: PAC-Bayesian model averaging. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory. pp. 164–170 (1999)
