Generalization Bounds for Set-to-Set Matching with Negative Sampling

Masanari Kimura

arXiv:2302.12991·stat.ML·February 28, 2023


TL;DR

This paper provides a theoretical analysis of the generalization error in set-to-set matching tasks using neural networks, addressing a gap in understanding the behavior of such models.

Contribution

It introduces a novel generalization bound for set-to-set matching with neural networks, incorporating negative sampling techniques.

Findings

01. Derived a new generalization bound for set-to-set matching models.

02. Analyzed the impact of negative sampling on model generalization.

03. Provided insights into the theoretical behavior of neural set matching.

Abstract

The problem of matching two sets of multiple elements, namely set-to-set matching, has received a great deal of attention in recent years. In particular, it has been reported that good experimental results can be obtained by preparing a neural network as a matching function, especially in complex cases where, for example, each element of the set is an image. However, theoretical analysis of set-to-set matching with such black-box functions is lacking. This paper aims to perform a generalization error analysis in set-to-set matching to reveal the behavior of the model in that task.



Full text

Masanari Kimura (ORCID: 0000-0002-9953-3469), ZOZO Research, Tokyo, Japan


Keywords: Set matching · Generalization bound · Neural networks

1 Introduction

The problem of matching two sets of multiple elements, namely set-to-set matching, has received a great deal of attention in recent years [3, 6, 7, 16]. The problem is formalized as a task that, given two distinct sets, evaluates how well they match. In particular, when the elements of the sets are high-dimensional, neural networks are used as the matching function [11]. Although these strategies have been reported to work well experimentally, their theoretical behavior has received little study. A mathematical understanding of the behavior of these algorithms is important, since a lack of theoretical research hinders the improvement of existing set-to-set matching algorithms.

We aim to perform a generalization error analysis of set-to-set matching algorithms in the context of statistical learning theory [15, 14]. In particular, existing deep learning-based set-to-set matching algorithms rely on negative sampling, a procedure in which negative examples are randomly generated during the learning process [11]. We therefore clarify the theoretical behavior of set-to-set matching algorithms trained with negative sampling.

2 Preliminaries

Let $\bm{x}_{n},\bm{y}_{m}\in\mathfrak{X}=\mathbb{R}^{d}$ be $d$-dimensional feature vectors representing the features of each individual item. Let $\mathcal{X}=\{\bm{x}_{1},\dots,\bm{x}_{N}\}$ and $\mathcal{Y}=\{\bm{y}_{1},\dots,\bm{y}_{M}\}$ be sets of these feature vectors, where $\mathcal{X},\mathcal{Y}\in 2^{\mathfrak{X}}$ and $N,M\in\mathbb{N}$ are the sizes of the sets. The function $f:2^{\mathfrak{X}}\times 2^{\mathfrak{X}}\to\mathbb{R}$ calculates a matching score between the two sets $\mathcal{X}$ and $\mathcal{Y}$. To guarantee the exchangeability of set-to-set matching, the matching function $f(\mathcal{X},\mathcal{Y})$ must be symmetric and invariant under any permutation of the items within each set, as formalized below.

Definition 1 (Permutation Invariance)

A set-input function f is said to be permutation invariant if

$$f(\mathcal{X},\mathcal{Y})=f(\pi_{x}\mathcal{X},\pi_{y}\mathcal{Y})$$

for permutations $\pi_{x}$ on $\{1,\dots,N\}$ and $\pi_{y}$ on $\{1,\dots,M\}$ .

Definition 2 (Permutation Equivariance)

A map $f:\mathfrak{X}^{N}\times\mathfrak{X}^{M}\to\mathfrak{X}^{N}$ is said to be permutation equivariant if

$$f(\pi_{x}\mathcal{X},\pi_{y}\mathcal{Y})=\pi_{x}f(\mathcal{X},\mathcal{Y})$$

for permutations $\pi_{x}$ and $\pi_{y}$ , where $\pi_{x}$ and $\pi_{y}$ are on $\{1,\dots,N\}$ and $\{1,\dots,M\}$ , respectively. Note that $f$ is permutation invariant for permutations within $\mathcal{Y}$ .

Definition 3 (Symmetric Function)

A map $f:2^{\mathfrak{X}}\times 2^{\mathfrak{X}}\to\mathbb{R}$ is said to be symmetric if

$$f(\mathcal{X},\mathcal{Y})=f(\mathcal{Y},\mathcal{X}).$$

Definition 4 (Two-Set-Permutation Equivariance)

Given $\mathcal{Z}^{(1)}\in\mathfrak{X}^{N}$ and $\mathcal{Z}^{(2)}\in\mathfrak{X}^{M}$, a map $f:\mathfrak{X}^{\ast}\times\mathfrak{X}^{\ast}\to\mathfrak{X}^{\ast}\times\mathfrak{X}^{\ast}$ is said to be two-set-permutation equivariant if

$$p\,f(\mathcal{Z}^{(1)},\mathcal{Z}^{(2)})=f(\mathcal{Z}^{(p(1))},\mathcal{Z}^{(p(2))})$$

for any permutation operator $p$ exchanging the two sets, where $\mathfrak{X}^{\ast}=\cup^{\infty}_{n=0}\mathfrak{X}^{n}$ denotes the set of sequences of arbitrary length, such as $\mathfrak{X}^{N}$ or $\mathfrak{X}^{M}$.
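The invariance and symmetry properties above can be made concrete with a small sketch. The hypothetical `match_score` below pools each set with a coordinate-wise average of `tanh` features (a DeepSets-style construction chosen for illustration, not taken from this paper) and scores the pair by an inner product. Averaging makes the score permutation invariant (Definition 1), and the commutative inner product makes it symmetric (Definition 3). Definitions 2 and 4 concern set-valued maps and are not exercised here. Sets are plain Python lists of equal-length vectors.

```python
import itertools
import math
import random

def pool_set(vectors, elem_fn=math.tanh):
    """Hypothetical set encoder: apply elem_fn to every coordinate, then average
    over the items. Averaging makes the result independent of item order."""
    n = len(vectors)
    return [sum(elem_fn(x) for x in column) / n for column in zip(*vectors)]

def match_score(X, Y):
    """Hypothetical matching function f(X, Y): inner product of pooled embeddings.
    Pooling gives permutation invariance; the inner product gives symmetry."""
    ex, ey = pool_set(X), pool_set(Y)
    return sum(a * b for a, b in zip(ex, ey))

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # N = 3 items, d = 4
Y = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]  # M = 5 items, d = 4
base = match_score(X, Y)

# Reordering the items of X changes only floating-point summation order.
for perm in itertools.permutations(X):
    assert math.isclose(match_score(list(perm), Y), base, abs_tol=1e-12)

# Swapping the two sets leaves the score unchanged.
assert math.isclose(match_score(Y, X), base, abs_tol=1e-12)
```

Any other symmetric aggregation (sum, max) would serve the same purpose; mean pooling is used only because it keeps the sketch short.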

We consider tasks where the matching function $f$ is applied to each candidate pair of sets [18] to select the correct matching. Given candidate pairs of sets $(\mathcal{X},\mathcal{Y}^{(k)})$, where $\mathcal{X},\mathcal{Y}^{(k)}\in 2^{\mathfrak{X}}$ and $k\in\{1,\dots,K\}$, we choose $\mathcal{Y}^{(k^{\ast})}$ as the correct match, where $k^{\ast}$ is the index for which $f(\mathcal{X},\mathcal{Y}^{(k^{\ast})})$ achieves the maximum score among the $K$ candidates.
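The selection rule above is an argmax over the $K$ candidate scores. The minimal sketch below assumes a hypothetical `toy_score` (the dot product of the two set means, used only so the example runs) and returns a 0-based index, whereas the text indexes candidates from $1$ to $K$.

```python
def mean_vector(vectors):
    """Coordinate-wise mean of a set of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def toy_score(X, Y):
    """Hypothetical symmetric, permutation-invariant score: dot product of set means."""
    return sum(a * b for a, b in zip(mean_vector(X), mean_vector(Y)))

def select_match(score, X, candidates):
    """Return the 0-based index k* of the candidate Y^(k) maximizing score(X, Y^(k))."""
    return max(range(len(candidates)), key=lambda k: score(X, candidates[k]))

X = [[1.0, 0.0], [0.8, 0.2]]               # mean (0.9, 0.1)
candidates = [
    [[0.0, 1.0]],                          # mean (0.0, 1.0)   -> score 0.1
    [[1.0, 0.1], [0.9, 0.0]],              # mean (0.95, 0.05) -> score about 0.86
    [[-1.0, 0.0]],                         # mean (-1.0, 0.0)  -> score -0.9
]
k_star = select_match(toy_score, X, candidates)  # 1, i.e. the second candidate
```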

2.1 Set-to-set matching with negative sampling

In real-world set-to-set matching problems, it is often the case that only positive set pairs are available. We therefore consider training a model for set-to-set matching with negative sampling. The learner is given positive examples $S^{+}=\{(\mathcal{X}_{i},\mathcal{Y}_{i})\}^{m^{+}}_{i=1}$. Negative examples $S^{-}=\{(\mathcal{X}_{j},\mathcal{Y}_{j})\}^{m^{-}}_{j=1}$ are then generated by randomly combining sets taken from the given pairs. We assume that positive and negative examples are drawn according to the underlying distributions $p^{+}$ and $p^{-}$, respectively. Given the training sample $S=(S^{+},S^{-})$, the goal of set-to-set matching with negative sampling is to learn a real-valued score function $f:2^{\mathfrak{X}}\times 2^{\mathfrak{X}}\to\mathbb{R}$ that ranks a future positive pair $(\mathcal{X},\mathcal{Y})^{+}$ higher than a negative pair $(\mathcal{X},\mathcal{Y})^{-}$. The loss function $\ell$ is defined as

$$\ell(f,Z^{+},Z^{-}):=\varphi\bigl(f(Z^{+})-f(Z^{-})\bigr),$$

where $Z^{+}=(\mathcal{X},\mathcal{Y})^{+}$ , $Z^{-}=(\mathcal{X},\mathcal{Y})^{-}$ and $\varphi:\mathbb{R}\to\mathbb{R}^{+}$ is a convex function. Typical choices of $\varphi$ include the logistic loss

$$\varphi\bigl(f(Z^{+})-f(Z^{-})\bigr)=\log\Bigl(1+\exp\bigl(-(f(Z^{+})-f(Z^{-}))\bigr)\Bigr).$$
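Negative sampling and the pairwise logistic loss can be sketched as follows. The sampler is an assumption about what "randomly combining" means: it pairs the first set of one positive pair with the second set of a different, uniformly chosen positive pair. All function names are hypothetical, and string labels stand in for actual sets. On real data a recombined pair can occasionally be a true match (a false negative); the sketch does not guard against that.

```python
import math
import random

def sample_negatives(positives, m_neg, rng):
    """Hypothetical sampler: build m_neg pairs (X_i, Y_j) with i != j by
    recombining sets taken from different positive pairs."""
    n = len(positives)
    if n < 2:
        raise ValueError("need at least two positive pairs to recombine")
    negatives = []
    for _ in range(m_neg):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        negatives.append((positives[i][0], positives[j][1]))
    return negatives

def logistic_phi(u):
    """phi(u) = log(1 + exp(-u)), split by sign so exp never overflows."""
    if u >= 0:
        return math.log1p(math.exp(-u))
    return -u + math.log1p(math.exp(u))

def pair_loss(f, z_pos, z_neg, phi=logistic_phi):
    """ell(f, Z+, Z-) = phi(f(Z+) - f(Z-))."""
    return phi(f(z_pos) - f(z_neg))

def toy_f(pair):
    """Hypothetical score: 1.0 if the two labels share an index, else 0.0."""
    x, y = pair
    return float(x[1:] == y[1:])

positives = [(f"X{i}", f"Y{i}") for i in range(5)]
negatives = sample_negatives(positives, m_neg=8, rng=random.Random(0))
loss = pair_loss(toy_f, positives[0], negatives[0])  # phi(1 - 0) = log(1 + e^-1)
```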

Definition 5 (Expected set-to-set matching loss)

The expected set-to-set matching loss $R(f)$ is defined as

$$R(f):=\mathbb{E}_{Z^{+}\sim p^{+},\,Z^{-}\sim p^{-}}\bigl[\ell(f,Z^{+},Z^{-})\bigr].$$

Definition 6 (Empirical set-to-set matching loss)

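The empirical set-to-set matching loss $\hat{R}(f;S)$ is defined as

$$\hat{R}(f;S):=\frac{1}{m^{+}m^{-}}\sum_{i=1}^{m^{+}}\sum_{j=m^{+}+1}^{m^{+}+m^{-}}\ell(f,Z_{i},Z_{j}),$$

where $Z_{1},\dots,Z_{m^{+}}$ are the positive pairs in $S^{+}$ and $Z_{m^{+}+1},\dots,Z_{m^{+}+m^{-}}$ are the negative pairs in $S^{-}$.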
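Definition 6 averages the loss over all $m^{+}m^{-}$ positive/negative combinations rather than over $m$ independent examples. The minimal sketch below represents $f$ only through its scores on the sample (an assumption made for brevity) and uses the logistic $\varphi$ by default. A 0-1 misranking indicator can be passed instead for comparison. Function names are hypothetical.

```python
import math

def logistic_phi(u):
    """phi(u) = log(1 + exp(-u)), split by sign to avoid overflow in exp."""
    if u >= 0:
        return math.log1p(math.exp(-u))
    return -u + math.log1p(math.exp(u))

def zero_one(u):
    """Misranking indicator 1_{u <= 0}."""
    return 1.0 if u <= 0 else 0.0

def empirical_risk(scores_pos, scores_neg, phi=logistic_phi):
    """hat R(f; S): mean of phi(f(Z_i) - f(Z_j)) over all m+ * m- pairs, where
    scores_pos[i] = f(Z_i) for positives and scores_neg[j] = f(Z_j) for negatives."""
    total = sum(phi(sp - sn) for sp in scores_pos for sn in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))

scores_pos = [1.0, 2.0]   # f on m+ = 2 positive pairs
scores_neg = [1.5, 0.0]   # f on m- = 2 negative pairs
risk_logistic = empirical_risk(scores_pos, scores_neg)
risk_01 = empirical_risk(scores_pos, scores_neg, phi=zero_one)  # 1 of 4 pairs misranked
```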

Here, we assume that $\varphi$ is Lipschitz continuous on $\mathbb{R}$, i.e.,

$$|\varphi(a)-\varphi(b)|\leq L\cdot|a-b|,$$

where $a,b\in\mathbb{R}$ and $L>0$ is a Lipschitz constant.
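As a worked check (not part of the original text), the logistic loss above satisfies this assumption with $L=1$. Its derivative is bounded in absolute value by $1$, and the mean value theorem turns that bound into the Lipschitz inequality:

$$\varphi(u)=\log(1+e^{-u}),\qquad\varphi^{\prime}(u)=-\frac{e^{-u}}{1+e^{-u}}=-\frac{1}{1+e^{u}}\in(-1,0),$$

$$|\varphi(a)-\varphi(b)|=|\varphi^{\prime}(\xi)|\,|a-b|\leq|a-b|\quad\text{for some }\xi\text{ between }a\text{ and }b.$$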

3 Margin bound for set-to-set matching

Our first result is based on the Rademacher complexity.

Definition 7 (Empirical Rademacher complexity)

Let $\mathcal{F}$ be a family of matching score functions. Then, the empirical Rademacher complexity of $\mathcal{F}$ with respect to the sample $S$ is defined as

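$$\hat{\mathcal{R}}_{S}(\mathcal{F}):=\mathbb{E}_{\sigma}\Bigl[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^{m}\sigma_{i}f(Z_{i})\Bigr],$$

where $\sigma_{1},\dots,\sigma_{m}$ are independent Rademacher variables taking the values $-1$ and $+1$ with equal probability.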

Definition 8 (Rademacher complexity)

Let $p$ denote the distribution according to which samples are drawn. For any integer $m\geq 1$ , the Rademacher complexity of $\mathcal{F}$ is the expectation of the empirical Rademacher complexity over all samples of size $m$ drawn according to $p$ :

$$\mathcal{R}_{m}(\mathcal{F}):=\mathbb{E}_{S\sim p^{m}}\bigl[\hat{\mathcal{R}}_{S}(\mathcal{F})\bigr].$$

Let $p_{1}$ denote the marginal distribution of the first element of the pairs and $p_{2}$ the marginal distribution of the second element. Similarly, let $S^{1}\sim p_{1}$ and $S^{2}\sim p_{2}$ denote the corresponding samples. We denote by $\mathcal{R}^{1}_{m}$ the Rademacher complexity of $\mathcal{F}$ with respect to the marginal distribution $p_{1}$, that is, $\mathcal{R}^{1}_{m}(\mathcal{F})=\mathbb{E}[\hat{\mathcal{R}}_{S^{1}}(\mathcal{F})]$, and similarly $\mathcal{R}^{2}_{m}(\mathcal{F})=\mathbb{E}[\hat{\mathcal{R}}_{S^{2}}(\mathcal{F})]$.
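For a finite function class, the expectation over $\sigma$ in Definition 7 can be approximated by Monte Carlo. The sketch below is an illustration under that assumption: each function is represented by its vector of values $f(Z_{1}),\dots,f(Z_{m})$ on the fixed sample, and `empirical_rademacher` is a hypothetical helper. Because draws share a seed, enlarging the class can only increase the estimate, mirroring the monotonicity of $\hat{\mathcal{R}}_{S}$ in $\mathcal{F}$.

```python
import random

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_f (1/m) sum_i sigma_i f(Z_i) ]
    for a finite class; values[k][i] = f_k(Z_i), one row per function."""
    rng = random.Random(seed)
    m = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]  # i.i.d. uniform signs
        total += max(sum(s * v for s, v in zip(sigma, row)) for row in values) / m
    return total / n_draws

small_class = [[0.5] * 10]                                        # one constant function
large_class = small_class + [[-0.5] * 10, [1.0, -1.0] * 5]        # a superset of it
r_small = empirical_rademacher(small_class)   # close to 0: one function cannot fit noise
r_large = empirical_rademacher(large_class)   # at least r_small
```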

Here, we assume that the loss function is the following margin loss.

Definition 9

For any $\rho>0$, the $\rho$-margin loss is the function $\ell_{\rho}$ defined for all $z,z^{\prime}\in\mathbb{R}$ by $\ell_{\rho}(z,z^{\prime})=\phi_{\rho}(zz^{\prime})$, where

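$$\phi_{\rho}(z)=\begin{cases}0 & (\rho\leq z),\\ 1-z/\rho & (0\leq z\leq\rho),\\ 1 & (z\leq 0).\end{cases}$$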
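The ramp function $\phi_{\rho}$ and the margin loss of Definition 9 translate directly into code. The sketch below also checks, on a grid, the two properties used later in the proof of Theorem 3.1: $\phi_{\rho}$ dominates the 0-1 indicator $1_{u\leq 0}$, and it is $(1/\rho)$-Lipschitz. Names are hypothetical, and the grid check is an illustration, not a proof.

```python
def ramp_loss(z, rho):
    """phi_rho(z): 1 if z <= 0, 1 - z/rho if 0 <= z <= rho, 0 if z >= rho."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    if z <= 0:
        return 1.0
    if z >= rho:
        return 0.0
    return 1.0 - z / rho

def margin_loss(z, z_prime, rho):
    """ell_rho(z, z') = phi_rho(z * z') from Definition 9."""
    return ramp_loss(z * z_prime, rho)

rho = 0.5
grid = [k / 20 - 2.0 for k in range(81)]  # points in [-2, 2] with step 0.05

# The ramp dominates the misranking indicator 1_{u <= 0}.
assert all((1.0 if u <= 0 else 0.0) <= ramp_loss(u, rho) for u in grid)

# (1/rho)-Lipschitz: differences never exceed |a - b| / rho (up to rounding).
assert all(abs(ramp_loss(a, rho) - ramp_loss(b, rho)) <= abs(a - b) / rho + 1e-12
           for a in grid for b in grid)
```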

Lemma 1

Let $\mathcal{Z}$ be any input space, and let $\mathcal{G}$ be a family of functions mapping from $\mathcal{Z}$ to $[0,1]$. Then, for any $\delta>0$, with probability at least $1-\delta$ over the draw of an i.i.d. sample $S=(z_{1},\dots,z_{m})$ of size $m$, each of the following holds for all $g\in\mathcal{G}$:

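$$\mathbb{E}[g(z)]\leq\frac{1}{m}\sum_{i=1}^{m}g(z_{i})+2\mathcal{R}_{m}(\mathcal{G})+\sqrt{\frac{\log\frac{1}{\delta}}{2m}},$$

$$\mathbb{E}[g(z)]\leq\frac{1}{m}\sum_{i=1}^{m}g(z_{i})+2\hat{\mathcal{R}}_{S}(\mathcal{G})+3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$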

Proof

Let $\psi(S)=\sup_{g\in\mathcal{G}}\bigl(\mathbb{E}[g]-\frac{1}{m}\sum^{m}_{i=1}g(z_{i})\bigr)$. Then, for two samples $S$ and $S^{\prime}$, we have

$$\psi(S^{\prime})-\psi(S)\leq\sup_{g\in\mathcal{G}}\frac{g(z_{m})-g(z^{\prime}_{m})}{m}\leq\frac{1}{m},$$

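where $S$ and $S^{\prime}$ differ only in their last points $z_{m}\in S$ and $z^{\prime}_{m}\in S^{\prime}$, and the last inequality holds because $g$ takes values in $[0,1]$. Then, by McDiarmid’s inequality, for any $\delta>0$, with probability at least $1-\delta/2$, the following holds.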

$$\psi(S)\leq\mathbb{E}_{S}[\psi(S)]+\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$

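We next bound the expectation on the right-hand side as follows. The first inequality uses Jensen's inequality, the equality holds because exchanging $z_{i}$ and $z^{\prime}_{i}$ does not change the distribution, and the last equality uses the fact that $\sigma_{i}$ and $-\sigma_{i}$ are identically distributed.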

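$$\begin{aligned}\mathbb{E}_{S}[\psi(S)]&\leq\mathbb{E}_{S,S^{\prime}}\Bigl[\sup_{g\in\mathcal{G}}\frac{1}{m}\sum_{i=1}^{m}\bigl(g(z^{\prime}_{i})-g(z_{i})\bigr)\Bigr]\\&=\mathbb{E}_{\sigma,S,S^{\prime}}\Bigl[\sup_{g\in\mathcal{G}}\frac{1}{m}\sum_{i=1}^{m}\sigma_{i}\bigl(g(z^{\prime}_{i})-g(z_{i})\bigr)\Bigr]\\&\leq\mathbb{E}_{\sigma,S^{\prime}}\Bigl[\sup_{g\in\mathcal{G}}\frac{1}{m}\sum_{i=1}^{m}\sigma_{i}g(z^{\prime}_{i})\Bigr]+\mathbb{E}_{\sigma,S}\Bigl[\sup_{g\in\mathcal{G}}\frac{1}{m}\sum_{i=1}^{m}-\sigma_{i}g(z_{i})\Bigr]\\&=2\,\mathbb{E}_{\sigma,S}\Bigl[\sup_{g\in\mathcal{G}}\frac{1}{m}\sum_{i=1}^{m}\sigma_{i}g(z_{i})\Bigr]=2\mathcal{R}_{m}(\mathcal{G}).\end{aligned}$$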

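Applying McDiarmid’s inequality again, now to $\hat{\mathcal{R}}_{S}(\mathcal{G})$, which also changes by at most $1/m$ when one point is replaced, with probability at least $1-\delta/2$ the following holds.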

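$$\mathcal{R}_{m}(\mathcal{G})\leq\hat{\mathcal{R}}_{S}(\mathcal{G})+\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$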

Finally, combining the two inequalities via the union bound yields, with probability at least $1-\delta$,

$$\psi(S)\leq 2\hat{\mathcal{R}}_{S}(\mathcal{G})+3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$

Theorem 3.1 (Margin bound for set-to-set matching)

Let $\mathcal{F}$ be a set of matching score functions. Fix $\rho>0$ . Then, for any $\delta>0$ , with probability at least $1-\delta$ over the choice of a sample $S$ of size $m$ , each of the following holds for all $f\in\mathcal{F}$ :

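$$R(f)\leq\hat{R}_{\rho}(f)+\frac{2}{\rho}\bigl(\mathcal{R}^{1}_{m}(\mathcal{F})+\mathcal{R}^{2}_{m}(\mathcal{F})\bigr)+\sqrt{\frac{\log\frac{1}{\delta}}{2m}},$$

$$R(f)\leq\hat{R}_{\rho}(f)+\frac{2}{\rho}\bigl(\hat{\mathcal{R}}_{S^{1}}(\mathcal{F})+\hat{\mathcal{R}}_{S^{2}}(\mathcal{F})\bigr)+3\sqrt{\frac{\log\frac{2}{\delta}}{2m}},$$

where $\hat{R}_{\rho}(f)$ denotes the empirical $\rho$-margin loss of $f$ on $S$.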

Proof

Let $\tilde{\mathcal{F}}$ be the family of functions mapping $(\mathfrak{X}\times\mathfrak{X})\times\{-1,+1\}$ to $\mathbb{R}$ defined by $\tilde{\mathcal{F}}=\{z=((Z^{\prime},Z),a)\mapsto a[f(Z^{\prime})-f(Z)]\ |\ f\in\mathcal{F}\}$, where $a\in\{-1,+1\}$. Consider the family of functions $\phi_{\rho}\circ\tilde{\mathcal{F}}=\{\phi_{\rho}\circ g\ |\ g\in\tilde{\mathcal{F}}\}$ derived from $\tilde{\mathcal{F}}$, which take values in $[0,1]$. By Lemma 1, for any $\delta>0$, with probability at least $1-\delta$, for all $f\in\mathcal{F}$,

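$$\mathbb{E}\bigl[\phi_{\rho}(a[f(Z^{\prime})-f(Z)])\bigr]\leq\hat{R}_{\rho}(f)+2\mathcal{R}_{m}(\phi_{\rho}\circ\tilde{\mathcal{F}})+\sqrt{\frac{\log\frac{1}{\delta}}{2m}}.$$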

Since $1_{u\leq 0}\leq\phi_{\rho}(u)$ for all $u\in\mathbb{R}$, the generalization error $R(f)$ is a lower bound on the left-hand side, $R(f)=\mathbb{E}[1_{a[f(Z^{\prime})-f(Z)]\leq 0}]\leq\mathbb{E}[\phi_{\rho}(a[f(Z^{\prime})-f(Z)])]$, and we can write

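$$R(f)\leq\hat{R}_{\rho}(f)+2\mathcal{R}_{m}(\phi_{\rho}\circ\tilde{\mathcal{F}})+\sqrt{\frac{\log\frac{1}{\delta}}{2m}}.$$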

Here, Talagrand's contraction lemma and the $(1/\rho)$-Lipschitzness of $\phi_{\rho}$ give $\mathcal{R}_{m}(\phi_{\rho}\circ\tilde{\mathcal{F}})\leq\frac{1}{\rho}\mathcal{R}_{m}(\tilde{\mathcal{F}})$. Then, $\mathcal{R}_{m}(\tilde{\mathcal{F}})$ can be upper bounded as follows:

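$$\begin{aligned}\mathcal{R}_{m}(\tilde{\mathcal{F}})&=\frac{1}{m}\mathbb{E}_{S,\sigma}\Bigl[\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}\sigma_{i}\bigl(f(Z^{\prime}_{i})-f(Z_{i})\bigr)\Bigr]\\&\leq\frac{1}{m}\mathbb{E}_{S,\sigma}\Bigl[\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}\sigma_{i}f(Z^{\prime}_{i})+\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}\sigma_{i}f(Z_{i})\Bigr]\\&=\mathbb{E}_{S}\bigl[\hat{\mathcal{R}}_{S^{2}}(\mathcal{F})+\hat{\mathcal{R}}_{S^{1}}(\mathcal{F})\bigr]=\mathcal{R}^{2}_{m}(\mathcal{F})+\mathcal{R}^{1}_{m}(\mathcal{F}).\end{aligned}$$

In the first line the labels $a_{i}$ are absorbed into $\sigma_{i}$, since $\sigma_{i}a_{i}$ has the same distribution as $\sigma_{i}$. Combining this bound with the two preceding inequalities completes the proof.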
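Chaining the steps of this proof gives $R(f)\leq\hat{R}_{\rho}(f)+\frac{2}{\rho}\bigl(\mathcal{R}^{1}_{m}(\mathcal{F})+\mathcal{R}^{2}_{m}(\mathcal{F})\bigr)+\sqrt{\log(1/\delta)/(2m)}$, assuming the confidence term has the usual square-root form produced by McDiarmid’s inequality. The hypothetical helper below evaluates that right-hand side so the trade-off in $\rho$ can be inspected. The inputs are made up; in practice, $\hat{R}_{\rho}$ and the Rademacher terms must be estimated.

```python
import math

def margin_bound(emp_margin_loss, rad1, rad2, rho, delta, m):
    """Hypothetical helper: hat R_rho(f) + (2/rho) * (R^1_m + R^2_m)
    + sqrt(log(1/delta) / (2m))."""
    if rho <= 0 or not 0 < delta < 1 or m < 1:
        raise ValueError("need rho > 0, 0 < delta < 1 and m >= 1")
    complexity = (2.0 / rho) * (rad1 + rad2)
    confidence = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return emp_margin_loss + complexity + confidence

# Illustrative inputs only: doubling rho halves the complexity term, but in practice
# it also tends to raise the empirical margin loss passed as the first argument.
b_small_rho = margin_bound(0.10, 0.01, 0.01, rho=0.5, delta=0.05, m=1000)
b_large_rho = margin_bound(0.10, 0.01, 0.01, rho=1.0, delta=0.05, m=1000)
```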

4 RKHS bound for set-to-set matching

In this section, we consider more precise bounds that depend on the size of the negative sample produced by negative sampling. Let $S=((\mathcal{X}_{1},\mathcal{Y}_{1}),\dots,(\mathcal{X}_{m},\mathcal{Y}_{m}))\in(\mathfrak{X}\times\mathfrak{X})^{m}$ be a finite sample sequence, and let $m^{+}$ be the positive sample size. When the positive proportion is $\frac{m^{+}}{m}=\alpha$, we also denote $S$ by $S_{\alpha}$.

Let $\mathfrak{R}_{K}$ be the reproducing kernel Hilbert space (RKHS) associated with the kernel $K$, and define $\mathcal{F}_{r}$ as

$$\mathcal{F}_{r}=\{f\in\mathfrak{R}_{K}\mid\|f\|_{K}\leq r\}$$

for $r>0$ .

Theorem 4.1 (RKHS bound for set-to-set matching)

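Let $S_{\alpha}$ be any sample sequence of size $m$ with positive proportion $\alpha$. Then, for any $\epsilon>0$ and $f\in\mathcal{F}_{r}$,

$$\mathbb{P}_{S_{\alpha}}\bigl[|\hat{R}(f;S_{\alpha})-R(f)|\geq\epsilon\bigr]\leq 2\exp\Bigl\{-\frac{\alpha^{2}(1-\alpha)^{2}m\epsilon^{2}}{2L^{2}\kappa^{2}r^{2}}\Bigr\},$$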

where $\kappa\coloneqq\sup_{x}\sqrt{K(x,x)}$ .

Proof

Denote $S=(S^{+},S^{-})=\{Z_{1},\dots,Z_{m}\}$ and

$$z_{i}:=z(Z_{i}):=\begin{cases}+1 & (Z_{i}\in S^{+}),\\ -1 & (Z_{i}\in S^{-}).\end{cases}$$

First, for each $1\leq k\leq m^{+}$ (so that $z_{k}=+1$), let $(Z_{k},+1)$ be replaced by $(Z^{\prime}_{k},+1)\in(\mathfrak{X}\times\mathfrak{X})\times\{-1,+1\}$, and denote the resulting sample by $S^{k}$. Then,

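$$|\hat{R}(f;S)-\hat{R}(f;S^{k})|\leq\frac{2L}{m^{+}}\|f\|_{\infty},$$

since the replaced example appears in $m^{-}$ of the $m^{+}m^{-}$ loss terms, each of which changes by at most $L|f(Z_{k})-f(Z^{\prime}_{k})|\leq 2L\|f\|_{\infty}$ by the Lipschitz property of $\varphi$.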

Next, for each $m^{+}+1\leq k\leq m$ (so that $z_{k}=-1$), let $(Z_{k},-1)$ be replaced by $(Z^{\prime}_{k},-1)\in(\mathfrak{X}\times\mathfrak{X})\times\{-1,+1\}$, and denote the resulting sample by $\bar{S}^{k}$. Similarly, we have

$$|\hat{R}(f;S)-\hat{R}(f;\bar{S}^{k})|\leq\frac{2L}{m^{-}}\|f\|_{\infty}.$$

Finally, for each $1\leq k\leq m^{+}$ (so that $z_{k}=+1$), let $(Z_{k},+1)$ be replaced by $(Z^{\prime}_{k},-1)\in(\mathfrak{X}\times\mathfrak{X})\times\{-1,+1\}$, and denote the resulting sample by $\tilde{S}^{k}=\bar{S}^{k}\cup\{(Z_{m+1},-1)\}$. Then, by the triangle inequality, we have

$$|\hat{R}(f;S)-\hat{R}(f;\tilde{S}^{k})|\leq\Gamma_{1}+\Gamma_{2},$$

where $\Gamma_{1}=|\hat{R}(f;S)-\hat{R}(f;S\cup\{(Z_{m+1},-1)\})|$ and $\Gamma_{2}=|\hat{R}(f;S\cup\{(Z_{m+1},-1)\})-\hat{R}(f;\tilde{S}^{k})|$. Since $\Gamma_{1}\leq\frac{2L}{m^{-}+1}\|f\|_{\infty}$ and $\Gamma_{2}\leq\frac{2L}{m^{+}}\|f\|_{\infty}$, we have

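$$|\hat{R}(f;S)-\hat{R}(f;\tilde{S}^{k})|\leq\Bigl(\frac{2L}{m^{-}+1}+\frac{2L}{m^{+}}\Bigr)\|f\|_{\infty}.$$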

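Combining these bounded differences with $\|f\|_{\infty}\leq\kappa\|f\|_{K}\leq\kappa r$, which follows from the reproducing property and the Cauchy–Schwarz inequality, and applying McDiarmid’s inequality completes the proof.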
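The bounded-difference steps of this proof can be checked numerically. The sketch below assumes the logistic loss (so $L=1$), represents $f$ only through its scores on the sample, and assumes those scores lie in $[-B,B]$, so that $\|f\|_{\infty}\leq B$. Helper names are hypothetical. It redraws one positive or one negative score at a time and records the largest change in $\hat{R}(f;S)$. That change should not exceed $2LB/m^{+}$ or $2LB/m^{-}$, respectively, because one replaced example enters $m^{-}$ (resp. $m^{+}$) of the $m^{+}m^{-}$ loss terms and each term moves by at most $2LB$.

```python
import math
import random

def logistic_phi(u):
    """phi(u) = log(1 + exp(-u)); 1-Lipschitz, split by sign to avoid overflow."""
    if u >= 0:
        return math.log1p(math.exp(-u))
    return -u + math.log1p(math.exp(u))

def empirical_risk(pos, neg):
    """hat R(f; S) over all m+ * m- positive/negative score pairs."""
    return sum(logistic_phi(p - n) for p in pos for n in neg) / (len(pos) * len(neg))

def max_change_when_replacing(pos, neg, B, trials, rng):
    """Largest observed |hat R(S) - hat R(S')| when one positive score (resp. one
    negative score) is redrawn uniformly from [-B, B]."""
    base = empirical_risk(pos, neg)
    worst_pos = worst_neg = 0.0
    for _ in range(trials):
        k = rng.randrange(len(pos))
        pos2 = pos[:k] + [rng.uniform(-B, B)] + pos[k + 1:]
        worst_pos = max(worst_pos, abs(empirical_risk(pos2, neg) - base))
        k = rng.randrange(len(neg))
        neg2 = neg[:k] + [rng.uniform(-B, B)] + neg[k + 1:]
        worst_neg = max(worst_neg, abs(empirical_risk(pos, neg2) - base))
    return worst_pos, worst_neg

rng = random.Random(1)
B = 2.0                                       # assumed bound ||f||_inf <= B
pos = [rng.uniform(-B, B) for _ in range(6)]  # scores of m+ = 6 positives
neg = [rng.uniform(-B, B) for _ in range(9)]  # scores of m- = 9 negatives
worst_pos, worst_neg = max_change_when_replacing(pos, neg, B, trials=200, rng=rng)
cap_pos, cap_neg = 2 * B / len(pos), 2 * B / len(neg)  # predicted caps with L = 1
```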

Remark 1

For fixed $m$, $\epsilon$, $L$, $\kappa$, and $r$, the bound in Theorem 4.1 is tightest when $\alpha=\frac{1}{2}$, since $\alpha^{2}(1-\alpha)^{2}$ attains its maximum there. This suggests that the number of positive samples should equal the number of negative samples (see Figure 1).
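Remark 1 can be checked by scanning $\alpha$ with everything else held fixed. The sketch reads the exponent in Theorem 4.1 as negative, as any tail bound of this form requires, and uses illustrative defaults $L=\kappa=r=1$. Names and parameter values are assumptions. For small or large $\alpha$ the computed value can exceed $1$, in which case the bound is vacuous.

```python
import math

def rkhs_tail_bound(eps, m, alpha, L=1.0, kappa=1.0, r=1.0):
    """2 * exp(-alpha^2 (1 - alpha)^2 m eps^2 / (2 L^2 kappa^2 r^2))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    exponent = alpha**2 * (1.0 - alpha)**2 * m * eps**2 / (2.0 * L**2 * kappa**2 * r**2)
    return 2.0 * math.exp(-exponent)

# Scan the positive proportion alpha with m, eps, L, kappa, r held fixed.
grid = [i / 100 for i in range(1, 100)]
best_alpha = min(grid, key=lambda a: rkhs_tail_bound(eps=0.5, m=1000, alpha=a))
```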

Remark 2

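For any $\delta>0$ and $f\in\mathcal{F}_{r}$, setting the right-hand side of Theorem 4.1 equal to $\delta$ and solving for $\epsilon$ shows that, with probability at least $1-\delta$, we have

$$|\hat{R}(f;S_{\alpha})-R(f)|\leq\frac{L\kappa r}{\alpha(1-\alpha)}\sqrt{\frac{2\log\frac{2}{\delta}}{m}}.$$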

Remark 3

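In the setting of Remark 2, let $m=m^{+}+m^{-}$ and fix $m^{+}\in\mathbb{N}$. Substituting $m=m^{+}/\alpha$, the bound scales as $1/\bigl(\sqrt{\alpha}(1-\alpha)\sqrt{m^{+}}\bigr)$, which is minimized at $\alpha=1/3$. Hence the optimal negative proportion is $(1-\alpha)=2/3$, that is, $m^{-}=2m^{+}$.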
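Remark 3 can be reproduced numerically. Solving Theorem 4.1 for $\epsilon$ at confidence $\delta$ gives the width $\frac{L\kappa r}{\alpha(1-\alpha)}\sqrt{2\log(2/\delta)/m}$. The sketch fixes $m^{+}$, sets $m=m^{+}/\alpha$ (a continuous relaxation, so $m$ need not be an integer), and minimizes the width over a grid of $\alpha$. The defaults $L=\kappa=r=1$ and the values of $m^{+}$ and $\delta$ are illustrative assumptions; they do not affect the minimizer.

```python
import math

def deviation_width(alpha, m_pos, delta, L=1.0, kappa=1.0, r=1.0):
    """L*kappa*r / (alpha (1 - alpha)) * sqrt(2 log(2/delta) / m), where the total
    sample size m = m_pos / alpha follows from a fixed number m_pos of positives."""
    m = m_pos / alpha  # continuous relaxation of m = m+ + m-
    return L * kappa * r / (alpha * (1.0 - alpha)) * math.sqrt(2.0 * math.log(2.0 / delta) / m)

grid = [i / 300 for i in range(1, 300)]
best_alpha = min(grid, key=lambda a: deviation_width(a, m_pos=100, delta=0.05))
best_neg_fraction = 1.0 - best_alpha  # 1 - alpha, about 2/3, i.e. m- about 2 m+
```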

5 Conclusion and Discussion

In this paper, we performed a generalization error analysis of set-to-set matching to reveal the behavior of the model in that task. Our analysis reveals how the convergence rate of set matching algorithms depends on the size of the negative sample. Future studies may include the following:

• Derivation of tighter bounds. There are many mathematical tools for the generalization error analysis of machine learning algorithms, and the tightness of the resulting bounds is known to depend on which one is used. For tighter bounds, it may be useful to apply tools not addressed in this paper [1, 9, 8, 10, 2].

• Design of novel set matching algorithms. Novel algorithms may be derived from the insights of the generalization error analysis.

• The effect of data augmentation on the generalization error of set-to-set matching. Many data augmentation methods have been proposed to stabilize neural network training, and a theoretical analysis of set matching when these methods are used would be useful [5, 4, 13, 17, 12].

Bibliography

[1] Bartlett, P.L., Bousquet, O., Mendelson, S.: Local Rademacher complexities. The Annals of Statistics 33(4), 1497–1537 (2005)
[2] Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 429–438. IEEE (2013)
[3] Iwata, T., Lloyd, J.R., Ghahramani, Z.: Unsupervised many-to-many object matching for relational data. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(3), 607–617 (2015)
[4] Kimura, M.: Understanding test-time augmentation. In: International Conference on Neural Information Processing, pp. 558–569. Springer (2021)
[5] Kimura, M.: Why mixup improves the model performance. In: International Conference on Artificial Neural Networks, pp. 275–286. Springer (2021)
[6] Kimura, M., Nakamura, T., Saito, Y.: Shift15M: Multiobjective large-scale fashion dataset with distributional shifts. arXiv preprint arXiv:2108.12992 (2021)
[7] Lisanti, G., Martinel, N., Del Bimbo, A., Luca Foresti, G.: Group re-identification via unsupervised transfer of sparse features encoding. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2449–2458 (2017)
[8] McAllester, D.A.: PAC-Bayesian model averaging. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 164–170 (1999)