Generalizing similarity in noisy setups: the DIBS phenomenon
Nayara Fonseca, Veronica Guidetti

TL;DR
This paper investigates how data density and noise type affect the generalization of Siamese Neural Networks (SNNs). It reveals a phenomenon called Density-Induced Break of Similarity (DIBS), in which Pair Label Noise (PLN) degrades generalization more than Single Label Noise (SLN) when the training set is dense.
Contribution
It introduces the DIBS phenomenon, showing how label noise and data density interact to impact the generalization of similarity learning models, especially in overparametrized regimes.
Findings
SNNs exhibit double descent regardless of the training setup, and noise exacerbates it.
Density of data pairs critically influences generalization.
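In dense datasets, Pair Label Noise (PLN) causes more severe similarity violations than Single Label Noise (SLN), and PLN cases generalize worse in the overparametrized region.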
Abstract
This work uncovers an interplay among data density, noise, and the generalization ability in similarity learning. We consider Siamese Neural Networks (SNNs), which are the basic form of contrastive learning, and explore two types of noise that can impact SNNs, Pair Label Noise (PLN) and Single Label Noise (SLN). Our investigation reveals that SNNs exhibit double descent behaviour regardless of the training setup and that it is further exacerbated by noise. We demonstrate that the density of data pairs is crucial for generalization. When SNNs are trained on sparse datasets with the same amount of PLN or SLN, they exhibit comparable generalization properties. However, when using dense datasets, PLN cases generalize worse than SLN ones in the overparametrized region, leading to a phenomenon we call Density-Induced Break of Similarity (DIBS). In this regime, PLN similarity violation becomes…
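The abstract names the two noise types without spelling out how they are injected, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes SLN corrupts the class labels of individual samples before pairs are formed, and PLN flips the similarity label of already-formed pairs. The function names (make_pairs, single_label_noise, pair_label_noise) and all parameters are hypothetical and chosen for illustration.

```python
import numpy as np

def make_pairs(labels, n_pairs, rng):
    """Sample index pairs; similarity is 1 when both class labels match."""
    i = rng.integers(0, len(labels), size=n_pairs)
    j = rng.integers(0, len(labels), size=n_pairs)
    return i, j, (labels[i] == labels[j]).astype(int)

def single_label_noise(labels, rate, n_classes, rng):
    """SLN (assumed form): move a fraction of samples to a different class."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < rate
    # A shift in [1, n_classes - 1] guarantees the new class differs.
    shift = rng.integers(1, n_classes, size=int(flip.sum()))
    noisy[flip] = (noisy[flip] + shift) % n_classes
    return noisy

def pair_label_noise(similarity, rate, rng):
    """PLN (assumed form): flip a fraction of pair similarity labels."""
    noisy = similarity.copy()
    flip = rng.random(len(similarity)) < rate
    noisy[flip] = 1 - noisy[flip]
    return noisy

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=200)

# SLN: corrupt sample labels first, then build pairs from the noisy labels.
sln_labels = single_label_noise(labels, rate=0.2, n_classes=10, rng=rng)
i_s, j_s, sim_sln = make_pairs(sln_labels, n_pairs=1000, rng=rng)

# PLN: build pairs from clean labels, then flip pair labels directly.
i_p, j_p, sim_clean = make_pairs(labels, n_pairs=1000, rng=rng)
sim_pln = pair_label_noise(sim_clean, rate=0.2, rng=rng)
```

Under these assumptions, SLN pair labels always remain consistent with some assignment of classes to samples, whereas PLN pair labels need not be: a flipped pair can contradict other pairs that share a sample. The more pairs are drawn per sample, the more such contradictions a fixed PLN rate can produce, which is one way to read the density dependence the abstract describes.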
Taxonomy
Methods: Contrastive Learning
