Generalizing similarity in noisy setups: the DIBS phenomenon

Nayara Fonseca; Veronica Guidetti

arXiv:2201.12803·cs.LG·July 25, 2023

TL;DR

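This paper investigates how data density and the type of label noise affect the generalization of Siamese Neural Networks (SNNs). It reports a new phenomenon, Density-Induced Break of Similarity (DIBS): on dense datasets in the overparametrized region, Pair Label Noise (PLN) impairs similarity learning more than the same amount of Single Label Noise (SLN).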

Contribution

The paper introduces the DIBS phenomenon, showing how label noise and the density of data pairs interact to affect the generalization of similarity learning models, especially in the overparametrized regime.

Findings

01. Double descent behavior occurs regardless of training setup.

02. Density of data pairs critically influences generalization.

03. PLN causes more severe similarity violations than SLN in dense datasets.

Abstract

This work uncovers an interplay among data density, noise, and the generalization ability in similarity learning. We consider Siamese Neural Networks (SNNs), which are the basic form of contrastive learning, and explore two types of noise that can impact SNNs, Pair Label Noise (PLN) and Single Label Noise (SLN). Our investigation reveals that SNNs exhibit double descent behaviour regardless of the training setup and that it is further exacerbated by noise. We demonstrate that the density of data pairs is crucial for generalization. When SNNs are trained on sparse datasets with the same amount of PLN or SLN, they exhibit comparable generalization properties. However, when using dense datasets, PLN cases generalize worse than SLN ones in the overparametrized region, leading to a phenomenon we call Density-Induced Break of Similarity (DIBS). In this regime, PLN similarity violation becomes…
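The two noise types named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: PLN is taken to mean flipping the similarity label of an already-formed pair, and SLN is taken to mean corrupting the class label of individual samples before pairs are built. The helper names (`make_pairs`, `pair_label_noise`, `single_label_noise`), the ten-class toy labels, and the 10% noise level are hypothetical and chosen only for illustration; they are not the authors' code or settings.

```python
import random

def make_pairs(labels, n_pairs, rng):
    """Sample index pairs; similarity is 1 if the two class labels match, else 0."""
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(labels)), 2)
        pairs.append((i, j, int(labels[i] == labels[j])))
    return pairs

def pair_label_noise(pairs, fraction, rng):
    """PLN (assumed definition): flip the similarity label of a fraction of pairs."""
    n_flip = round(fraction * len(pairs))
    flip = set(rng.sample(range(len(pairs)), n_flip))
    return [(i, j, (1 - s) if k in flip else s)
            for k, (i, j, s) in enumerate(pairs)]

def single_label_noise(labels, fraction, n_classes, rng):
    """SLN (assumed definition): move a fraction of samples to a different class."""
    noisy = list(labels)
    n_flip = round(fraction * len(labels))
    for idx in rng.sample(range(len(labels)), n_flip):
        noisy[idx] = rng.choice([c for c in range(n_classes) if c != labels[idx]])
    return noisy

rng = random.Random(0)
n_classes = 10
labels = [rng.randrange(n_classes) for _ in range(200)]  # toy class labels
clean_pairs = make_pairs(labels, 1000, rng)

# PLN: corrupt the pair labels directly.
pln_pairs = pair_label_noise(clean_pairs, 0.1, rng)

# SLN: corrupt sample labels first, then recompute similarity for the same pairs.
sln_labels = single_label_noise(labels, 0.1, n_classes, rng)
sln_pairs = [(i, j, int(sln_labels[i] == sln_labels[j])) for i, j, _ in clean_pairs]

pln_flipped = sum(a[2] != b[2] for a, b in zip(clean_pairs, pln_pairs))
sln_changed = sum(a != b for a, b in zip(labels, sln_labels))
```

Under these assumptions, PLN changes a fixed number of pair labels by construction, whereas the number of pair labels changed by SLN depends on how often corrupted samples appear in pairs, which is one way the density of pairs could enter the comparison.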

Taxonomy

Methods: Contrastive Learning