GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence

Xia Huang; Kai Fong Ernest Chong

arXiv:2307.09810 · cs.CV · July 20, 2023

TL;DR

This paper introduces GenKL, an iterative framework that uses a new generalized KL divergence to identify and relabel non-conforming (NC) web image instances, improving classification accuracy on the Clothing1M, Food101, and WebVision datasets.

Contribution

The paper proposes a new generalized KL divergence and an iterative training framework, GenKL, to identify and relabel non-conforming web image instances, that is, ambiguous in-distribution instances and out-of-distribution instances. The authors report that it outperforms existing methods.

Findings

01. Achieved state-of-the-art accuracy on Clothing1M, Food101, and WebVision datasets.

02. Demonstrated the effectiveness of the generalized KL divergence in identifying non-conforming instances.

03. Outperformed baseline methods in non-conforming instance detection.

Abstract

Web image datasets curated online inherently contain ambiguous in-distribution (ID) instances and out-of-distribution (OOD) instances, which we collectively call non-conforming (NC) instances. In many recent approaches for mitigating the negative effects of NC instances, the core implicit assumption is that the NC instances can be found via entropy maximization. For "entropy" to be well-defined, we are interpreting the output prediction vector of an instance as the parameter vector of a multinomial random variable, with respect to some trained model with a softmax output layer. Hence, entropy maximization is based on the idealized assumption that NC instances have predictions that are "almost" uniformly distributed. However, in real-world web image datasets, there are numerous NC instances whose predictions are far from being uniformly distributed. To tackle the limitation of entropy…
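To make the entropy-maximization assumption described in the abstract concrete, the sketch below scores softmax prediction vectors by Shannon entropy and flags those close to the maximum value log K. It is a minimal illustration, not the paper's method: the function names, the example vectors, and the `threshold_ratio` cutoff are all hypothetical, and the KL divergence shown is the standard one (to the uniform distribution), not the generalized divergence the paper proposes. It shows how a prediction split evenly between two of four classes, plausibly an ambiguous instance, falls well short of maximum entropy and so escapes an entropy-based filter.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a prediction vector p."""
    return -sum(q * math.log(q) for q in p if q > 0)

def kl_to_uniform(p):
    """Standard KL divergence D(p || u), where u is uniform over len(p) classes.

    This equals log(K) - entropy(p), so maximizing entropy is the same as
    minimizing the standard KL divergence to the uniform distribution.
    """
    k = len(p)
    return sum(q * math.log(q * k) for q in p if q > 0)

def flag_by_entropy(p, threshold_ratio=0.9):
    """Hypothetical entropy-based filter: flag p as non-conforming when its
    entropy is at least threshold_ratio times the maximum entropy log(K)."""
    return entropy(p) >= threshold_ratio * math.log(len(p))

# A nearly uniform prediction: the idealized picture of an NC instance.
uniform_like = softmax([0.1, 0.0, -0.1, 0.05])

# A prediction split between two of four classes: far from uniform,
# even though the instance may well be ambiguous.
two_way = [0.5, 0.5, 0.0, 0.0]

for name, p in [("uniform_like", uniform_like), ("two_way", two_way)]:
    print(name, round(entropy(p), 3), round(kl_to_uniform(p), 3), flag_by_entropy(p))
```

With these assumed inputs, the nearly uniform vector is flagged while the two-way split is not, which is the kind of case the abstract points to when it notes that many NC instances have predictions far from uniform.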

Code & Models

Repositories

codetopaper/genkl
PyTorch · Official

Taxonomy

Methods: Softmax