Quantile Activation: Correcting a Failure Mode of ML Models

Aditya Challa; Sravan Danda; Laurent Najman; Snehanshu Saha

arXiv:2405.11573·cs.LG·April 4, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces quantile activation (QAct), a simple neural network activation function that improves model robustness and adaptation to distribution shifts by outputting neuron activations as relative quantiles within their context.
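
A minimal forward-pass sketch of the idea in Python with NumPy follows. It rests on several assumptions that do not come from the paper. The mini-batch stands in for each neuron's context distribution. The output is the mid-rank empirical CDF, which lies in (0, 1). The name `quantile_activation` is hypothetical. The authors' output scaling, the way they backpropagate through the quantile, and their inference-time handling are not reproduced here.

```python
import numpy as np


def quantile_activation(z: np.ndarray) -> np.ndarray:
    """Replace each pre-activation by its mid-rank empirical quantile.

    z has shape (batch, neurons). Each column is one neuron, and the batch is
    treated as that neuron's context distribution (an assumption of this
    sketch). Outputs lie strictly between 0 and 1.
    """
    n = z.shape[0]
    # below[i, j, k] is True when z[j, k] < z[i, k]; equal[i, j, k] when they match.
    below = z[None, :, :] < z[:, None, :]
    equal = z[None, :, :] == z[:, None, :]
    # Mid-rank empirical CDF: tied values share the average of their ranks.
    return (below.sum(axis=1) + 0.5 * equal.sum(axis=1)) / n


# Hypothetical batch of 4 samples feeding 2 neurons.
z = np.array([[0.0, 5.0],
              [1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])
print(quantile_activation(z))
# column 0 -> 0.125, 0.375, 0.625, 0.875; column 1 (constant) -> 0.5
# Shifting and positively rescaling the whole batch changes nothing:
print(np.allclose(quantile_activation(3.0 * z + 10.0), quantile_activation(z)))  # True
```

The last line shows why a quantile output can help under covariate shift. Any strictly increasing transformation applied to a neuron's pre-activations across the whole context leaves their relative quantiles unchanged. The pairwise comparison costs O(batch² × neurons). It is chosen for clarity rather than speed.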

Contribution

The paper proposes a novel activation function, QAct, that enables neural networks to adapt to context distributions and distribution shifts without significant computational overhead.

Findings

1. QAct improves generalization under covariate shifts
2. QAct outperforms traditional models on robustness tests
3. QAct surpasses DINOv2 (small) in robustness despite its smaller size

Abstract

Standard ML models fail to infer the context distribution and adapt suitably. For instance, learning fails when the underlying distribution is actually a mixture of distributions with contradictory labels. Learning also fails when there is a shift between the train and test distributions. Standard neural network architectures such as MLPs or CNNs are not equipped to handle this. In this article, we propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational costs. The core idea is to "adapt" the outputs of each neuron to its context distribution. QAct outputs the relative quantile position of neuron activations within their context distribution, rather than the direct numerical outputs common in traditional networks. A specific case of the above failure mode is when…
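
The toy example below illustrates the "mixture with contradictory labels" failure mode. The numbers are invented for illustration and are not from the paper. Context B repeats context A's pattern shifted by +3, so the raw values 3, 4 and 5 carry label 1 in A but label 0 in B. A single threshold on raw values stands in for a fixed model and tops out at 75% accuracy. Re-expressing each value as its quantile within its own context separates the classes perfectly. Two simplifications apply. Each sample's context is assumed to be known and is taken as its group, and the helper names are hypothetical.

```python
import numpy as np


def within_context_quantile(x: np.ndarray) -> np.ndarray:
    """Mid-rank empirical CDF of each value of the 1-D array x within x itself."""
    below = (x[None, :] < x[:, None]).sum(axis=1)  # count of values strictly smaller
    equal = (x[None, :] == x[:, None]).sum(axis=1)  # count of ties, including itself
    return (below + 0.5 * equal) / len(x)


def best_threshold_accuracy(x: np.ndarray, y: np.ndarray) -> float:
    """Best accuracy of the rule `predict 1 if x > t` over all distinct splits."""
    candidates = np.concatenate(([x.min() - 1.0], np.sort(x)))
    return max(np.mean((x > t) == y) for t in candidates)


x_a = np.arange(6.0)               # context A: hypothetical feature values 0..5
x_b = x_a + 3.0                    # context B: the same pattern shifted by +3
y = np.array([0, 0, 0, 1, 1, 1])   # identical labels in both contexts
y_all = np.concatenate([y, y])

raw = np.concatenate([x_a, x_b])
adapted = np.concatenate([within_context_quantile(x_a), within_context_quantile(x_b)])

print(best_threshold_accuracy(raw, y_all))      # 0.75: values 3, 4, 5 carry both labels
print(best_threshold_accuracy(adapted, y_all))  # 1.0
```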

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. This paper is well-written and easy to follow.
2. This paper considers an interesting scenario: existing ML models cannot correctly classify when several classes have the same probability.

Weaknesses

1. The motivation and the experiments do not align. In Figure 1(a) and Figure 3(a), the authors emphasize the case where one class is a rotated version of the other. However, the datasets used in the experiments are not rotated versions of each class but compressed versions [1].
2. The datasets in your experiments (i.e., CIFAR-10-C, CIFAR-100-C, TinyImageNet-C, and MNIST-C) are also treated as covariate-shift benchmarks by other papers [2, 3]. Does your proposed method perform well on datasets w…

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- Simplicity of the approach.
- Thorough analysis (e.g., computational complexity) and explanations of the approach.
- Many explanations are provided as to why the idea makes sense in practice. I think this is generally overlooked in machine learning, yet it is key to grasping the inner workings of the approach.
- I don't have much to say in the following sections: the work is straightforward, clearly explained, and well-supported by empirical evidence.

Weaknesses

1. It may be difficult to generalize the approach to more complex neural network architectures. For instance, how would quantile activation be used in architectures involving transformers?
2. This isn't quite a weakness in itself, but the idea is so simple (almost naive) that I'm surprised it hasn't been explored yet.
3. I find the toy examples quite interesting for understanding the logic behind the approach, yet they describe quite unique situations th…

Reviewer 03 · Rating 3 · Confidence 5

Strengths

- Reveals a failure mode in machine learning.
- Proposes a novel activation function, QAct, to handle classification under distribution shift.
- The proposed QAct performs well on CIFAR-10/100-C and TinyImageNet-C.

Weaknesses

The paper is poorly organized, which makes the motivation unclear. The rationale for the proposed quantile activation is also not well presented. In addition, experiments on small distorted image datasets alone are not enough to demonstrate the effectiveness of QAct. Details follow:
- **Q1:** This manuscript starts by showing a failure example in binary classification. To me, the negative examples are a distribution-shifted version of the positive examples, thereby making this classi…

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Optical measurement and interference techniques · Advanced Measurement and Metrology Techniques