Quantile Activation: Correcting a Failure Mode of ML Models
Aditya Challa, Sravan Danda, Laurent Najman, Snehanshu Saha

TL;DR
This paper introduces quantile activation (QAct), a simple neural network activation function that improves model robustness and adaptation to distribution shifts by outputting neuron activations as relative quantiles within their context.
Contribution
The paper proposes a novel activation function, QAct, that enables neural networks to adapt to context distributions and distribution shifts without significant computational overhead.
Findings
QAct improves generalization under covariate shifts
QAct outperforms traditional models on robustness tests
QAct surpasses DINOv2 small in robustness despite smaller size
Abstract
Standard ML models fail to infer the context distribution and suitably adapt. For instance, the learning fails when the underlying distribution is actually a mixture of distributions with contradictory labels. Learning also fails if there is a shift between train and test distributions. Standard neural network architectures like MLPs or CNNs are not equipped to handle this. In this article, we propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational costs. The core idea is to "adapt" the outputs of each neuron to its context distribution. The proposed quantile activation (QAct) outputs the relative quantile position of neuron activations within their context distribution, diverging from the direct numerical outputs common in traditional networks. A specific case of the above failure mode is when…
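To make the core idea concrete, here is a minimal sketch of "outputting the relative quantile position within a context distribution". It assumes that the context is simply the current batch and that the quantile is the empirical CDF value of each pre-activation within that batch. The function name `quantile_activation` and these choices are illustrative assumptions. The excerpt above does not specify the estimator the authors use or how gradients are handled.

```python
import numpy as np


def quantile_activation(pre_activations: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: empirical quantile of each pre-activation in its batch.

    pre_activations has shape (batch, neurons). The batch is treated as the
    "context" (an assumption). For sample i and neuron j, the output is the
    fraction of samples k in the batch with pre[k, j] <= pre[i, j].
    """
    pre = np.asarray(pre_activations, dtype=float)
    # Broadcast to shape (batch, batch, neurons): entry [i, k, j] is True
    # when sample k's value for neuron j does not exceed sample i's value.
    not_greater = pre[None, :, :] <= pre[:, None, :]
    # Averaging over k gives the empirical CDF value, in the range (0, 1].
    return not_greater.mean(axis=1)


# The same raw value (2.0) receives a different output in a different context.
context_a = np.array([[0.0], [1.0], [2.0], [3.0]])
context_b = np.array([[2.0], [3.0], [4.0], [5.0]])
print(quantile_activation(context_a)[2, 0])  # 2.0 is third of four values: 0.75
print(quantile_activation(context_b)[0, 0])  # 2.0 is the smallest value: 0.25
```

Because the output depends only on the ordering within the batch, shifting or positively rescaling every pre-activation in the context leaves this sketch's output unchanged, which is one way to read the claim about adapting to distribution shift. The pairwise comparison costs memory quadratic in the batch size, so it is meant for illustration rather than training.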
Peer Reviews
Decision: Submitted to ICLR 2025
1. This paper is well-written and easy to follow.
2. This paper considers an interesting scenario: existing ML models cannot correctly classify when several classes have the same probability.
1. The motivation and the experiments do not align. In Figure 1(a) and Figure 3(a), the authors emphasize the case where one class is the rotated version of the other. However, the datasets in the experiments are not rotated versions of each class but compressed versions [1].
2. In your experiments, the datasets (i.e., CIFAR-10C, CIFAR-100C, TinyImageNet-C, and MNISTC) are also treated by some papers [2, 3] as covariate shift. Does your proposed method perform well on datasets w
- Simplicity of the approach
- Thorough analysis (e.g., computational complexity) and explanations of the approach
- Many explanations are provided as to why the idea makes sense in practice; I think this is generally overlooked in machine learning, yet it is key to grasping the inner workings of the approach.
- I don’t have much to say in the following sections: the work is straightforward, clearly explained, and well-supported by empirical evidence.
1. It may be difficult to generalize the approach to more complex neural network architectures. For instance, how would quantile activation be used in architectures involving transformers?
2. This isn’t quite a weakness in itself, but the idea is quite simple (almost naive), such that I’m surprised it hasn’t been explored yet.
3. I find the toy examples quite helpful for understanding the logic behind the approach, yet they describe quite unique situations th
- Reveals a failure mode in machine learning
- Proposes a novel activation function, QAct, to deal with classification under distribution shift
- The proposed QAct performs well on CIFAR-10/100-C and TinyImageNet-C
The paper is poorly organized, which makes the motivation unclear. The rationale for the proposed quantile activation is also not well presented. In addition, small image datasets with distortions alone are not enough to demonstrate the effectiveness of QAct. Please see the details below:
- **Q1:** This manuscript starts by showing a failure example in binary classification. For me, negative examples are distribution-shifted versions of positive examples, thereby making this classi
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Optical measurement and interference techniques · Advanced Measurement and Metrology Techniques
