A Differentiable Rank-Based Objective For Better Feature Learning
Krunoslav Lehman Pavasovic, David Lopez-Paz, Giulio Biroli, Levent Sagun

TL;DR
This paper introduces difFOCI, a differentiable approximation of a non-parametric dependence measure, enabling improved feature selection, neural network regularization, and fairness in classification tasks.
Contribution
We develop difFOCI, a differentiable, parametric version of FOCI, allowing broader application in feature learning, neural network training, and fairness without sensitive data.
Findings
difFOCI outperforms FOCI in variable selection tasks
It enhances feature learning and reduces spurious correlations
It can be integrated into neural networks for improved performance
Abstract
In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), which is introduced in \cite{azadkia2021simple}. While FOCI is based on a non-parametric coefficient of conditional dependence, we introduce its parametric, differentiable approximation. With this approximate coefficient of correlation, we present a new algorithm called difFOCI, which is applicable to a wider range of machine learning problems thanks to its differentiable nature and learnable parameters. We present difFOCI in three contexts: (1) as a variable selection method with baseline comparisons to FOCI, (2) as a trainable model parametrized with a neural network, and (3) as a generic, widely applicable neural network regularizer, one that improves feature…
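To make the idea of a differentiable rank-based coefficient concrete, here is a minimal sketch of one possible relaxation. It starts from the unconditional form of the Azadkia-Chatterjee coefficient, T_n(Y, Z) = Σ_i (n·min(R_i, R_{M(i)}) − L_i²) / Σ_i L_i (n − L_i), where R_i and L_i count the samples with y_j ≤ y_i and y_j ≥ y_i, and M(i) is the nearest neighbour of z_i. The hard nearest-neighbour choice is replaced by softmax weights over negative squared distances. This softmax relaxation, the function name `soft_dependence`, and the temperature parameter `beta` are assumptions made for illustration; the summary above does not specify the exact form used in difFOCI.

```python
import numpy as np


def soft_dependence(y, z, beta=1000.0):
    """Soft-nearest-neighbour relaxation of the unconditional T_n(Y, Z).

    Hypothetical helper for illustration only. `beta` must be > 0; larger
    values push the softmax weights towards the hard nearest neighbour.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(y), -1)
    n = len(y)

    # Rank statistics of y: R_i = #{j : y_j <= y_i}, L_i = #{j : y_j >= y_i}.
    R = (y[None, :] <= y[:, None]).sum(axis=1).astype(float)
    L = (y[None, :] >= y[:, None]).sum(axis=1).astype(float)

    # Pairwise squared distances between the z_i; a point is never its own neighbour.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)

    # Row-wise softmax over negative distances (stabilised by subtracting the max).
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)

    # Weighted average of min(R_i, R_j) replaces the hard min(R_i, R_{M(i)}).
    soft_min = (w * np.minimum(R[:, None], R[None, :])).sum(axis=1)

    numerator = (n * soft_min - L ** 2).sum()
    denominator = (L * (n - L)).sum()
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.uniform(-1.0, 1.0, size=300)
    print("y = z^2:       ", soft_dependence(z ** 2, z))
    print("independent y: ", soft_dependence(rng.normal(size=300), z))
```

The weights depend smoothly on `z`, so if `z` is produced by learnable parameters (for example a per-feature scaling or a neural network), the same computation written in an automatic-differentiation framework would provide gradients with respect to those parameters. NumPy is used here only to keep the sketch short; it does not compute gradients itself.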
Peer Reviews
Decision: ICLR 2025 Poster
- The motivation is clear to me. FOCI is a tool for selecting important features from data based on their statistical relationships. However, it is not differentiable, which hinders its use in deep neural networks. To address this limitation, this submission proposes a differentiable, parametric approximation.
- This submission provides a clear definition of difFOCI. The toy examples offer some intuition into how difFOCI works.
- The three applications are well-chosen. They effectively demonstra
- While some real-world datasets are used in the experiments to demonstrate the effectiveness of difFOCI, it is unclear if it can be extended to large-scale datasets. Specifically, the datasets in Section 5.1 are small-scale, and the neural networks or learning algorithms used are relatively simple. The Waterbird task, for example, is simpler compared to multi-class tasks. Please discuss the scalability and generalization potential of difFOCI.
- The fairness study is interesting; however, the da
1. The motivation of the paper is well stated.
2. The paper is well-structured and well-written.
3. Providing results on both toy experiments and real-world datasets makes the paper more solid.
1. The real-world datasets seem to be outdated; the latest one was released in 2019. It would be more convincing to present results on more recent and more complex datasets, such as those in the WILDS benchmark.
2. This paper only considers one model architecture, i.e., ResNet-50. With the increasing usage of Transformer-based models, it is also important to show the effectiveness on more complex models.
3. Simply showing the improved performance on worst group accuracy does not sufficient
* S1. The proposed method is sound and has good potential for feature selection and feature debiasing.
* S2. The method has mainly good results in synthetic and some real datasets.
* W1. The experiments on real datasets are not that strong.
* W1.1 For the spurious correlation experiments, the Waterbirds dataset is small and simple, and a successful method should be tested on other datasets as well. How many seeds were used? Was the same protocol used to select hyperparameters for the baselines and the proposed method? The benchmark used in [A] can be used to evaluate the proposed method more rigorously. The results of the method will be more reliable if multiple
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Data Classification
