Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Roy Turgeman, Tom Tirer

TL;DR
This paper investigates when low-level data processing can improve classification accuracy. Through theoretical proofs and empirical evidence, it shows that such processing can help in practice, even though the data processing inequality rules out any benefit for the optimal Bayes classifier.
Contribution
It offers a comprehensive theoretical analysis showing conditions under which pre-classification processing enhances accuracy, supported by empirical studies on deep classifiers and benchmark datasets.
Findings
Pre-classification processing can improve accuracy for finite training samples.
Class separation, training set size, and class balance influence the benefit of processing.
Empirical results align with theoretical predictions on denoising and encoding effects.
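A minimal simulation sketch of the finite-sample setting behind these findings. It assumes a balanced two-class Gaussian mixture with identity covariance and a nearest-class-mean classifier, which stands in for (and is not) the classifier analyzed in the paper. The function name `plug_in_accuracy`, its parameters, and the projection matrix `A` are hypothetical choices made for illustration.

```python
import numpy as np


def plug_in_accuracy(n_per_class, dim, sep, A=None, n_test=2000, seed=0):
    """Test accuracy of a nearest-class-mean classifier on a two-class mixture.

    Class 1 has mean +mu and class 0 has mean -mu, with mu = sep * e_1 and
    identity covariance. If A is given, each sample x is replaced by A @ x
    before both training and testing (the "processing" step, an assumption).
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    mu[0] = sep

    def sample(n):
        y = np.repeat([0, 1], n)              # n samples of each class
        signs = 2 * y - 1                     # map labels to {-1, +1}
        x = signs[:, None] * mu + rng.standard_normal((2 * n, dim))
        return x, y

    x_tr, y_tr = sample(n_per_class)
    x_te, y_te = sample(n_test)
    if A is not None:
        x_tr, x_te = x_tr @ A.T, x_te @ A.T

    # Estimate the class means from the finite training set.
    m0 = x_tr[y_tr == 0].mean(axis=0)
    m1 = x_tr[y_tr == 1].mean(axis=0)

    # Predict the class whose estimated mean is closer.
    d0 = ((x_te - m0) ** 2).sum(axis=1)
    d1 = ((x_te - m1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_te).mean())


# Hypothetical processing: keep only the single informative coordinate.
dim = 200
A = np.zeros((1, dim))
A[0, 0] = 1.0
for n in (2, 20, 200):
    raw = plug_in_accuracy(n, dim, sep=1.0)
    processed = plug_in_accuracy(n, dim, sep=1.0, A=A)
    print(f"n_per_class={n}: raw={raw:.3f}, processed={processed:.3f}")
```

Sweeping `n_per_class` and `sep` in this sketch gives a rough way to probe how training set size and class separation affect the gap between the raw and processed accuracies. No particular numbers are claimed here, and the projection uses knowledge of the informative coordinate that a real low-level task would not have.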
Abstract
The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples…
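For reference, the standard textbook form of the inequality described in the abstract can be written as follows. The notation is an assumption rather than taken from the paper: $y$ is the label, $x$ the observation, $z = g(x)$ any processed version of $x$, $I(\cdot\,;\cdot)$ mutual information, and $P_e^{*}(\cdot)$ the error probability of the optimal Bayes classifier given the indicated input.

```latex
% Markov chain: z depends on y only through x
y \to x \to z
\quad\Longrightarrow\quad
I(y; z) \le I(y; x)

% Consequence for the optimal Bayes classifier: processing cannot reduce its error
P_e^{*}(z) \ge P_e^{*}(x)
```

Both statements concern the optimal classifier with full knowledge of the distributions; they say nothing directly about a classifier learned from a finite training set, which is the gap the paper examines.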
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper studies an interesting question: whether the data processing inequality reflects practical deep learning behavior and connects a foundational information-theoretic concept to empirical DNN practice. 2. The theoretical derivation is mathematically sound under its assumptions, offering a clean proof for the potential benefit of preprocessing under finite-sample conditions. 3. The paper is well-written and visually clear, with carefully structured derivations and supporting empirical
1. The theoretical model y→x→z implies a generative process, but deep models for classification follow a discriminative direction x→z→$\hat{y}$. This mismatch weakens the claimed alignment between the theoretical framework and modern deep learning models. 2. Eq. (3) assumes a Gaussian Mixture Model where data x is generated conditional on label y. This assumption is conceptually inverted from real-world classification, where x precedes y. 3. The theory defines A as a linear transformation m
- The problem is well-motivated. The result may open a new line of work on this topic.
- Though the authors provide some experimental results, the theoretical explanation is based on a simplified setup. In other words, the theory-practice gap is large enough that the results may admit only limited interpretation when extended to more realistic settings.
1. The problem studied in this paper is interesting and important. 2. The theoretical analysis is solid. 3. The authors consider various scenarios and influencing factors in their theoretical derivations.
1. The writing can be improved. For example, Section 3.2 discusses multiple scenarios; it would be helpful to outline them at the beginning of the section. In addition, providing some intuition behind the theorems would improve readability and accessibility. 2. Some theorems could potentially be extended (see the question below). 3. The data processing analyzed in this paper can be restrictive (see the question below).
Taxonomy
Topics: Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
