Positive Distribution Shift as a Framework for Understanding Tractable Learning
Marko Medvedev, Idan Attias, Elisabetta Cornacchia, Theodor Misiakiewicz, Gal Vardi, Nathan Srebro

TL;DR
This paper introduces the concept of Positive Distribution Shift (PDS), showing how choosing a suitable training distribution can transform intractable learning problems into tractable ones, emphasizing the computational benefits of PDS.
Contribution
The paper formalizes PDS, demonstrating how it can make certain hard learning problems easier and connecting it with membership query learning, highlighting its computational advantages.
Findings
PDS can turn intractable problems into tractable ones.
Certain hard classes are easily learnable under PDS.
PDS has strong connections with membership query learning.
Abstract
We study a setting where the goal is to learn a target function f(x) with respect to a target distribution D(x), but training is done on i.i.d. samples from a different training distribution D'(x), labeled by the true target f(x). Such a distribution shift (here in the form of covariate shift) is usually viewed negatively, as hurting or making learning harder, and the traditional distribution shift literature is mostly concerned with limiting or avoiding this negative effect. In contrast, we argue that with a well-chosen D'(x), the shift can be positive and make learning easier -- a perspective called Positive Distribution Shift (PDS). Such a perspective is central to contemporary machine learning, where much of the innovation is in finding good training distributions D'(x), rather than changing the training algorithm. We further argue that the benefit is often computational rather than…
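As a small illustration of how a shifted training distribution can expose structure that the target distribution hides, the sketch below computes exact correlations E[f(x) x_i] for a sparse parity over ±1 inputs. It is not taken from the paper: the function names `parity` and `coordinate_correlations`, and the choices n = 6, support {0, 1, 2}, and bias rho = 0.5, are assumptions made here for illustration. Under the uniform distribution (rho = 0) every coordinate is uncorrelated with the label; under a biased product distribution with E[x_i] = rho, relevant coordinates have correlation rho^(k-1) and irrelevant ones rho^(k+1), so a simple statistic separates them.

```python
import itertools
import math


def parity(x, support):
    """Parity over +/-1 inputs: the product of the coordinates in `support`."""
    return math.prod(x[i] for i in support)


def coordinate_correlations(n, support, rho):
    """Exact E[f(x) * x_i] for each i, under a product distribution with E[x_i] = rho.

    Enumerates all 2^n inputs, so this is only meant for small n.
    """
    p_plus = (1 + rho) / 2  # probability that a coordinate equals +1
    corr = [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        prob = math.prod(p_plus if xi == 1 else 1 - p_plus for xi in x)
        fx = parity(x, support)
        for i in range(n):
            corr[i] += prob * fx * x[i]
    return corr


# Hypothetical toy instance: a 3-sparse parity on 6 bits.
n, support = 6, (0, 1, 2)
k = len(support)

# Target distribution D: uniform. No single coordinate correlates with the label.
uniform = coordinate_correlations(n, support, rho=0.0)

# Shifted training distribution D': biased product distribution.
shifted = coordinate_correlations(n, support, rho=0.5)

# Under D', the k largest correlations point at the relevant coordinates.
recovered = sorted(range(n), key=lambda i: -shifted[i])[:k]

print("uniform:", uniform)
print("shifted:", shifted)
print("recovered support:", sorted(recovered))
```

In this toy instance the top-k correlations under the biased distribution identify the support, and a parity identified this way is correct on every input, hence also under the uniform target distribution. Whether this matches the construction the authors use for parities is not stated in the text above.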
Peer Reviews
Decision · Submitted to ICLR 2026
The paper is well-written and grounded in solid theory. The motivation behind the PDS framework is clearly presented: to illustrate the computational advantages of using an alternative feature distribution rather than the testing feature distribution. The authors use the PDS framework to demonstrate the learnability of several tasks via practical algorithms. These learnability problems are relevant and likely to interest conference attendees. The DS-PAC learnability of the parity and $k$-junta
The practicality of the PDS framework is unclear. For a practical learner, it is not feasible to choose or construct the training distribution based on the underlying labeling rule and/or the test distribution, since both are unknown to the learner. Consequently, DS learnability should be interpreted as an oracle model, where the training distribution is selected by an oracle. To address practicality without relying on an oracle, one option is to permit the learner to observe a few samples from
No strength found.
1. Clarity of presentation: It is unclear whether the section labeled “Introduction” serves as a motivation, problem formulation, or a mix of both. The narrative does not clearly define the research question or the contribution.
2. Ambiguity in notation and exposition: The description in Lines 57–74 (Page 2) is difficult to follow. The notation appears unorthodox and lacks explanation of key terms and symbols. This section is crucial for understanding the rest of the paper, yet it remains opaqu
- **Originality:** The paper introduces a fresh viewpoint on distribution shift. Instead of treating distribution shift as a challenge to overcome, it is framed as a resource to exploit for easier learning. This positive framing of distribution shift is, to the best of my knowledge, original and insightful. It connects to the intuition behind practices like pre-training or curriculum learning, but provides a formal lens to understand them.
- **Theoretical contributions:** The paper is thorough i
- **Reliance on strong or unrealistic assumptions in some settings:** Some of the theoretical formulations, particularly the function-dependent PDS (f-PDS) scenario, assume knowledge that would not be available in practice. In the f-PDS framework, the training distribution $D'$ is allowed to depend on the target function $f$ itself. This is a very powerful setting. Indeed, under f-PDS the authors note one can even learn all poly-size circuits with label noise (by encoding $f$ into $D'$), but it
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Domain Adaptation and Few-Shot Learning
