Linear Regression with Unknown Truncation Beyond Gaussian Features
Alexandros Kouridakis, Anay Mehrotra, Alkis Kalavasis, Constantine Caramanis

TL;DR
This paper introduces a polynomial-time algorithm for truncated linear regression with an unknown survival set that requires only sub-Gaussian features rather than Gaussian ones.
Contribution
It presents the first efficient algorithm for unknown survival sets in truncated linear regression, relaxing distributional assumptions to sub-Gaussian features.
Findings
Algorithm runs in polynomial time in d and 1/ε
Learns unions of intervals using positive examples only
Advances positive-only PAC learning methods
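To make the positive-only setting concrete, here is a minimal sketch of a naive learner for a union of intervals on the real line. It is not the paper's algorithm. It is the textbook "tightest fit" rule, which covers all observed positive points with at most k closed intervals of smallest total length by cutting at the k-1 largest gaps. The function name fit_intervals, the parameter k, and the sample values are hypothetical and chosen only for illustration.

```python
def fit_intervals(positives, k):
    """Cover all points with at most k closed intervals of minimal total length.

    Illustrative only: a naive positive-only learner, not the paper's method.
    """
    pts = sorted(positives)
    if not pts:
        return []
    # Indices of gaps between consecutive points, largest gap first.
    gaps = sorted(range(len(pts) - 1),
                  key=lambda i: pts[i + 1] - pts[i],
                  reverse=True)
    # Cutting at the k-1 largest gaps leaves at most k intervals.
    cuts = sorted(gaps[:max(k - 1, 0)])
    intervals, start = [], 0
    for i in cuts:
        intervals.append((pts[start], pts[i]))
        start = i + 1
    intervals.append((pts[start], pts[-1]))
    return intervals


# Hypothetical surviving outcomes drawn from three well-separated clusters.
samples = [0.1, 0.4, 0.3, 2.0, 2.2, 2.1, 5.0, 5.5]
print(fit_intervals(samples, k=3))
# -> [(0.1, 0.4), (2.0, 2.2), (5.0, 5.5)]
```

Because only positive examples are seen, this rule can only shrink toward the observed points. It never learns where the set ends beyond the sample, which is one reason positive-only learning is harder than learning from labeled data.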
Abstract
In truncated linear regression, samples are shown only when the outcome falls inside a certain survival set, and the goal is to estimate the unknown d-dimensional regressor. This problem has a long history of study in Statistics and Machine Learning, going back to the works of (Galton, 1897; Tobin, 1958) and more recently in, e.g., (Daskalakis et al., 2019; 2021; Lee et al., 2023; 2024). Despite this long history, however, most prior works are limited to the special case where the survival set is precisely known. The more practically relevant case, where the survival set is unknown and must be learned from data, remains open: indeed, here the only available algorithms require strong assumptions on the distribution of the feature vectors (e.g., Gaussianity) and, even then, have a run time for achieving accuracy. In…
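The truncation model described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator. It only shows why truncation is a problem: ordinary least squares fitted to the surviving samples is biased. Everything specific here is an assumption made for illustration: the one-dimensional setting, the true slope w_true = 1.0, the uniform (bounded, hence sub-Gaussian) features, the unit Gaussian noise, and the survival set [0, ∞).

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x, fitted with an intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, w_true=1.0, seed=0):
    """Draw n samples and return (all samples, samples that survive truncation)."""
    rng = random.Random(seed)

    # Hypothetical survival set for illustration: S = [0, inf).
    def in_survival_set(y):
        return y >= 0.0

    full, kept = [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)            # bounded, hence sub-Gaussian, feature
        y = w_true * x + rng.gauss(0.0, 1.0)  # linear outcome plus Gaussian noise
        full.append((x, y))
        if in_survival_set(y):                # the sample is shown only if y is in S
            kept.append((x, y))
    return full, kept


full, kept = simulate()
slope_full = ols_slope(*zip(*full))
slope_kept = ols_slope(*zip(*kept))
print(f"samples surviving truncation: {len(kept)} of {len(full)}")
print(f"OLS slope on all samples:       {slope_full:.3f}")
print(f"OLS slope on surviving samples: {slope_kept:.3f}")
```

In this setup the slope fitted on the surviving samples comes out noticeably below the true value, while the slope on the untruncated data stays close to it. Correcting that bias requires knowing or learning the survival set, which is the question the abstract raises.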
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
