Semi-supervised Active Regression
Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi

TL;DR
This paper introduces a semi-supervised active learning framework for linear regression that reduces the number of label queries by leveraging data that is already partially labeled. It achieves near-optimal query-complexity bounds stated in terms of an instance-dependent parameter called the reduced rank.
Contribution
The paper formalizes semi-supervised active regression, introduces the reduced rank parameter, and provides an efficient algorithm with optimal query complexity bounds for ridge and kernel ridge regression.
Findings
Proposes an algorithm with query complexity O(R_X/ε), where R_X is the reduced rank and ε is the target accuracy
Establishes matching lower bounds for active ridge regression
Improves bounds for ridge and kernel ridge regression cases
Abstract
Labelled data often comes at a high cost as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of semi-supervised active learning through the frame of linear regression. In this setting, the learner has access to a dataset $X$ which is composed of unlabelled examples that an algorithm can actively query, and examples labelled a priori. Concretely, denoting the true labels by $Y$, the learner's objective is to find $\widehat{\beta} \in \mathbb{R}^d$ such that, \begin{equation} \| X \widehat{\beta} - Y \|_2^2 \le (1 + \epsilon) \min_{\beta \in \mathbb{R}^d} \| X…
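To make the objective in the abstract concrete, the following is a minimal Python sketch of how one might check the (1 + ε) relative-error guarantee for a candidate solution. It is not the paper's algorithm. The inequality is truncated in this copy, so the sketch assumes the right-hand side is the unregularized least-squares optimum over all rows. The function name `relative_error_ok`, the synthetic data, and the sizes are all hypothetical.

```python
import numpy as np

def relative_error_ok(X, Y, beta_hat, eps):
    """Return True if ||X beta_hat - Y||^2 <= (1 + eps) * min_beta ||X beta - Y||^2.

    Assumption: the comparison target is the ordinary least-squares optimum
    computed with access to every label (something a real learner lacks).
    """
    beta_opt, *_ = np.linalg.lstsq(X, Y, rcond=None)
    best = np.sum((X @ beta_opt - Y) ** 2)
    achieved = np.sum((X @ beta_hat - Y) ** 2)
    return bool(achieved <= (1.0 + eps) * best)

# Hypothetical synthetic instance: the first rows are unlabelled (queryable),
# the last rows come with labels given a priori.
rng = np.random.default_rng(0)
num_unlabelled, num_labelled, d = 200, 50, 5
X = rng.normal(size=(num_unlabelled + num_labelled, d))
Y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=X.shape[0])

# Zero-query baseline: fit using only the rows that were labelled a priori.
labelled = np.arange(num_unlabelled, num_unlabelled + num_labelled)
beta_hat, *_ = np.linalg.lstsq(X[labelled], Y[labelled], rcond=None)

print(relative_error_ok(X, Y, beta_hat, eps=0.5))
```

Whether the zero-query baseline passes depends on how representative the pre-labelled rows are. A biased labelled subset is exactly the case in which additional, actively chosen queries would be needed.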
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Sparse and Compressive Sensing Techniques
