On Coresets for Logistic Regression
Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, and David P. Woodruff

TL;DR
This paper investigates the limits of coreset construction for logistic regression, introduces a complexity measure to identify tractable cases, and proposes a novel sampling method that produces sublinear coresets with theoretical guarantees, validated on real data.
Contribution
The paper introduces a complexity measure for logistic regression data sets and a new sensitivity sampling scheme that yields the first provably sublinear coresets under certain conditions.
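For orientation, the quantities involved can be written out as follows. This summary does not state them, so the notation is an assumption: labels y_i in {-1, 1}, rows x_i of the data matrix X, D_y the diagonal matrix of labels, and (v)^+ and (v)^- the positive and negative entries of a vector v. The complexity measure is given in the form the paper is understood to use and should be checked against the paper itself.

```latex
% Logistic loss over n points (assumed notation)
f(X\beta) = \sum_{i=1}^{n} \ln\bigl(1 + \exp(-y_i x_i \beta)\bigr)

% (1 \pm \varepsilon)-coreset: a weighted subset C with weights w such that, for all \beta,
\bigl| f_w(C\beta) - f(X\beta) \bigr| \le \varepsilon \, f(X\beta)

% Complexity measure: worst-case ratio of positive to negative mass of D_y X \beta
\mu_y(X) = \sup_{\beta \in \mathbb{R}^d \setminus \{0\}}
  \frac{\lVert (D_y X \beta)^{+} \rVert_1}{\lVert (D_y X \beta)^{-} \rVert_1}
```

Read this way, a bounded ratio means no direction β puts almost all points on the correct side, which fits the statistical interpretation the abstract alludes to.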
Findings
No strongly sublinear coresets exist in the worst case.
The proposed method produces sublinear coresets for data with bounded complexity.
Experimental results show improved performance over uniform sampling and existing methods.
Abstract
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure μ(X), which quantifies the hardness of compressing a data set for logistic regression. μ(X) has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded μ(X)-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear (1 ± ε)-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state-of-the-art methods in the area. The experiments are conducted on real world…
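The general idea of sensitivity sampling can be sketched as follows. This is a minimal illustration and not the authors' exact algorithm. It assumes importance scores taken from the row norms of an orthonormal basis of the label-scaled data (obtained by QR decomposition) plus a uniform term 1/n. The function names `logistic_loss` and `sample_coreset` are hypothetical.

```python
import numpy as np


def logistic_loss(Z, beta, weights=None):
    """Weighted logistic loss sum_i w_i * ln(1 + exp(-z_i . beta)).

    Z holds the label-scaled rows z_i = y_i * x_i with y_i in {-1, +1}.
    """
    margins = Z @ beta
    # logaddexp(0, -m) = ln(1 + exp(-m)), computed in a numerically stable way.
    losses = np.logaddexp(0.0, -margins)
    if weights is None:
        return float(losses.sum())
    return float(weights @ losses)


def sample_coreset(Z, k, rng):
    """Draw k rows with probability proportional to an importance score.

    Assumed score (illustrative): row norm of Q from a reduced QR
    decomposition of Z, plus 1/n so that every row can be sampled.
    Returns the sampled indices and weights that keep the loss estimate unbiased.
    """
    n = Z.shape[0]
    Q, _ = np.linalg.qr(Z)  # reduced QR: Q is n x d with orthonormal columns
    scores = np.linalg.norm(Q, axis=1) + 1.0 / n
    probs = scores / scores.sum()
    idx = rng.choice(n, size=k, replace=True, p=probs)
    weights = 1.0 / (k * probs[idx])  # inverse-probability reweighting
    return idx, weights
```

A typical use would call `idx, w = sample_coreset(Z, k, rng)` and then fit a weighted logistic regression on `Z[idx]` with weights `w`. The reweighting by `1 / (k * p_i)` makes the sampled loss an unbiased estimate of the full loss for any fixed `beta`. The guarantee that it holds for all `beta` simultaneously is what the paper's analysis provides under bounded complexity.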
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Algorithms · Machine Learning and Data Classification
Methods: Coresets
