Data subsampling for Poisson regression with pth-root-link
Han Cheng Lie, Alexander Munteanu

TL;DR
This paper introduces data subsampling techniques for Poisson regression with pth-root-link functions, centered on coresets and theoretical bounds on their size, so that the model can be fitted more efficiently on large datasets.
Contribution
The paper develops a novel framework for sublinear coreset construction in Poisson regression, including new bounds and complexity parameters for pth-root-link functions.
Findings
Sublinear coresets exist when the complexity parameter is small.
The dependence of the coreset size on the number of input points can be reduced to polylogarithmic.
The square root-link admits an $O(\log(y_{\max}))$ dependence, while the ID-link requires $\Theta(\sqrt{y_{\max}/\log(y_{\max})})$.
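For orientation, here is a sketch of the objective these bounds refer to. It assumes the usual GLM parametrization, in which the pth-root-link sets $\mathbb{E}[y_i]^{1/p} = x_i\beta$. The symbols $x_i$, $\beta$, $n$ and $f$ are notation assumed here and do not appear on this page. Under that assumption the Poisson negative log-likelihood is

$$f(\beta) = \sum_{i=1}^{n} \Big( (x_i\beta)^p - p\, y_i \log(x_i\beta) + \log(y_i!) \Big), \qquad x_i\beta > 0 \text{ for all } i,$$

where $p = 1$ gives the ID-link and $p = 2$ gives the square root-link.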
Abstract
We develop and analyze data subsampling techniques for Poisson regression, the standard model for count data. In particular, we consider the Poisson generalized linear model with ID- and square root-link functions. We consider the method of coresets, which are small weighted subsets that approximate the loss function of Poisson regression up to a factor of $1 \pm \varepsilon$. We show lower bounds against coresets for Poisson regression that continue to hold against arbitrary data reduction techniques up to logarithmic factors. By introducing a novel complexity parameter and a domain shifting approach, we show that sublinear coresets with this approximation guarantee exist when the complexity parameter is small. In particular, the dependence on the number of input points can be reduced to polylogarithmic. We show that the dependence on other input…
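The idea of a weighted subset standing in for the full loss can be illustrated with a minimal Python sketch. It is not the paper's construction: it uses plain uniform sampling with weights n/m, which gives an unbiased estimate of the full loss but none of the coreset guarantees discussed above. The function names, the synthetic data, and the parametrization $\mathbb{E}[y_i] = (x_i\beta)^p$ are all assumptions made for illustration.

```python
import math
import random


def poisson_root_link_loss(beta, X, y, p=2, weights=None):
    """Weighted Poisson negative log-likelihood, assuming E[y_i] = (x_i . beta)^p.

    p=1 corresponds to the ID-link, p=2 to the square root-link.
    Returns math.inf if any linear predictor is non-positive.
    """
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for x_i, y_i, w_i in zip(X, y, weights):
        z = sum(a * b for a, b in zip(x_i, beta))
        if z <= 0:
            return math.inf  # outside the domain of the loss
        total += w_i * (z ** p - p * y_i * math.log(z) + math.lgamma(y_i + 1))
    return total


def uniform_weighted_subset(n, m, rng):
    """Hypothetical baseline: m indices drawn uniformly without replacement, each weighted n/m."""
    idx = rng.sample(range(n), m)
    return idx, [n / m] * m


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


# Synthetic data (illustrative only): intercept plus one positive feature.
rng = random.Random(0)
n, m, p = 2000, 200, 2
true_beta = [1.0, 0.5]
X = [[1.0, rng.uniform(0.5, 2.0)] for _ in range(n)]
y = [sample_poisson((x[0] * true_beta[0] + x[1] * true_beta[1]) ** p, rng) for x in X]

full_loss = poisson_root_link_loss(true_beta, X, y, p=p)

idx, w = uniform_weighted_subset(n, m, rng)
X_sub = [X[i] for i in idx]
y_sub = [y[i] for i in idx]
subset_loss = poisson_root_link_loss(true_beta, X_sub, y_sub, p=p, weights=w)

print("full loss:    ", full_loss)
print("subset loss:  ", subset_loss)
print("ratio:        ", subset_loss / full_loss)
```

A coreset in the sense of the abstract would replace `uniform_weighted_subset` with a sampling scheme whose weighted loss stays within a factor $1 \pm \varepsilon$ of the full loss for every admissible `beta` simultaneously, not only at a single point as checked here.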
Taxonomy
Topics: Bayesian Methods and Mixture Models
Methods: Coresets
