On the Distributed Estimation for Scalar-on-Function Regression Models
Peilun He, Han Lin Shang, Nan Zou

TL;DR
This paper introduces distributed estimation methods for scalar-on-function regression models that reduce computational cost and let institutions collaborate without sharing raw data, while maintaining high estimation and prediction accuracy.
Contribution
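The paper develops distributed estimation procedures for three scalar-on-function regression models (the functional linear, functional non-parametric, and functional partial linear models), addressing the computational cost of large samples and restrictions on sharing raw data in functional data analysis.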
Findings
Distributed estimators significantly reduce computation time.
High estimation and prediction accuracy are maintained.
When block sizes become too small, the FPLM overfits, giving narrower prediction intervals and lower empirical coverage probability.
Abstract
This paper proposes distributed estimation procedures for three scalar-on-function regression models: the functional linear model (FLM), the functional non-parametric model (FNPM), and the functional partial linear model (FPLM). The framework addresses two key challenges in functional data analysis, namely the high computational cost of large samples and limitations on sharing raw data across institutions. Monte Carlo simulations show that the distributed estimators substantially reduce computation time while preserving high estimation and prediction accuracy for all three models. When block sizes become too small, the FPLM exhibits overfitting, leading to narrower prediction intervals and reduced empirical coverage probability. An empirical study using the tecator dataset further supports these findings.
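To make the idea of block-wise estimation concrete, the following is a minimal sketch for the FLM only. It rests on several assumptions that are not taken from the paper: the slope function is expanded in a fixed Fourier basis, each block is fitted by ordinary least squares, and the block estimates are combined by simple averaging. The function names (fourier_basis, fit_flm_block, distributed_flm) and the simulated data are hypothetical; the paper's actual estimators and aggregation rule may differ.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a simple Fourier basis on a grid t in [0, 1]; shape (len(t), n_basis)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fit_flm_block(X, y, t, n_basis):
    """Least-squares FLM fit on one block: y = a + integral X(t) beta(t) dt + noise."""
    B = fourier_basis(t, n_basis)
    dt = t[1] - t[0]
    # Riemann-sum approximation of the inner products <X_i, basis_k>
    Z = X @ B * dt
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, basis coefficients of beta]

def distributed_flm(X, y, t, n_blocks, n_basis=5):
    """Assumed aggregation rule: average the block-wise coefficient estimates."""
    idx_blocks = np.array_split(np.arange(len(y)), n_blocks)
    coefs = [fit_flm_block(X[idx], y[idx], t, n_basis) for idx in idx_blocks]
    return np.mean(coefs, axis=0)

# Simulated example (all values here are illustrative assumptions)
rng = np.random.default_rng(0)
n, m = 2000, 101
t = np.linspace(0, 1, m)
B = fourier_basis(t, 5)
X = rng.normal(size=(n, 5)) @ B.T            # curves observed on the grid
true_b = np.array([1.0, -0.5, 0.3, 0.0, 0.2])  # basis coefficients of beta(t)
beta = B @ true_b
y = 2.0 + (X @ beta) * (t[1] - t[0]) + rng.normal(scale=0.1, size=n)

full = distributed_flm(X, y, t, n_blocks=1)   # single-machine fit
dist = distributed_flm(X, y, t, n_blocks=10)  # ten blocks of 200 curves each
print("max |full - distributed| =", np.max(np.abs(full - dist)))
```

In this sketch only the fitted coefficient vectors leave each block, never the raw curves, which is the sense in which a distributed scheme can sidestep data-sharing restrictions. Shrinking the blocks (raising n_blocks) leaves fewer observations per fit, which is the regime where the abstract reports overfitting for the FPLM.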
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
