Real-time semiparametric regression for distributed data sets
Jan Luts

TL;DR
This paper introduces a real-time, distributed semiparametric regression method that models nonlinear relationships in large-scale, evolving data sets spread across multiple hosts, with applications to multi-owner data and the MapReduce framework.
Contribution
It presents an approach to semiparametric regression for data distributed over multiple hosts, covering both batch and real-time analysis, and extends it to evolving environments where model parameters can no longer be assumed constant over time.
Findings
Effective modeling of nonlinear relationships in distributed data
Real-time analysis demonstrated on airline data
Applicable in MapReduce and multi-owner data settings
Abstract
This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships. Both the batch approach, where analysis starts after all data have been collected, and the real-time setting are addressed. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application are presented: regression modeling when there are multiple data owners, and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real time.
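The general idea of fitting one regression from data held on several hosts can be sketched as follows. This is a minimal illustration, not the paper's algorithm, which this summary does not describe: it assumes a ridge-penalized least-squares fit on a truncated-line spline basis, where each host keeps only the summaries C^T C and C^T y and a coordinator adds them up. The names `spline_basis`, `HostSummary`, `combine_and_fit`, and the `forget` and `ridge` parameters are all hypothetical. The `forget` factor is one simple, assumed way to down-weight older batches when parameters drift over time.

```python
import numpy as np


def spline_basis(x, knots):
    """Design matrix with columns [1, x, (x - k)_+ for each knot k]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


class HostSummary:
    """Running summaries C^T C and C^T y kept by a single host (hypothetical)."""

    def __init__(self, n_cols, forget=1.0):
        self.CtC = np.zeros((n_cols, n_cols))
        self.Cty = np.zeros(n_cols)
        # forget < 1 discounts older batches; forget = 1 keeps all data equally.
        self.forget = forget

    def update(self, C_new, y_new):
        """Fold a newly arrived batch into the summaries, then discard the batch."""
        self.CtC = self.forget * self.CtC + C_new.T @ C_new
        self.Cty = self.forget * self.Cty + C_new.T @ y_new


def combine_and_fit(summaries, ridge=1e-3):
    """Sum the per-host summaries and solve the penalized normal equations."""
    CtC = sum(s.CtC for s in summaries)
    Cty = sum(s.Cty for s in summaries)
    penalty = ridge * np.eye(CtC.shape[0])
    penalty[:2, :2] = 0.0  # leave the intercept and linear term unpenalized
    return np.linalg.solve(CtC + penalty, Cty)
```

Because the summaries are additive, hosts never need to exchange raw observations, and with `forget=1.0` the combined fit matches the fit on the pooled data up to floating-point error. The same additive structure maps onto a map step (per-host summaries) and a reduce step (summation).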
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Stream Mining Techniques
