Information-based Optimal Subdata Selection for Clusterwise Linear Regression

Yanxi Liu, John Stufken, and Min Yang

arXiv:2309.00720 · stat.ME · September 6, 2023

TL;DR

This paper introduces an information-based subdata selection method for clusterwise linear regression models that avoids the cost of fitting the model on the full data, and it proves that the method is asymptotically optimal as the full data size grows.

Contribution

It develops a new framework for selecting subdata in clusterwise linear regression that works around the lack of a closed-form expression for the Fisher information matrix, and it establishes that the resulting selection is asymptotically optimal.
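
A minimal sketch of where the difficulty comes from, assuming the standard clusterwise (mixture of) linear regression density with $K$ clusters, mixing proportions $\pi_k$, coefficients $\boldsymbol\beta_k$, error variances $\sigma_k^2$, and normal density $\phi$. This notation is ours, and the paper's exact parameterization may differ. The score involves the posterior membership weights $w_k$, so each entry of the per-observation information matrix is an expectation of a ratio of Gaussian mixture densities, which has no closed form:

```latex
f(y \mid \mathbf{x};\boldsymbol\theta)
  = \sum_{k=1}^{K} \pi_k\,\phi\!\left(y;\ \mathbf{x}^\top\boldsymbol\beta_k,\ \sigma_k^2\right),
\qquad
w_k(\mathbf{x},y)
  = \frac{\pi_k\,\phi(y;\mathbf{x}^\top\boldsymbol\beta_k,\sigma_k^2)}
         {\sum_{j=1}^{K}\pi_j\,\phi(y;\mathbf{x}^\top\boldsymbol\beta_j,\sigma_j^2)}

\frac{\partial \log f(y \mid \mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\beta_k}
  = w_k(\mathbf{x},y)\,\frac{y-\mathbf{x}^\top\boldsymbol\beta_k}{\sigma_k^2}\,\mathbf{x}

I_{\boldsymbol\beta_k\boldsymbol\beta_l}(\boldsymbol\theta \mid \mathbf{x})
  = \mathbb{E}_{y\mid\mathbf{x}}\!\left[
      w_k w_l\,\frac{(y-\mathbf{x}^\top\boldsymbol\beta_k)(y-\mathbf{x}^\top\boldsymbol\beta_l)}
                    {\sigma_k^2\,\sigma_l^2}\right]\mathbf{x}\mathbf{x}^\top,
\qquad
I_S(\boldsymbol\theta) = \sum_{i \in S} I(\boldsymbol\theta \mid \mathbf{x}_i)
```

For ordinary linear regression ($K = 1$, so $w_1 \equiv 1$) the same block reduces to $\mathbf{x}\mathbf{x}^\top/\sigma^2$, which is why information-based subdata selection is straightforward there but not for mixtures.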

Findings

1. The proposed method is asymptotically optimal for large datasets.
2. The framework overcomes the absence of a closed-form Fisher information matrix.
3. Subdata selection improves computational feasibility for large-scale mixture models.

Abstract

Mixture-of-Experts (MoE) models are commonly used when the data contain distinct clusters with different relationships between the independent and dependent variables. Fitting such models to large datasets, however, is virtually impossible computationally. An attractive alternative is to fit the model on subdata selected by "maximizing" the Fisher information matrix. A major challenge is that no closed-form expression for the Fisher information matrix is available for such models. Focusing on clusterwise linear regression models, a subclass of MoE models, we develop a framework that overcomes this challenge. We prove that the proposed subdata selection approach is asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.
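
As a rough illustration of what selecting subdata by "maximizing" the Fisher information matrix means, the sketch below handles only the easy case of ordinary linear regression, where the information matrix of subdata $S$ is $X_S^\top X_S/\sigma^2$ in closed form. It does not implement the paper's framework for clusterwise models. The helper `greedy_d_optimal_subdata` and its `ridge` parameter are hypothetical, and the one-row-at-a-time greedy rule for the D-optimality criterion $\log\det(X_S^\top X_S)$ is a swapped-in heuristic chosen for brevity, not the authors' algorithm.

```python
import numpy as np


def greedy_d_optimal_subdata(X, k, ridge=1e-4):
    """Hypothetical helper: greedily pick k distinct rows of X that
    approximately maximize log det(X_S^T X_S), the D-optimality criterion
    for the linear-regression information matrix (up to 1/sigma^2).

    Illustration only; not the clusterwise method of Liu, Stufken, and Yang.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    # Start from M = ridge * I so that M is invertible before p rows are chosen.
    M_inv = np.eye(p) / ridge
    taken = np.zeros(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # Matrix determinant lemma: det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
        # so the row that most increases det(M) maximizes x^T M^{-1} x.
        gains = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[taken] = -np.inf  # never select the same row twice
        i = int(np.argmax(gains))
        taken[i] = True
        chosen.append(i)
        # Sherman-Morrison rank-one update of M^{-1} after adding row x.
        x = X[i]
        Mx = M_inv @ x
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)
    return np.array(chosen)


if __name__ == "__main__":
    # Compare the criterion for greedy subdata and a uniform random subsample.
    rng = np.random.default_rng(0)
    X_full = rng.standard_normal((5_000, 3))
    idx_greedy = greedy_d_optimal_subdata(X_full, k=60)
    idx_random = rng.choice(X_full.shape[0], size=60, replace=False)
    for name, idx in [("greedy", idx_greedy), ("random", idx_random)]:
        X_sub = X_full[idx]
        print(name, np.linalg.slogdet(X_sub.T @ X_sub)[1])
```

The greedy rule favors extreme, well-spread rows, which is the intuition behind information-based selection in the linear case; it is not guaranteed to find the exact D-optimal subset, and it costs roughly $O(knp^2)$ operations because every step rescans the full data.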

Taxonomy

Topics: Bayesian Methods and Mixture Models · Survey Sampling and Estimation Techniques · Statistical Methods and Bayesian Inference