A Massive Data Framework for M-Estimators with Cubic-Rate

Chengchun Shi; Wenbin Lu; Rui Song

arXiv:1605.07249·math.ST·April 6, 2017

TL;DR

This paper develops a theoretical framework for divide and conquer methods applied to cube-root-rate M-estimators in massive data settings, showing that simple averages of subgroup estimators converge faster than the pooled-data estimator and are asymptotically normal.

Contribution

It introduces a general theory for the asymptotic distribution of aggregated cubic-rate M-estimators formed by simple averaging, applicable to a wide class of estimators including the location, maximum score, and value search estimators.

Findings

01

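Under certain conditions on how fast the number of subgroups grows, aggregated estimators converge faster than the original M-estimators based on pooled data.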

02

Asymptotic normality of the aggregated estimators is established.

03

Simulation results support the theoretical findings.

Abstract

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators using a simple average. Under certain conditions on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotically normal distribution, which makes them more tractable in both computation and inference than the original M-estimators based on pooled data. Our theory applies to a wide class of M-estimators with cube-root convergence rate, including the location estimator, maximum score estimator, and value search estimator. Simulation results also validate our theoretical findings.
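The simple-average aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the "location estimator" is the classical modal-interval estimator (the centre of the fixed-width interval containing the most observations), a standard example of a cube-root-rate M-estimator. The function names, the half-width `h`, and the number of subgroups are hypothetical choices made for this example.

```python
import numpy as np


def location_estimator(x, h=1.0):
    """Centre of a width-2h interval that contains the most points of x.

    Assumed form of the location estimator: argmax over theta of the
    number of observations in [theta - h, theta + h].
    """
    xs = np.sort(np.asarray(x, dtype=float))
    # For a window whose left edge sits at xs[i], find the index one past
    # the last point that is <= xs[i] + 2h.
    right = np.searchsorted(xs, xs + 2.0 * h, side="right")
    counts = right - np.arange(xs.size)
    i = int(np.argmax(counts))   # first best window if there are ties
    j = int(right[i]) - 1        # last point inside that window
    # Every theta in [xs[j] - h, xs[i] + h] covers the same points;
    # return the midpoint of that set of maximizers.
    return float((xs[i] + xs[j]) / 2.0)


def aggregated_estimator(x, n_subgroups, estimator=location_estimator):
    """Divide and conquer: estimate on each subgroup, then take a simple average."""
    blocks = np.array_split(np.asarray(x, dtype=float), n_subgroups)
    return float(np.mean([estimator(block) for block in blocks]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal(20_000)   # true location is 0
    print("pooled    :", location_estimator(data))
    print("aggregated:", aggregated_estimator(data, n_subgroups=20))
```

Each subgroup estimate is computed independently, so the per-subgroup step can run in parallel. How many subgroups can be used while keeping the faster rate and asymptotic normality is governed by the growth-rate condition stated in the paper, which this sketch does not check.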
