Large Sample Theory for Merged Data from Multiple Sources
Takumi Saegusa

TL;DR
This paper develops large sample statistical theory for merged data from multiple sources, addressing issues like duplication, dependence, and bias, and extends empirical process theory to such complex data structures.
Contribution
It introduces a new weighted empirical process framework and extends empirical process theory to dependent, biased, and duplicated data, enabling rigorous statistical inference.
Findings
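Established a uniform law of large numbers and a uniform central limit theorem over a class of functions for dependent, biased samples with duplication, under conditions identical to those in the i.i.d. setting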
Proved consistency, convergence rates, and asymptotic normality for infinite-dimensional M-estimators
Validated theoretical results through simulations and real data analysis
Abstract
We develop large sample theory for merged data from multiple sources. Main statistical issues treated in this paper are (1) the same unit potentially appears in multiple datasets from overlapping data sources, (2) duplicated items are not identified, and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, we establish the uniform law of large numbers and uniform central limit theorem over a class of functions along with several empirical process results under conditions identical to those in the i.i.d. setting. As applications, we study infinite-dimensional M-estimation and develop its consistency, rates of convergence, and asymptotic normality. Our theoretical results are illustrated with…
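The three issues listed in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's weighted empirical process: the population, the two overlapping sources, the sample sizes, and the function name `simulate` are all hypothetical. It draws a sample without replacement from each source, pools the two samples without identifying duplicates, and compares the naive pooled mean with a multiplicity-adjusted inverse-probability-weighted mean, a standard device from multiple-frame surveys. The weighting assumes that, for each sampled unit, we know which sources contain it, even though duplicated records themselves are not matched.

```python
import random


def simulate(seed=0, reps=200):
    """Compare a naive pooled mean with a multiplicity-weighted mean.

    Hypothetical setup: a population of N units, two overlapping sources,
    and an outcome that is larger for units lying in the overlap.
    """
    rng = random.Random(seed)
    N = 10_000
    source1 = range(0, 7_000)        # units 0..6999
    source2 = range(4_000, 10_000)   # units 4000..9999, overlap is 4000..6999
    n1, n2 = 700, 600                # sample sizes, drawn without replacement

    def multiplicity(i):
        # Number of sources that contain unit i (assumed known for sampled units).
        return int(i < 7_000) + int(i >= 4_000)

    # Outcome: 1 for units in the overlap, 0 otherwise.
    y = [1.0 if multiplicity(i) == 2 else 0.0 for i in range(N)]
    true_mean = sum(y) / N

    pi1 = n1 / len(source1)          # inclusion probability within source 1
    pi2 = n2 / len(source2)          # inclusion probability within source 2

    naive_sum = 0.0
    weighted_sum = 0.0
    for _ in range(reps):
        s1 = rng.sample(source1, n1)  # dependent draws: no replacement
        s2 = rng.sample(source2, n2)

        # Naive estimator: pool both samples, duplicates left in place.
        pooled = [y[i] for i in s1] + [y[i] for i in s2]
        naive_sum += sum(pooled) / len(pooled)

        # Weighted estimator: each record is divided by its inclusion
        # probability and by the number of sources containing the unit.
        total = sum(y[i] / (multiplicity(i) * pi1) for i in s1)
        total += sum(y[i] / (multiplicity(i) * pi2) for i in s2)
        weighted_sum += total / N

    return true_mean, naive_sum / reps, weighted_sum / reps


if __name__ == "__main__":
    true_mean, naive_avg, weighted_avg = simulate()
    print(f"true mean     : {true_mean:.4f}")
    print(f"naive pooled  : {naive_avg:.4f}")
    print(f"weighted      : {weighted_avg:.4f}")
```

In this setup, units in the overlap can be drawn from either source, so they are over-represented in the pooled sample and the naive mean overshoots the population mean. Dividing each record by its multiplicity and inclusion probability removes that bias on average, which is the kind of correction a weighted approach is meant to supply.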
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
