Density estimation from batched broken random samples

Hancheng Bi; Bernhard Schmitzer; Thilo D. Stier

arXiv:2602.09833 · math.ST · February 11, 2026


Open Access

TL;DR

This paper introduces a parametric method for estimating a joint density from broken random samples, in which the pairing information is lost, and proves a fast convergence rate as the number of sample batches grows.

Contribution

It proposes a pseudo-log-likelihood-based estimator of the density from broken samples and proves a fast convergence rate in the number of batches that is uniform in the batch size.

Findings

01

Estimator achieves fast convergence rate in number of batches

02

Method works under mild assumptions

03

Convergence rate is uniform in batch size

Abstract

The broken random sample problem was first introduced by DeGroot, Feder, and Goel (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $((X_{i}, Y_{i}))_{i=1}^{M}$ is drawn from a joint distribution with density $p(x, y)$, but we can observe only the unordered multisets $(X_{i})_{i=1}^{M}$ and $(Y_{i})_{i=1}^{M}$ separately; that is, the pairing information is lost. For large $M$, inferring $p$ from a single observation has been shown to be essentially impossible. In this paper, we propose a parametric method based on a pseudo-log-likelihood to estimate $p$ from $N$ i.i.d. broken sample batches, and we prove a fast convergence rate in $N$ for our estimator that is uniform in $M$, under mild assumptions.
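The abstract does not spell out the authors' pseudo-log-likelihood, so the sketch below is not their estimator. It only illustrates the setting under an assumed one-parameter model: a standard bivariate normal with correlation $\rho$. Each broken batch is simulated by drawing $M$ pairs and then sorting each coordinate separately, which discards the pairing. The parameter is fitted by maximizing the exact broken-sample log-likelihood. Up to a constant, each batch contributes the log of the average of $\prod_i p(x_i, y_{\sigma(i)})$ over all $M!$ pairings $\sigma$. Enumerating $M!$ pairings is only feasible for tiny $M$, and this cost is plausibly one reason to use a pseudo-log-likelihood instead. All function names (`bvn_logpdf`, `broken_batch`, `batch_loglik`, `estimate_rho`) are hypothetical, and only the standard library is used.

```python
import itertools
import math
import random


def bvn_logpdf(x, y, rho):
    """Log-density of a standard bivariate normal with correlation rho (|rho| < 1)."""
    q = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return -math.log(2.0 * math.pi) - 0.5 * math.log(1.0 - rho * rho) - 0.5 * q


def broken_batch(m, rho, rng):
    """Draw m i.i.d. pairs (assumed model), then lose the pairing by sorting each coordinate."""
    pairs = []
    for _ in range(m):
        x = rng.gauss(0.0, 1.0)
        # y = rho * x + sqrt(1 - rho^2) * z yields corr(x, y) = rho with unit variances
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    xs = sorted(p[0] for p in pairs)
    ys = sorted(p[1] for p in pairs)
    return xs, ys


def batch_loglik(xs, ys, rho):
    """Exact log-likelihood of one broken batch, up to a rho-independent constant:
    log of the mean over all pairings sigma of prod_i p(x_i, y_sigma(i))."""
    terms = [
        sum(bvn_logpdf(x, ys[j], rho) for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(len(ys)))
    ]
    top = max(terms)
    # log-mean-exp, shifted by the largest term for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms) / len(terms))


def estimate_rho(batches, grid):
    """Grid-search maximizer of the log-likelihood summed over N i.i.d. batches."""
    return max(grid, key=lambda r: sum(batch_loglik(xs, ys, r) for xs, ys in batches))


if __name__ == "__main__":
    rng = random.Random(0)
    batches = [broken_batch(3, 0.7, rng) for _ in range(1000)]  # N = 1000, M = 3
    grid = [k / 20 for k in range(-19, 20)]  # rho in {-0.95, -0.90, ..., 0.95}
    print(estimate_rho(batches, grid))
```

With $M = 1$ the batch log-likelihood reduces to the ordinary paired log-density, while for larger $M$ the number of enumerated pairings grows as $M!$.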


Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Machine Learning and Algorithms