Assessing the Distribution Consistency of Sequential Data
Mahendra Mariadassou (MAP5), Avner Bar-Hen (MAP5)

TL;DR
This paper introduces a non-parametric test of whether a new batch of observations follows the same distribution as earlier observations, using an Edgeworth expansion to refine the approximation and treating the discrete case separately.
Contribution
It proposes a novel distribution consistency test based on Edgeworth expansion, applicable to both continuous and discrete data, with analysis of convergence rates.
Findings
The test targets changes in distribution between the n existing observations and a new batch of k observations.
The Edgeworth expansion supplies the correcting term and the rate of convergence of the approximation.
Convergence rates vary between continuous and discrete cases.
Abstract
Given n observations, we study the consistency of a batch of k new observations, in terms of their distribution function. We propose a non-parametric, non-likelihood test based on an Edgeworth expansion of the distribution function. The key point is to approximate the distribution of the n+k observations by the distribution of n-k among the n observations. The Edgeworth expansion gives the correcting term and the rate of convergence. We also study the discrete distribution case, for which Cramér's smoothness condition is not satisfied. The rates of convergence for the various cases are compared.
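To make the role of the correcting term concrete, the sketch below applies the textbook one-term Edgeworth expansion to the standardized mean of n i.i.d. Exp(1) variables and compares it with the plain normal approximation and the exact Erlang distribution. It illustrates the general technique only, not the paper's test statistic; the function names and the choice of the exponential distribution are assumptions made for this example.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def edgeworth_cdf(x, n, skewness):
    """One-term Edgeworth approximation to P(sqrt(n) * (mean - mu) / sigma <= x).

    The correcting term is -phi(x) * skewness * (x^2 - 1) / (6 * sqrt(n)).
    """
    correction = normal_pdf(x) * skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return normal_cdf(x) - correction


def exact_cdf_exp_mean(x, n):
    """Exact CDF of the standardized mean of n i.i.d. Exp(1) variables.

    The sum is Erlang(n, 1), so P(sum <= t) = 1 - exp(-t) * sum_{j<n} t^j / j!.
    Exp(1) has mean 1 and standard deviation 1 (illustrative choice).
    """
    t = n + x * math.sqrt(n)
    if t <= 0.0:
        return 0.0
    term, acc = 1.0, 1.0
    for j in range(1, n):
        term *= t / j
        acc += term
    return 1.0 - math.exp(-t) * acc


if __name__ == "__main__":
    n, skew = 10, 2.0  # the skewness of Exp(1) is 2
    for x in (-0.5, 0.0, 0.5):
        print(x, normal_cdf(x), edgeworth_cdf(x, n, skew), exact_cdf_exp_mean(x, n))
```

The exponential distribution is continuous, so Cramér's condition holds and the expansion applies directly. For a lattice-valued (discrete) variable the condition fails and this smooth correction alone is not enough, which is the situation the abstract treats separately.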
Taxonomy
Topics: Advanced Statistical Process Monitoring · Advanced Statistical Methods and Models · Statistical Methods and Inference
