Design based incomplete U-statistics
Xiangshun Kong, Wei Zheng

TL;DR
This paper introduces a new incomplete U-statistic that remains asymptotically efficient even when the number of evaluated combinations grows more slowly than the data size, which makes the computation more feasible.
Contribution
The paper proposes a new type of incomplete U-statistic that achieves asymptotic efficiency with fewer evaluated combinations than previous methods.
Findings
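The new incomplete U-statistic shows significant improvements in statistical efficiency over existing incomplete U-statistics.
In some cases, the number of evaluated combinations m need only grow faster than √n, where n is the data size.
Empirical results confirm the theoretical advantages.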
Abstract
U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be…
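To make the computational trade-off concrete, here is a minimal Python sketch contrasting a complete order-2 U-statistic with an incomplete one that averages the kernel over m randomly chosen pairs. This is not the design-based construction proposed in the paper. It is plain random subsampling in the spirit of the earlier incomplete U-statistics the abstract mentions. The kernel (whose U-statistic is the unbiased sample variance), the function names, the sample size, and the value of m are all illustrative assumptions.

```python
import itertools
import math
import random
import statistics


def kernel(x, y):
    # Symmetric order-2 kernel; its complete U-statistic is the
    # unbiased sample variance. Chosen only for illustration.
    return 0.5 * (x - y) ** 2


def complete_u(data):
    # Average the kernel over all C(n, 2) pairs: O(n^2) kernel evaluations.
    total = sum(kernel(x, y) for x, y in itertools.combinations(data, 2))
    return total / math.comb(len(data), 2)


def incomplete_u(data, m, rng):
    # Average the kernel over m pairs, each drawn uniformly at random.
    # Assumption: simple random subsampling, NOT the paper's design.
    total = 0.0
    for _ in range(m):
        x, y = rng.sample(data, 2)  # two distinct observations
        total += kernel(x, y)
    return total / m


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

u_full = complete_u(data)                    # evaluates 19,900 pairs
u_sub = incomplete_u(data, m=2000, rng=rng)  # evaluates 2,000 pairs

print("complete U-statistic:  ", u_full)
print("sample variance:       ", statistics.variance(data))
print("incomplete U-statistic:", u_sub)
```

With random subsampling, the extra variance introduced by using only m pairs shrinks at the rate 1/m, which is why such schemes need m to grow quickly relative to n to stay efficient. The paper's contribution is a way of choosing the pairs that relaxes this requirement.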
Taxonomy
Topics: Computability, Logic, AI Algorithms · Machine Learning and Algorithms · Benford’s Law and Fraud Detection
