Bias correction for Chatterjee's graph-based correlation coefficient

Mona Azadkia, Leihao Chen, and Fang Han

arXiv:2508.09040 · stat.ME · January 21, 2026

Open Access

TL;DR

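This paper analyzes the bias of the nearest neighbor (NN) graph-based correlation coefficient of Azadkia and Chatterjee (2021) and proposes a bias-correction procedure. The result is a root-n consistent and asymptotically normal estimator, both when the dimension is smaller than four (where the original estimator can suffice) and in more general settings (where the correction is applied).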

Contribution

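It provides a detailed analysis of the bias term in the NN estimator of Chatterjee's dependence measure, which may vanish more slowly than the root-n rate, and introduces a bias-correction procedure that yields root-n consistency and asymptotic normality in more general settings.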

Findings

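1. The bias term can become asymptotically negligible when the dimension is smaller than four.

2. In more general settings, the bias-corrected estimator is root-n consistent.

3. In both regimes, the resulting estimator (original or bias-corrected) is asymptotically normal.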

Abstract

Azadkia and Chatterjee (2021) recently introduced a simple nearest neighbor (NN) graph-based correlation coefficient that consistently detects both independence and functional dependence. Specifically, it approximates a measure of dependence that equals 0 if and only if the variables are independent, and 1 if and only if they are functionally dependent. However, this NN estimator includes a bias term that may vanish at a rate slower than root-$n$, preventing root-$n$ consistency in general. In this article, we (i) analyze this bias term closely and show that it could become asymptotically negligible when the dimension is smaller than four; and (ii) propose a bias-correction procedure for more general settings. In both regimes, we obtain estimators (either the original or the bias-corrected version) that are root-$n$ consistent and asymptotically normal.
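To make the object of study concrete, here is a minimal sketch of the original (uncorrected) NN coefficient in its unconditional form, as it is usually stated for Azadkia and Chatterjee (2021): with $R_i = \#\{j : Y_j \le Y_i\}$, $L_i = \#\{j : Y_j \ge Y_i\}$, and $M(i)$ the index of the nearest neighbor of $Z_i$, the statistic is $\sum_i (n \min(R_i, R_{M(i)}) - L_i^2) / \sum_i L_i (n - L_i)$. The function name `nn_correlation`, the brute-force neighbor search, and the deterministic tie-breaking are assumptions made for illustration. This is not the bias-corrected estimator proposed in the paper, whose construction the abstract does not spell out.

```python
import numpy as np


def nn_correlation(y, z):
    """Sketch of the uncorrected NN graph-based coefficient of Y on Z.

    Hypothetical helper for illustration only; it does NOT implement the
    bias correction proposed in the paper.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D covariate as an (n, 1) array
    n = y.shape[0]

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    y_sorted = np.sort(y)
    R = np.searchsorted(y_sorted, y, side="right")
    L = n - np.searchsorted(y_sorted, y, side="left")

    # M(i): index of the nearest neighbor of z_i under Euclidean distance.
    # Brute force, O(n^2) memory. Ties go to the lowest index here, which
    # is a simplification (an assumption, not the authors' tie rule).
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    M = d2.argmin(axis=1)

    numerator = np.sum(n * np.minimum(R, R[M]) - L ** 2)
    denominator = np.sum(L * (n - L))
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.uniform(-1.0, 1.0, size=500)
    print("functional dependence:", nn_correlation(z ** 2, z))
    print("independence:", nn_correlation(rng.normal(size=500), z))
```

On simulated data the statistic should land near 1 when Y is a noiseless function of Z and near 0 under independence. The bias discussed in the abstract concerns how quickly the estimator approaches its population target as $n$ grows and the dimension of Z increases.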

Taxonomy

Topics: Statistical Methods and Inference · Data Analysis with R · Advanced Statistical Modeling Techniques