RPWithPrior: Label Differential Privacy in Regression

Haixia Liu; Ruifan Huang

arXiv:2601.22625 · stat.ML · February 2, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces RPWithPrior, a novel method for regression under ε-label differential privacy that models both original and randomized responses as continuous random variables, improving the privacy-utility trade-off over existing discretization-based techniques.

Contribution

The paper proposes a continuous response modeling approach, with algorithms for both known and unknown priors, that guarantees ε-label differential privacy and outperforms existing methods.

Findings

01

RPWithPrior achieves better accuracy than the Gaussian, Laplace, Staircase, and RR-on-Bins mechanisms.

02

The approach effectively handles both known and unknown prior scenarios.

03

Numerical results on multiple datasets, including Communities & Crime, Criteo, and California Housing, validate its superior performance.
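For context on the additive baselines named above, here is a minimal Python sketch of the Laplace mechanism applied to a single label. It assumes labels are clipped to a known range [lo, hi], which is an assumption for illustration rather than a detail taken from the paper, and `laplace_label` is a hypothetical name. Clipping bounds how much one label can change the released value, so noise of scale (hi - lo)/ε gives ε-label DP for that label.

```python
import random


def laplace_label(y: float, lo: float, hi: float, eps: float, rng: random.Random) -> float:
    """Hypothetical helper: release one label via the Laplace mechanism after clipping to [lo, hi]."""
    if eps <= 0 or hi <= lo:
        raise ValueError("need eps > 0 and hi > lo")
    clipped = min(max(y, lo), hi)   # bounds the effect of a single label by (hi - lo)
    scale = (hi - lo) / eps         # Laplace scale b = sensitivity / eps
    # The difference of two i.i.d. exponential draws with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise
```

The noisy value is unbiased for the clipped label but can fall outside [lo, hi]; clipping it back afterwards is post-processing and does not weaken the guarantee.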

Abstract

With the wide application of machine learning techniques in practice, privacy preservation has gained increasing attention. Protecting user privacy with minimal accuracy loss is a fundamental task in the data analysis and mining community. In this paper, we focus on regression tasks under $\epsilon$-label differential privacy guarantees. Some existing methods for regression with $\epsilon$-label differential privacy, such as the RR-on-Bins mechanism, discretize the output space into finite bins and then apply the randomized response (RR) algorithm. To determine these finite bins efficiently, the authors of RR-on-Bins rounded the original responses down to integer values. However, such operations do not align well with real-world scenarios. To overcome these limitations, we model both original and randomized responses as continuous random variables, avoiding discretization entirely. Our novel approach estimates an optimal…
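The discretize-then-randomize pipeline described in the abstract can be sketched as plain k-ary randomized response over fixed bins. This is a simplified illustration under stated assumptions: the caller supplies the bin edges and one representative output value per bin, and `assign_bin`, `k_rr_probabilities`, and `rr_on_fixed_bins` are hypothetical names. Any prior-dependent choice of bins or output values, as performed by RR-on-Bins, is deliberately omitted.

```python
import bisect
import math
import random


def assign_bin(y: float, edges: list[float]) -> int:
    """Index of the bin [edges[i], edges[i+1]) containing y; out-of-range labels are clamped."""
    k = len(edges) - 1
    i = bisect.bisect_right(edges, y) - 1
    return min(max(i, 0), k - 1)


def k_rr_probabilities(true_bin: int, k: int, eps: float) -> list[float]:
    """k-ary randomized response: keep the true bin w.p. e^eps / (e^eps + k - 1)."""
    denom = math.exp(eps) + k - 1
    return [(math.exp(eps) if j == true_bin else 1.0) / denom for j in range(k)]


def rr_on_fixed_bins(y: float, edges: list[float], reps: list[float],
                     eps: float, rng: random.Random) -> float:
    """Discretize y, randomize its bin index, and release that bin's representative value."""
    k = len(edges) - 1
    probs = k_rr_probabilities(assign_bin(y, edges), k, eps)
    out_bin = rng.choices(range(k), weights=probs, k=1)[0]
    return reps[out_bin]
```

Because the true bin gets weight e^ε and every other bin gets weight 1 (before normalization), the output probabilities for any two true labels differ by at most a factor of e^ε. The rounding error from replacing y by a bin representative is the quantization loss that the continuous formulation aims to remove.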

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The proposed optimization problem admits an optimal solution, and the paper presents a polynomial-time algorithm to compute it.

Weaknesses

The paper is not well written. Moreover, the privacy and utility guarantees of some of the proposed algorithms are unclear and cannot be verified from the current presentation.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The idea of formulating LabelDP regression in a continuous space rather than discretized bins is novel.
2. Theoretical results (Proposition 3.1, Lemmas 3.3–3.5) are clearly stated, showing that the proposed density function satisfies $\epsilon$-LabelDP.
3. The method extends naturally to the case where the prior distribution is unknown, providing a practical histogram-based alternative.
4. Experiments across multiple datasets (Communities & Crime, Criteo, California Housing) demonstrate c…
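To make the idea of a continuous, non-additive randomizer concrete, the following Python sketch applies randomized response in the quantile space of a histogram prior. It is an illustrative construction under stated assumptions, not RPWithPrior itself: the prior is a known piecewise-uniform histogram, `q` (the prior mass of the favored window) is a free parameter, and every class and function name is hypothetical. The released quantile has density e^ε/Z inside a window of mass q around the true label's quantile and 1/Z elsewhere, with Z = e^ε q + (1 - q) identical for every label, so the densities for any two labels differ by at most e^ε. Mapping the quantile back through the inverse CDF is post-processing, so it keeps the guarantee.

```python
import bisect
import math
import random


class HistogramPrior:
    """Hypothetical piecewise-uniform prior: bin i spans [edges[i], edges[i+1]] with mass probs[i]."""

    def __init__(self, edges: list[float], probs: list[float]):
        total = float(sum(probs))
        self.edges = list(edges)
        self.cum = [0.0]
        for p in probs:
            self.cum.append(self.cum[-1] + p / total)
        self.cum[-1] = 1.0  # guard against floating-point drift

    def cdf(self, y: float) -> float:
        if y <= self.edges[0]:
            return 0.0
        if y >= self.edges[-1]:
            return 1.0
        i = bisect.bisect_right(self.edges, y) - 1
        frac = (y - self.edges[i]) / (self.edges[i + 1] - self.edges[i])
        return self.cum[i] + frac * (self.cum[i + 1] - self.cum[i])

    def inv_cdf(self, u: float) -> float:
        u = min(u, 1.0)
        if u <= 0.0:
            return self.edges[0]
        # bin i satisfies cum[i] < u <= cum[i+1], so it has positive mass
        i = bisect.bisect_left(self.cum, u) - 1
        frac = (u - self.cum[i]) / (self.cum[i + 1] - self.cum[i])
        return self.edges[i] + frac * (self.edges[i + 1] - self.edges[i])


def window(u: float, q: float) -> tuple[float, float]:
    """Quantile window of mass q around u, shifted so it stays inside [0, 1]."""
    c = min(max(u, q / 2), 1 - q / 2)
    return c - q / 2, c + q / 2


def quantile_density(u_out: float, y: float, prior: HistogramPrior, eps: float, q: float) -> float:
    """Density of the released quantile u_out given the true label y."""
    lo, hi = window(prior.cdf(y), q)
    z = math.exp(eps) * q + (1 - q)  # normalizer, independent of y
    return (math.exp(eps) if lo <= u_out <= hi else 1.0) / z


def randomize_label(y: float, prior: HistogramPrior, eps: float, q: float,
                    rng: random.Random) -> float:
    """Sample a quantile from the piecewise-constant density above, then map it back to a label."""
    if not 0.0 < q < 1.0:
        raise ValueError("need 0 < q < 1")
    lo, hi = window(prior.cdf(y), q)
    z = math.exp(eps) * q + (1 - q)
    if rng.random() < math.exp(eps) * q / z:  # land inside the favored window
        u_out = rng.uniform(lo, hi)
    else:                                     # land uniformly on [0, 1] minus the window
        t = rng.uniform(0.0, 1.0 - q)
        u_out = t if t < lo else t + q
    return prior.inv_cdf(u_out)
```

Working in quantile space means the favored window always carries the same prior mass q regardless of where the true label sits, which is what keeps the normalizer Z label-independent.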

Weaknesses

1. The prior $f_Y(y)$ is central to the algorithm (Section 3.2) yet poorly defined. It is unclear whether this represents: (a) an empirical label density estimated from private data, (b) a fixed external prior, or (c) a Bayesian prior distribution. In the case of the histogram-based extension (Section 4), privacy and accuracy both depend critically on the quality of prior estimation. However, the paper provides neither a sensitivity analysis with respect to estimation error nor a discussion of…
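The reviewer's option (a), estimating the label density from private data, can itself be made differentially private. A minimal sketch, assuming bin edges fixed without looking at the data and using a hypothetical function name `dp_histogram_prior`: add Laplace noise to the bin counts, clip negatives, and renormalize. Replacing one label moves one unit of count between two bins, so the count vector has L1 sensitivity 2. Whether the paper estimates its prior this way is not stated here; if the same labels are later randomized with a second budget, the two ε values add under basic sequential composition.

```python
import bisect
import random


def dp_histogram_prior(labels: list[float], edges: list[float], eps: float,
                       rng: random.Random) -> list[float]:
    """Hypothetical helper: eps-DP estimate of bin probabilities from private labels."""
    k = len(edges) - 1
    counts = [0] * k
    for y in labels:
        i = min(max(bisect.bisect_right(edges, y) - 1, 0), k - 1)  # clamp out-of-range labels
        counts[i] += 1
    scale = 2.0 / eps  # L1 sensitivity 2 under label replacement
    # Laplace(0, scale) noise as the difference of two exponential draws with mean `scale`.
    noisy = [c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale) for c in counts]
    clipped = [max(v, 0.0) for v in noisy]  # post-processing keeps the DP guarantee
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / k] * k  # fall back to a uniform prior
    return [v / total for v in clipped]
```

With many labels per bin the noise is small relative to the counts, so the estimate is close to the empirical histogram; with sparse bins the clipping step introduces bias, which is one concrete form of the estimation error the reviewer asks about.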

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. Prior label-DP regression mechanisms typically discretize the continuous label and apply randomized response over a finite set of bins, which introduces quantization error; using a continuous non-additive randomizer directly on ℝ removes that source of loss.
2. The paper reports consistent gains against additive baselines (Laplace/Gaussian/Staircase) and discrete non-additive baselines (RR-on-Bins and variants) at comparable epsilons, suggesting the continuous construction can be practically…

Weaknesses

1. It’s not obvious why this is fundamentally better than RR-on-Bins: the method still depends on a prior summarized by a histogram and on selecting an interval. If the interval search effectively scans over endpoints induced by $k$ histogram bins (potentially $k^2$ pairs), there is a time–accuracy trade-off tied to $k$ that isn’t fully analyzed. Clarifying early on how interval selection avoids a hidden dependence on discretization would help the audience understand this better.
2. Prior work gives…
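The time cost the reviewer points to shows up in a brute-force search whose interval endpoints are restricted to histogram bin edges: with $k$ bins there are $k + 1$ edges and $k(k + 1)/2$ candidate intervals. The objective below (prior mass inside the interval minus a width penalty `lam`) is purely hypothetical and stands in for whatever criterion the paper actually optimizes; the function name `best_interval` is also hypothetical.

```python
def best_interval(edges: list[float], probs: list[float], lam: float):
    """Scan every interval [edges[i], edges[j]] with i < j and keep the best-scoring one.

    Hypothetical objective: prior mass inside the interval minus lam * width.
    Returns (interval, score, number_of_candidates_evaluated).
    """
    k = len(probs)
    if len(edges) != k + 1:
        raise ValueError("need len(edges) == len(probs) + 1")
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)  # prefix sums make each mass query O(1)
    best, best_score, evaluated = None, float("-inf"), 0
    for i in range(k):
        for j in range(i + 1, k + 1):
            evaluated += 1
            score = (cum[j] - cum[i]) - lam * (edges[j] - edges[i])
            if score > best_score:
                best, best_score = (edges[i], edges[j]), score
    return best, best_score, evaluated
```

For example, `best_interval([0.0, 1.0, 2.0, 3.0], [0.1, 0.8, 0.1], 0.5)` selects `(1.0, 2.0)` after evaluating 6 candidates. The candidate set, and therefore the attainable accuracy, is tied to the bin edges, which is the hidden discretization dependence the reviewer asks the authors to address.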

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Machine Learning and Data Classification