Mathematical Foundations of Poisoning Attacks on Linear Regression over Cumulative Distribution Functions

Atsuki Sato; Martin Aumüller; Yusuke Matsui

arXiv:2603.00537·cs.LG·March 3, 2026

Open Access

TL;DR

This paper provides a rigorous theoretical analysis of poisoning attacks on linear regression models over CDFs used in learned indexes, identifying optimal attack strategies and bounds on attack impact.

Contribution

The paper characterizes the optimal single-point poisoning attack on linear regression over CDFs, shows that the greedy multi-point attack is not always optimal, and derives upper bounds for evaluating attack impact.

Findings

01. Optimal single-point poisoning attack characterized

02. Greedy multi-point attack is not always optimal

03. Upper bounds on attack impact are empirically close to the impact achieved by the greedy approach

Abstract

Learned indexes are a class of index data structures that enable fast search by approximating the cumulative distribution function (CDF) using machine learning models (Kraska et al., SIGMOD'18). However, recent studies have shown that learned indexes are vulnerable to poisoning attacks, where injecting a small number of poison keys into the training data can significantly degrade model accuracy and reduce index performance (Kornaropoulos et al., SIGMOD'22). In this work, we provide a rigorous theoretical analysis of poisoning attacks targeting linear regression models over CDFs, one of the most basic regression models and a core component in many learned indexes. Our main contributions are as follows: (i) We present a theoretical proof characterizing the optimal single-point poisoning attack and show that the existing method yields the optimal attack. (ii) We show that in multi-point…
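To make the setting in the abstract concrete, the following is a minimal sketch of a single-point poisoning search, not the authors' method or proof. It assumes integer keys, ranks 1..n as the CDF target, mean squared error as the loss, and a brute-force scan over unused keys strictly between the smallest and largest key; the function names `fit_mse` and `best_single_poison` are hypothetical and chosen only for illustration.

```python
def fit_mse(keys):
    """Least-squares fit of rank ~ w * key + b over distinct keys.

    Ranks are 1..n in sorted key order. Returns (w, b, mse).
    """
    keys = sorted(keys)
    n = len(keys)
    ranks = range(1, n + 1)
    mean_k = sum(keys) / n
    mean_r = (n + 1) / 2
    var_k = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (r - mean_r) for k, r in zip(keys, ranks))
    w = cov / var_k
    b = mean_r - w * mean_k
    mse = sum((w * k + b - r) ** 2 for k, r in zip(keys, ranks)) / n
    return w, b, mse


def best_single_poison(keys):
    """Brute force (illustrative assumption): try every unused integer key
    strictly inside the key range, refit, and keep the one with largest MSE.

    Inserting a poison key shifts the rank of every larger key by one,
    which fit_mse handles by recomputing ranks from the sorted key list.
    """
    present = set(keys)
    best_key, best_mse = None, -1.0
    for p in range(min(keys) + 1, max(keys)):
        if p in present:
            continue
        _, _, mse = fit_mse(list(keys) + [p])
        if mse > best_mse:
            best_key, best_mse = p, mse
    return best_key, best_mse


keys = [1, 2, 3, 10, 20, 30, 31, 32]
_, _, clean_mse = fit_mse(keys)
poison_key, poisoned_mse = best_single_poison(keys)
print("clean MSE:", clean_mse)
print("poison key:", poison_key, "poisoned MSE:", poisoned_mse)
```

The brute-force scan costs one refit per candidate key, so it is only practical for small key ranges; it serves here as a reference point for what "optimal single-point attack" means under the assumptions above.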

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Data Quality and Management · Privacy-Preserving Technologies in Data