Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs

Edith Cohen; Jelani Nelson; Tamás Sarlós; Uri Stemmer

arXiv:2207.00956 · cs.DS · August 30, 2022

TL;DR

This paper shows that CountSketch and Feature Hashing are inherently vulnerable to adaptive inputs: after $O(ℓ^{2})$ queries, an adversary can construct an input vector whose sketch is highly biased. This matches the $\tilde{O}(ℓ^{2})$ queries supported by the best known robust estimator.

Contribution

The authors show that the quadratic query limit of the best known robust estimator is in a sense inherent: they design an attack that, after $O(ℓ^{2})$ queries, produces an adversarial input vector whose sketch is highly biased.

Findings

01

Adversarial inputs can bias CountSketch after $O(ℓ^{2})$ queries.

02

With the classic estimator, an adversarial input can be constructed after only $O(ℓ)$ adaptive queries.

03

The attack applies universally to any correct estimator, known or unknown.

Abstract

CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $ℓ_{2}$-heavy hitters (keys $i$ where $v_{i}^{2} > ϵ ∥v∥_{2}^{2}$) and approximate inner products. When the inputs are not adaptive (do not depend on prior outputs), classic estimators applied to a sketch of size $O(ℓ/ϵ)$ are accurate for a number of queries that is exponential in $ℓ$. When inputs are adaptive, however, an adversarial input can be constructed after $O(ℓ)$ queries with the classic estimator, and the best known robust estimator only supports $\tilde{O}(ℓ^{2})$ queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after $O(ℓ^{2})$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive…
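
For readers unfamiliar with the data structure the abstract refers to, the following is a minimal sketch of a textbook CountSketch with the classic median estimator. It is not the paper's attack or its robust estimator. The class name `CountSketch`, the parameters `rows`, `width`, `dim`, and `seed`, and the use of explicit random tables in place of hash functions are all illustrative assumptions; `rows` is meant to play the role of $ℓ$ and `width` the role of $O(1/ϵ)$.

```python
import random
from statistics import median


class CountSketch:
    """Textbook CountSketch over vectors of a fixed dimension (illustrative only)."""

    def __init__(self, rows, width, dim, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.width = width
        # Assumption: explicit random tables stand in for the bucket and sign
        # hash functions; practical implementations hash keys instead.
        self.bucket = [[rng.randrange(width) for _ in range(dim)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]

    def sketch(self, v):
        """Return the rows x width table of signed bucket sums for vector v."""
        table = [[0.0] * self.width for _ in range(self.rows)]
        for r in range(self.rows):
            for i, x in enumerate(v):
                if x:
                    table[r][self.bucket[r][i]] += self.sign[r][i] * x
        return table

    def estimate(self, table, i):
        """Classic estimator: median over rows of the signed bucket holding key i."""
        return median(
            self.sign[r][i] * table[r][self.bucket[r][i]] for r in range(self.rows)
        )


# Non-adaptive usage: one heavy key plus small noise on every other key.
noise = random.Random(1)
v = [float(noise.choice((-1, 1))) for _ in range(1000)]
v[7] = 100.0
cs = CountSketch(rows=11, width=64, dim=1000, seed=0)
print(cs.estimate(cs.sketch(v), 7))  # an estimate of v[7], perturbed by bucket collisions
```

The input `v` here is fixed before any estimate is seen, which is the non-adaptive setting. In the adaptive setting the abstract describes, later inputs may depend on earlier outputs of `estimate`, and it is that dependence the attack exploits.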

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning