Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
Đorđe Klisura, Anthony Rios

TL;DR
This paper reveals security vulnerabilities in text-to-SQL systems by introducing a zero-knowledge attack that reconstructs database schemas without prior knowledge, highlighting significant privacy risks and proposing initial defenses.
Contribution
The paper presents a novel zero-knowledge schema inference attack on text-to-SQL models, demonstrating high accuracy in schema reconstruction and exposing security vulnerabilities.
Findings
High accuracy in schema reconstruction, with F1 scores of up to 0.99
Schema leakage poses serious security risks for text-to-SQL systems
Proposed protection mechanisms have limited effectiveness against the attack
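The F1 figures above compare a reconstructed schema against the true one. The sketch below shows one plausible way to compute set-based precision, recall, and F1 over recovered table names. The function name `schema_f1`, the case-insensitive matching, and the example tables are illustrative assumptions, not the paper's evaluation code.

```python
def schema_f1(predicted, gold):
    """Set-based precision, recall, and F1 between predicted and gold schema names.

    Assumption: names are compared case-insensitively after trimming whitespace;
    the paper's exact matching rules may differ.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    tp = len(pred & ref)  # names recovered correctly
    precision = tp / len(pred)
    recall = tp / len(ref)
    if tp == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 of 4 recovered names appear in a 5-table schema.
gold_tables = ["customers", "orders", "products", "payments", "shipments"]
recovered = ["Customers", "orders", "products", "invoices"]
p, r, f = schema_f1(recovered, gold_tables)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# prints: precision=0.75 recall=0.60 f1=0.67
```

A high F1 therefore requires both few spurious names (precision) and few missed tables (recall).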
Abstract
Text-to-SQL systems empower users to interact with databases using natural language, automatically translating queries into executable SQL code. However, their reliance on database schema information for SQL generation exposes them to significant security vulnerabilities, particularly schema inference attacks that can lead to unauthorized data access or manipulation. In this paper, we introduce a novel zero-knowledge framework for reconstructing the underlying database schema of text-to-SQL models without any prior knowledge of the database. Our approach systematically probes text-to-SQL models with specially crafted questions and leverages a surrogate GPT-4 model to interpret the outputs, effectively uncovering hidden schema elements, including tables, columns, and data types. We demonstrate that our method achieves high accuracy in reconstructing table names, with F1 scores of up to…
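The abstract describes interpreting model outputs with a surrogate GPT-4 model. As a much simpler, hypothetical stand-in (a regex heuristic, not the paper's method), the sketch below shows why returning generated SQL leaks schema: table names can be read directly from `FROM` and `JOIN` clauses and pooled across many responses. The helpers `tables_in_sql` and `collect_tables` and the sample queries are illustrative assumptions.

```python
import re

# Hypothetical heuristic (not the paper's GPT-4 interpreter): capture the
# identifier after FROM or JOIN, allowing "..", `..`, or [..] quoting and an
# optional schema prefix such as main.orders. It can misfire, e.g. on
# EXTRACT(YEAR FROM col), and skips subqueries like FROM (SELECT ...).
_TABLE_REF = re.compile(
    r'\b(?:FROM|JOIN)\s+((?:[`"\[]?\w+[`"\]]?\.)?[`"\[]?\w+[`"\]]?)',
    re.IGNORECASE,
)


def tables_in_sql(sql):
    """Return the lower-cased table names referenced in one SQL string."""
    names = set()
    for ref in _TABLE_REF.findall(sql):
        last = ref.split(".")[-1]  # drop any schema prefix
        names.add(last.strip('`"[]').lower())
    return names


def collect_tables(sql_outputs):
    """Pool table names observed across many generated queries."""
    found = set()
    for sql in sql_outputs:
        found |= tables_in_sql(sql)
    return found


outputs = [
    "SELECT name FROM customers WHERE id = 1",
    "SELECT c.name, o.total FROM customers c JOIN orders o ON c.id = o.customer_id",
    'SELECT COUNT(*) FROM "main"."products"',
]
print(sorted(collect_tables(outputs)))
# prints: ['customers', 'orders', 'products']
```

A language model is far more robust than this regex (aliases, subqueries, dialects), but the example makes the leakage channel concrete: each response can reveal part of the schema.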
Taxonomy
Topics: Data Quality and Management · Distributed systems and fault tolerance · Cryptography and Data Security
Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
