LLM-FK: Multi-Agent LLM Reasoning for Foreign Key Detection in Large-Scale Complex Databases

Zijian Tang; Ying Zhang; Sibo Cai; Ruoxuan Wang

arXiv:2603.07278 · cs.DB · March 10, 2026

Open Access

TL;DR

This paper introduces LLM-FK, a multi-agent framework that leverages large language models to accurately and efficiently detect foreign keys in large-scale, complex databases, overcoming limitations of traditional heuristic methods.

Contribution

The paper presents the first fully automated multi-agent system for foreign key detection using LLMs, addressing scalability, ambiguity, and consistency challenges in complex databases.

Findings

01 Achieves an F1-score above 93% on five benchmark datasets.

02 Reduces the search space by a factor of 100 to 1,000 without discarding any true FKs.

03 Outperforms existing methods by 15% on the MusicBrainz database.
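
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows one way it could be computed over sets of FK column pairs, assuming exact-match scoring; the paper's actual evaluation protocol is not given on this page, and the function name and sample pairs are hypothetical.

```python
def f1_score(predicted, truth):
    """F1 over two sets of (referencing, referenced) column pairs.

    Assumes exact-match scoring; this is an illustration, not the
    paper's evaluation code.
    """
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predictions are correct,
# and two of three true FKs are found.
predicted = {
    ("album.artist_id", "artist.id"),
    ("track.album_id", "album.id"),
    ("track.id", "artist.id"),
}
truth = {
    ("album.artist_id", "artist.id"),
    ("track.album_id", "album.id"),
    ("track.artist_id", "artist.id"),
}
score = f1_score(predicted, truth)
```

With precision and recall both at 2/3, the harmonic mean is also 2/3.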

Abstract

Detecting missing foreign keys (FKs) requires accurately modeling semantic dependencies across database schemas, which conventional heuristic-based methods are fundamentally limited in capturing. We propose LLM-FK, the first fully automated multi-agent framework for FK detection, designed to address three core challenges that hinder naive LLM-based solutions in large-scale complex databases: combinatorial search space explosion, ambiguous inference under limited context, and global inconsistency arising from isolated local predictions. LLM-FK coordinates four specialized agents: a Profiler that decomposes the FK detection problem into the task of validating FK candidate column pairs and prunes the search space via a unique-key-driven schema decomposition strategy; an Interpreter that injects self-augmented domain knowledge; a Refiner that constructs compact structural representations…
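
To make the search-space problem concrete, here is a minimal sketch of why restricting the referenced side to unique columns shrinks the candidate set. It is not the Profiler's algorithm, which this page does not detail; it only illustrates the general fact that a foreign key must reference a unique column. The toy schema, the type-equality check, and all function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy schema: table -> column -> (type, is_unique).
SCHEMA = {
    "artist": {"id": ("int", True), "name": ("text", False)},
    "album": {
        "id": ("int", True),
        "artist_id": ("int", False),
        "title": ("text", False),
    },
}


def all_column_pairs(schema):
    """Naive search space: every ordered pair of distinct columns."""
    cols = [(table, col) for table, columns in schema.items() for col in columns]
    return [(a, b) for a, b in product(cols, cols) if a != b]


def unique_key_candidates(schema):
    """Keep pairs whose referenced column is unique and type-compatible.

    Same-table pairs are kept because self-referencing FKs are legal.
    """
    candidates = []
    for (f_table, f_col), (r_table, r_col) in all_column_pairs(schema):
        f_type, _ = schema[f_table][f_col]
        r_type, r_unique = schema[r_table][r_col]
        if r_unique and f_type == r_type:
            candidates.append(((f_table, f_col), (r_table, r_col)))
    return candidates


naive = all_column_pairs(SCHEMA)
pruned = unique_key_candidates(SCHEMA)
```

On this five-column schema the naive space has 20 ordered pairs and the pruned set has 4, including the plausible `album.artist_id` to `artist.id` pair. The remaining candidates would still need validation, which is the step the abstract assigns to the later agents.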

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Web Application Security Vulnerabilities