Falcon: A Comprehensive Chinese Text-to-SQL Benchmark for Enterprise-Grade Evaluation

Wenzhen Luo; Wei Guan; Yifan Yao; Yimin Pan; Feng Wang; Zhipeng Yu; Zhe Wen; Liang Chen; Yihong Zhuang

arXiv:2510.24762 · cs.CL · October 30, 2025

TL;DR

Falcon is a comprehensive Chinese Text-to-SQL benchmark tailored to enterprise environments. It emphasizes real-world difficulties such as schema linking and colloquial language, and is designed to evaluate and improve large-scale models.

Contribution

The paper presents Falcon, a novel Chinese Text-to-SQL benchmark with enterprise-specific features, detailed annotations, and evaluation tools, addressing challenges in schema linking and colloquial language understanding.

Findings

1. Current state-of-the-art large-scale models achieve at most 50% accuracy on Falcon.

2. Most errors stem from schema linking and from mapping colloquial Chinese to SQL.

3. Falcon highlights the need for better models on enterprise-specific Chinese Text-to-SQL tasks.

Abstract

We introduce Falcon, a cross-domain Chinese text-to-SQL benchmark grounded in an enterprise-compatible dialect (MaxCompute/Hive). It contains 600 Chinese questions over 28 databases; 77% require multi-table reasoning and over half touch more than four tables. Each example is annotated along SQL-computation features and Chinese semantics. For evaluation, we release a robust execution comparator and an automated evaluation pipeline, under which all current state-of-the-art large-scale models (including DeepSeek) achieve accuracies of at most 50%. Major errors originate from two sources: (1) schema linking in large enterprise landscapes: hundreds of tables, denormalized fields, ambiguous column names, implicit foreign-key relations and domain-specific synonyms that make correct join/column selection difficult; and (2) mapping concise, colloquial Chinese into the exact operators and…
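The abstract mentions an execution comparator without describing it. As a rough illustration of what execution-based scoring involves, here is a minimal sketch, not the comparator released with Falcon. It assumes each query result is a list of row tuples containing scalar cells, treats results as multisets (row order ignored, duplicate counts kept) unless `ordered=True`, normalizes numbers so that `3`, `3.0` and `Decimal("3.00")` compare equal, and rounds floats to a fixed precision. The names `results_match`, `execution_accuracy` and the `ndigits` tolerance are hypothetical.

```python
from collections import Counter
from decimal import Decimal
from typing import Any, Iterable, List, Sequence, Tuple

Row = Sequence[Any]


def _normalize_cell(value: Any, ndigits: int) -> Any:
    # bool is a subclass of int in Python; keep it distinct from numbers.
    if isinstance(value, bool):
        return value
    # Hypothetical numeric tolerance: compare all numbers as rounded floats.
    if isinstance(value, (int, float, Decimal)):
        return round(float(value), ndigits)
    # Ignore surrounding whitespace in string cells.
    if isinstance(value, str):
        return value.strip()
    return value


def results_match(gold: Iterable[Row], pred: Iterable[Row],
                  ordered: bool = False, ndigits: int = 6) -> bool:
    """Compare two query results (assumed: lists of scalar-valued rows)."""
    gold_rows = [tuple(_normalize_cell(c, ndigits) for c in row) for row in gold]
    pred_rows = [tuple(_normalize_cell(c, ndigits) for c in row) for row in pred]
    if ordered:
        # For ORDER BY queries, row order is part of the answer.
        return gold_rows == pred_rows
    # Multiset comparison: order ignored, duplicate counts still matter.
    return Counter(gold_rows) == Counter(pred_rows)


def execution_accuracy(pairs: Iterable[Tuple[List[Row], List[Row]]]) -> float:
    """Fraction of (gold_result, predicted_result) pairs that match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(results_match(gold, pred) for gold, pred in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    gold = [("北京", 3), ("上海", 2)]
    pred = [("上海", 2.0), ("北京", 3.0000001)]
    print(results_match(gold, pred))                 # True
    print(results_match(gold, pred, ordered=True))   # False
    print(execution_accuracy([(gold, pred), (gold, [("北京", 3)])]))  # 0.5
```

A real enterprise comparator would likely also need to handle column reordering, NULL semantics and dialect-specific types (for example, MaxCompute/Hive date and decimal formats). Those cases are deliberately left out of this sketch.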