OptiSQL: Executable SQL Generation from Optical Tokens
Sifan Li, Hongkai Chen, Yujun Cai, Liyang Chen, Qingwen Ye, Yiwei Wang

TL;DR
OptiSQL is a vision-based framework that generates executable SQL directly from table images and natural language questions. It encodes each table as a compact set of optical tokens, which reduces token overhead while maintaining accuracy.
Contribution
This work introduces OptiSQL, a visual approach that compresses table information into optical tokens for SQL generation. It targets two limitations of text-based schemas: they assume access to structured text, and they incur substantial token overhead.
Findings
Retains strong execution accuracy with fewer tokens
Reduces table input tokens by an order of magnitude
Maintains structural information under visual perturbations
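The findings above are stated in terms of execution accuracy. The sketch below shows how that metric is commonly computed in text-to-SQL work: run the predicted and the reference query on the same database and compare the result sets. It is a minimal illustration and not the paper's evaluation code. The `medals` table, the helper names, and the choice to ignore row order are all assumptions made for this example.

```python
import sqlite3


def run_query(conn, sql):
    """Return result rows in a canonical order, or None if the SQL fails to execute."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so row order does not matter and mixed types stay comparable.
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted_sql, gold_sql in pairs:
        predicted_rows = run_query(conn, predicted_sql)
        gold_rows = run_query(conn, gold_sql)
        # A prediction that does not execute counts as a miss.
        if predicted_rows is not None and predicted_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


def build_demo_db():
    """Create a small in-memory table (hypothetical data, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER)")
    conn.executemany(
        "INSERT INTO medals VALUES (?, ?)",
        [("Norway", 16), ("Germany", 12), ("Canada", 4)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    pairs = [
        # Different SQL text, same result set: counted as correct.
        ("SELECT country FROM medals WHERE gold > 10",
         "SELECT country FROM medals WHERE gold >= 12"),
        # Misspelled table name: fails to execute, counted as wrong.
        ("SELECT country FROM medal", "SELECT country FROM medals"),
    ]
    print(execution_accuracy(conn, pairs))
```

Comparing results rather than SQL strings lets two differently written queries count as equivalent, which is why the metric suits generated SQL.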
Abstract
Executable SQL generation is typically studied in text-to-SQL settings, where tables are provided as fully linearized textual schemas and contents. While effective, this formulation assumes access to structured text and incurs substantial token overhead, which is misaligned with many real-world scenarios where tables appear as visual artifacts in documents or webpages. We investigate whether compact optical representations can serve as an efficient interface for executable semantic parsing. We present OptiSQL, a vision-driven framework that generates executable SQL directly from table images and natural language questions using compact optical tokens. OptiSQL leverages an OCR-oriented visual encoder to compress table structure and content into a small set of optical tokens and fine-tunes a pretrained decoder for SQL generation while freezing the encoder to isolate representation…
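To make the token-overhead argument concrete, the sketch below linearizes a table into text and counts tokens with a crude regex proxy, showing that the cost grows with the number of rows. Everything here is an assumption for illustration: the pipe-separated format, the tokenizer proxy, and in particular `HYPOTHETICAL_OPTICAL_BUDGET`, which is a placeholder and not a figure reported in the paper. Real visual encoders may also vary their token count with image resolution.

```python
import re

# Placeholder for a fixed visual-token budget; NOT a number from the paper.
HYPOTHETICAL_OPTICAL_BUDGET = 64


def linearize_table(header, rows):
    """Flatten a table into text: pipe-separated cells, one row per line."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def rough_token_count(text):
    """Crude tokenizer proxy: each word run or punctuation mark is one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def make_rows(n):
    """Generate n synthetic rows (hypothetical data)."""
    return [(f"item{i}", i, i * 2) for i in range(n)]


if __name__ == "__main__":
    header = ["name", "qty", "price"]
    for n in (10, 100, 1000):
        text_tokens = rough_token_count(linearize_table(header, make_rows(n)))
        print(f"rows={n} linearized_tokens={text_tokens} "
              f"assumed_optical_tokens={HYPOTHETICAL_OPTICAL_BUDGET}")
```

Under this proxy the linearized cost rises linearly with row count, while an image-based representation is bounded by whatever budget the encoder emits. A real subword tokenizer would give different absolute counts.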
Taxonomy
Topics: Data Visualization and Analytics · Software Engineering Research · Logic, programming, and type systems
