SQLFlow: A Bridge between SQL and Machine Learning
Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang

TL;DR
SQLFlow is a system that bridges SQL and machine learning, enabling developers to write ML workflows directly in SQL across various databases and ML engines, and deploy them efficiently on cloud platforms.
Contribution
It introduces a novel SQL extension and parsing algorithm to support diverse ML techniques and integrates with multiple database systems and ML engines for streamlined workflows.
Findings
Supports a wide range of ML techniques including deep learning and tree models.
Enables deployment as Kubernetes-native workflows for fault tolerance.
Used by major industrial companies like Alibaba and DiDi.
Abstract
Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialect to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy to work as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We carefully extended SQL syntax so that the extension works with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive for a wide variety of ML techniques…
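To make the idea of a SQL extension that sits on top of an ordinary query more concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the abstract: the keywords `TO TRAIN`, `TO PREDICT`, and `TO EXPLAIN` are taken from SQLFlow's public documentation, and the function `split_statement`, the type `SplitStatement`, and the example table and model names are hypothetical. The regex split is a toy stand-in for illustration only. It is not the collaborative parsing algorithm the abstract refers to, and it does not handle keywords inside string literals.

```python
import re
from typing import NamedTuple, Optional

# Assumed extension keywords; they do not appear in the abstract itself.
_EXTENSION = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


class SplitStatement(NamedTuple):
    standard_sql: str        # part that a database's own SQL parser would handle
    action: Optional[str]    # TRAIN, PREDICT, or EXPLAIN; None for plain SQL
    extension: str           # ML clause text that follows the action keyword


def split_statement(statement: str) -> SplitStatement:
    """Split a statement at the first extension keyword (toy heuristic)."""
    text = statement.strip().rstrip(";")
    match = _EXTENSION.search(text)
    if match is None:
        # No extension keyword found: treat the whole statement as standard SQL.
        return SplitStatement(text, None, "")
    return SplitStatement(
        text[: match.start()].strip(),
        match.group(1).upper(),
        text[match.end():].strip(),
    )


# Hypothetical statement in the style of the extended syntax.
example = (
    "SELECT * FROM iris.train "
    "TO TRAIN DNNClassifier WITH model.n_classes = 3 "
    "LABEL class INTO my_model;"
)
parts = split_statement(example)
```

In this sketch the `standard_sql` part could be handed unchanged to whichever database the query targets, while the `extension` part would be interpreted by the ML side. That separation is one way to picture how a single extension might coexist with several SQL dialects.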
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Explainable Artificial Intelligence (XAI)
