LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization
Lizhi Ma, Yi-Xiang Hu, Yihui Ren, Feng Wu, Xiang-Yang Li

TL;DR
LeJOT-AutoML leverages large language models and an agent-driven AutoML framework to automate feature engineering for job execution time prediction, significantly reducing engineering effort and improving cost efficiency in Databricks workloads.
Contribution
The paper introduces LeJOT-AutoML, an LLM-driven AutoML system that automates feature extraction and model training for cloud job execution-time prediction, improving accuracy and reducing development time.
Findings
Generated over 200 features for Databricks jobs.
Reduced feature engineering and evaluation from weeks to 20-30 minutes.
Achieved 19.01% cost savings in deployment.
Abstract
Databricks job orchestration systems (e.g., LeJOT) reduce cloud costs by selecting low-priced compute configurations while meeting latency and dependency constraints. Accurate execution-time prediction under heterogeneous instance types and non-stationary runtime conditions is therefore critical. Existing pipelines rely on static, manually engineered features that under-capture runtime effects (e.g., partition pruning, data skew, and shuffle amplification), and predictive signals are scattered across logs, metadata, and job scripts, which lengthens update cycles and increases engineering overhead. We present LeJOT-AutoML, an agent-driven AutoML framework that embeds large language model agents throughout the ML lifecycle. LeJOT-AutoML combines retrieval-augmented generation over a domain knowledge base with a Model Context Protocol toolchain (log parsers, metadata queries, and a read-only…
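To illustrate why execution-time prediction matters for cost-aware orchestration, the following is a minimal sketch of selecting the cheapest compute configuration that still meets a latency deadline. It is not the paper's implementation: `ComputeConfig`, `cheapest_feasible`, the hourly prices, and the predictor callable are all hypothetical names and values introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ComputeConfig:
    """Hypothetical compute configuration; not a Databricks API type."""
    name: str
    price_per_hour: float  # assumed hourly price, illustrative only


def cheapest_feasible(
    configs: Sequence[ComputeConfig],
    predict_seconds: Callable[[ComputeConfig], float],
    deadline_seconds: float,
) -> Optional[Tuple[ComputeConfig, float]]:
    """Return (config, estimated cost) with the lowest estimated cost among
    configurations whose predicted runtime meets the deadline, or None."""
    best: Optional[Tuple[ComputeConfig, float]] = None
    for cfg in configs:
        runtime = predict_seconds(cfg)
        if runtime > deadline_seconds:
            continue  # predicted to violate the latency constraint
        cost = cfg.price_per_hour * runtime / 3600.0
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best


# Usage with made-up predictions standing in for a trained model.
configs = [
    ComputeConfig("small", 1.0),
    ComputeConfig("medium", 2.0),
    ComputeConfig("large", 4.0),
]
predicted = {"small": 7200.0, "medium": 3000.0, "large": 1200.0}
choice = cheapest_feasible(configs, lambda c: predicted[c.name], 3600.0)
```

In this sketch an inaccurate predictor either rules out cheap configurations that would have met the deadline or admits ones that miss it, which is why prediction quality translates directly into cost.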
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Advanced Database Systems and Queries
