LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation

Xinyuan Wang; Haoyue Bai; Nanxu Gong; Wangyang Ying; Sixun Dong; Xiquan Cui; Yanjie Fu

arXiv:2506.09085 · cs.LG · June 12, 2025

Open Access

TL;DR

This paper introduces a teaming framework combining symbolic LLM generation with gradient-based ML optimization to improve the validity and stability of feature transformations, achieving better performance and robustness.

Contribution

It proposes a novel integrated framework that leverages LLMs' symbolic capabilities and ML's gradient search for stable, valid feature transformation generation.

Findings

01 Achieves a 5% improvement in downstream performance

02 Reduces error cases by nearly half

03 Demonstrates the efficiency and robustness of the teaming policy

Abstract

Feature transformation enhances data representation by deriving new features from the original data. Generative AI offers potential for this task, but faces challenges in stable generation (consistent outputs) and valid generation (error-free sequences). Existing methods fail to resolve both: traditional ML approaches suffer from low validity, and LLMs suffer from instability. We find that LLMs ensure valid syntax, while ML's gradient-steered search stabilizes performance. To bridge this gap, we propose a teaming framework combining LLMs' symbolic generation with ML's gradient optimization. This framework includes four steps: (1) golden examples generation, aiming to prepare high-quality samples with the ground knowledge of the teacher LLM; (2) feature transformation sequence embedding and search, intending to uncover potentially superior embeddings within the latent space; (3) student LLM feature transformation,…
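To make "valid generation (error-free sequences)" concrete, here is a minimal Python sketch of what checking a feature transformation sequence could look like. It is illustrative only and is not the paper's implementation: the postfix token format, the operator tables `UNARY` and `BINARY`, and the helpers `apply_sequence` and `is_valid` are all assumptions introduced for this example.

```python
import math

# Hypothetical operator tables; the paper's actual operator set is not stated here.
UNARY = {
    "log": lambda x: math.log(abs(x) + 1.0),   # shifted log, defined for all inputs
    "sqrt": lambda x: math.sqrt(abs(x)),
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # guard against division by zero
}


def apply_sequence(tokens, columns):
    """Evaluate a postfix token sequence over named feature columns.

    Returns the derived feature column, or raises ValueError when the
    sequence is not a well-formed expression (an "invalid generation").
    """
    stack = []
    for tok in tokens:
        if tok in columns:
            stack.append(list(columns[tok]))
        elif tok in UNARY:
            if len(stack) < 1:
                raise ValueError(f"operator {tok!r} has no operand")
            col = stack.pop()
            stack.append([UNARY[tok](v) for v in col])
        elif tok in BINARY:
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append([BINARY[tok](a, b) for a, b in zip(left, right)])
        else:
            raise ValueError(f"unknown token: {tok!r}")
    if len(stack) != 1:
        raise ValueError("sequence does not reduce to a single feature")
    return stack[0]


def is_valid(tokens, columns):
    """Return True when the sequence evaluates without error."""
    try:
        apply_sequence(tokens, columns)
        return True
    except ValueError:
        return False
```

Under these assumptions, a sequence such as `["f1", "f2", "+"]` yields a new column that sums two original features, while `["f1", "+"]` is rejected because the operator lacks an operand. A generator that emits many sequences of the second kind would have low validity in the sense the abstract describes.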

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification · Topic Modeling