Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents
Henrik Bradland, Morten Goodwin, Vladimir I. Zadorozhny, Per-Arne Andersen

TL;DR
Rogue One is a multi-agent LLM framework that improves feature extraction for tabular data by integrating external knowledge, qualitative feedback, and iterative collaboration among agents, producing features that are more meaningful and more predictive.
Contribution
It introduces a decentralized multi-agent system with qualitative feedback and knowledge retrieval to improve automatic feature extraction beyond existing monolithic LLM approaches.
Findings
Outperforms state-of-the-art methods on multiple datasets
Generates semantically meaningful and interpretable features
Identifies novel hypotheses like potential biomarkers
Abstract
The performance of machine learning models on tabular data depends critically on high-quality feature engineering. While Large Language Models (LLMs) have shown promise in automating feature extraction (AutoFE), existing methods are often limited by monolithic LLM architectures, simplistic quantitative feedback, and a failure to systematically integrate external domain knowledge. This paper introduces Rogue One, a novel LLM-based multi-agent framework for knowledge-informed automatic feature extraction. Rogue One operationalizes a decentralized system of three specialized agents (Scientist, Extractor, and Tester) that collaborate iteratively to discover, generate, and validate predictive features. Crucially, the framework moves beyond primitive accuracy scores by introducing a rich, qualitative feedback mechanism and a "flooding-pruning" strategy, allowing it to dynamically balance…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
