Automatic Prompt Optimization for Dataset-Level Feature Discovery
Adrian Cosma, Oleg Szehr, David Kletz, Alessandro Antonucci, Olivier Pelletier

TL;DR
This paper introduces a novel multi-agent prompt optimization framework that automatically discovers interpretable, dataset-level features from unstructured text, improving downstream classification by optimizing feature definitions based on dataset-wide performance and interpretability.
Contribution
It formulates feature discovery as a dataset-level prompt optimization problem and proposes a multi-agent approach for automatic, interpretable feature extraction from text.
Findings
Effective automatic feature discovery from unstructured text.
Improved downstream classification performance.
Improved interpretability of the discovered features.
Abstract
Feature extraction from unstructured text is a critical step in many downstream classification pipelines, yet current approaches largely rely on hand-crafted prompts or fixed feature schemas. We formulate feature discovery as a dataset-level prompt optimization problem: given a labelled text corpus, the goal is to induce a global set of interpretable and discriminative feature definitions whose realizations optimize a downstream supervised learning objective. To this end, we propose a multi-agent prompt optimization framework in which language-model agents jointly propose feature definitions, extract feature values, and evaluate feature quality using dataset-level performance and interpretability feedback. Instruction prompts are iteratively refined based on this structured feedback, enabling optimization over prompts that induce shared feature sets rather than per-example predictions.…
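The propose, extract, and evaluate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the language-model agents are replaced by deterministic stand-ins (keyword matching for extraction, a fixed pool of candidate definitions for the proposer), the refinement step is reduced to greedy acceptance on dataset-level accuracy, and interpretability feedback is omitted entirely. All names (`extract`, `evaluate`, `optimize`, the example data) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, int]            # (text, label)
FeatureDefs = Dict[str, List[str]]   # feature name -> keywords (stand-in for a natural-language definition)


def extract(defs: FeatureDefs, text: str) -> Tuple[int, ...]:
    """Extractor stand-in: a feature is 1 if any of its keywords occurs in the text."""
    low = text.lower()
    return tuple(int(any(k in low for k in kws)) for kws in defs.values())


def evaluate(defs: FeatureDefs, data: List[Example]) -> float:
    """Evaluator stand-in: training accuracy of a majority-label lookup per feature vector."""
    votes: Dict[Tuple[int, ...], Counter] = {}
    for text, label in data:
        votes.setdefault(extract(defs, text), Counter())[label] += 1
    correct = sum(c.most_common(1)[0][1] for c in votes.values())
    return correct / len(data)


def optimize(data: List[Example], proposals: FeatureDefs, rounds: int = 3):
    """Greedy loop: keep a proposed definition only if the dataset-level score improves."""
    defs: FeatureDefs = {}
    best = evaluate(defs, data)
    for _ in range(rounds):
        improved = False
        for name, kws in proposals.items():
            if name in defs:
                continue
            trial = {**defs, name: kws}
            score = evaluate(trial, data)
            if score > best:  # dataset-level feedback, not per-example feedback
                defs, best, improved = trial, score, True
        if not improved:
            break
    return defs, best


# Hypothetical labelled corpus and candidate definitions (assumptions for illustration).
data = [
    ("great movie, loved it", 1),
    ("terrible plot, hated it", 0),
    ("loved the acting", 1),
    ("boring and terrible", 0),
    ("great fun", 1),
    ("hated every minute", 0),
]
proposals = {
    "positive_sentiment": ["great", "loved"],
    "mentions_length": ["minute", "hour"],
    "negative_sentiment": ["terrible", "hated", "boring"],
}

defs, best = optimize(data, proposals)
print(list(defs), best)
```

The point of the sketch is the shape of the objective: candidate definitions are scored by how well the shared feature set supports classification over the whole corpus, so a definition that adds nothing at the dataset level is rejected even if it fires on individual examples.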
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques
