Automatic Prompt Optimization for Dataset-Level Feature Discovery

Adrian Cosma; Oleg Szehr; David Kletz; Alessandro Antonucci; Olivier Pelletier

arXiv:2601.13922 · cs.CL · January 21, 2026

Open Access

TL;DR

This paper introduces a novel multi-agent prompt optimization framework that automatically discovers interpretable, dataset-level features from unstructured text, improving downstream classification by optimizing feature definitions based on dataset-wide performance and interpretability.

Contribution

The paper formulates feature discovery as a dataset-level prompt optimization problem and proposes a multi-agent approach for automatic, interpretable feature extraction from text.

Findings

01. Effective automatic feature discovery from unstructured text.

02. Improved downstream classification performance.

03. Promotes interpretability of features.

Abstract

Feature extraction from unstructured text is a critical step in many downstream classification pipelines, yet current approaches largely rely on hand-crafted prompts or fixed feature schemas. We formulate feature discovery as a dataset-level prompt optimization problem: given a labelled text corpus, the goal is to induce a global set of interpretable and discriminative feature definitions whose realizations optimize a downstream supervised learning objective. To this end, we propose a multi-agent prompt optimization framework in which language-model agents jointly propose feature definitions, extract feature values, and evaluate feature quality using dataset-level performance and interpretability feedback. Instruction prompts are iteratively refined based on this structured feedback, enabling optimization over prompts that induce shared feature sets rather than per-example predictions.…
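
The propose / extract / evaluate / refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the language-model agents are replaced by deterministic stand-ins, feature definitions are reduced to single keywords, and the names `propose`, `extract`, `evaluate`, `optimize`, the score, and the threshold are all assumptions made for illustration. The `rejected` set plays the role of the structured feedback that, in the paper's setting, would be folded back into the instruction prompt.

```python
from typing import Dict, List, Set, Tuple

Corpus = List[Tuple[str, int]]  # (text, binary label)

def extract(feature: str, text: str) -> int:
    """Stand-in for an extractor agent: 1 if the keyword occurs in the text."""
    return int(feature in text.lower().split())

def evaluate(features: List[str], corpus: Corpus) -> Dict[str, float]:
    """Stand-in for an evaluator agent (hypothetical score).

    Scores each feature over the whole corpus as
    |P(f=1 | y=1) - P(f=1 | y=0)|, a dataset-level signal rather than
    a per-example one.
    """
    pos = [t for t, y in corpus if y == 1]
    neg = [t for t, y in corpus if y == 0]
    scores = {}
    for f in features:
        p_pos = sum(extract(f, t) for t in pos) / len(pos)
        p_neg = sum(extract(f, t) for t in neg) / len(neg)
        scores[f] = abs(p_pos - p_neg)
    return scores

def propose(candidates: List[str], rejected: Set[str], k: int) -> List[str]:
    """Stand-in for a proposer agent: first k candidates not yet rejected."""
    return [c for c in candidates if c not in rejected][:k]

def optimize(corpus: Corpus, candidates: List[str], k: int = 2,
             threshold: float = 0.5, max_rounds: int = 5) -> List[str]:
    """Iteratively refine a shared feature set using dataset-level feedback."""
    rejected: Set[str] = set()
    features: List[str] = []
    for _ in range(max_rounds):
        features = propose(candidates, rejected, k)
        scores = evaluate(features, corpus)
        weak = {f for f, s in scores.items() if s < threshold}
        if not weak:
            break  # every proposed feature is discriminative enough
        rejected |= weak  # feedback that steers the next proposal round
    return features

# Toy labelled corpus and candidate pool (both invented for this sketch).
corpus = [
    ("the plot and acting were great", 1),
    ("great fun overall", 1),
    ("the plot was boring", 0),
    ("boring and slow", 0),
]
candidates = ["the", "plot", "great", "boring"]
selected = optimize(corpus, candidates)
```

On this toy corpus the first round proposes "the" and "plot", both of which occur equally often in each class and are rejected; the second round proposes "great" and "boring", which separate the classes, so the loop stops. The sketch covers only the dataset-level scoring side; the interpretability feedback mentioned in the abstract is not modelled here.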

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques