ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization
João N. Cardoso, Arlindo L. Oliveira, Bruno Martins

TL;DR
This paper introduces ADAPT, a hybrid prompt optimization method for feature visualization in large language models. It combines beam search initialization with adaptive gradient-guided mutation to cope with the local minima that make optimization over discrete text difficult.
Contribution
ADAPT is a novel hybrid approach specifically designed for LLM feature visualization, addressing the limitations of existing prompt optimization techniques in discrete text domains.
Findings
ADAPT outperforms prior methods across layers and latent types.
Feature visualization in LLMs is feasible with domain-specific design assumptions.
The proposed metrics enable rigorous comparison of feature visualization methods.
Abstract
Understanding what features are encoded by learned directions in LLM activation space requires identifying inputs that strongly activate them. Feature visualization, which optimizes inputs to maximally activate a target direction, offers an alternative to costly dataset search approaches, but remains underexplored for LLMs due to the discrete nature of text. Furthermore, existing prompt optimization techniques are poorly suited to this domain, which is highly prone to local minima. To overcome these limitations, we introduce ADAPT, a hybrid method combining beam search initialization with adaptive gradient-guided mutation, designed around these failure modes. We evaluate on Sparse Autoencoder latents from Gemma 2 2B, proposing metrics grounded in dataset activation statistics to enable rigorous comparison, and show that ADAPT consistently outperforms prior methods across layers and…
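The abstract describes the method only at a high level, so the following is a minimal toy sketch of the general idea (beam search to initialize a prompt, then gradient-guided token substitution to refine it), not the authors' implementation. Everything in it is an assumption made for illustration: the random embeddings, the `tanh` toy "model", the position weights, and in particular the reading of "adaptive" as widening the candidate set when no substitution helps. The names `beam_search`, `mutate`, and `activation` are hypothetical.

```python
import math
import random

# Toy stand-ins (assumptions): a real setup would use an LLM and an SAE latent.
VOCAB, DIM, LENGTH = 50, 8, 5
rng = random.Random(0)
E = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]  # token embeddings
direction = [rng.gauss(0, 1) for _ in range(DIM)]                   # target direction
pos_w = [1.0 / (i + 1) for i in range(LENGTH)]                      # position weights


def hidden(tokens):
    """Weighted sum of token embeddings (works for partial prompts too)."""
    return [sum(pos_w[i] * E[t][k] for i, t in enumerate(tokens)) for k in range(DIM)]


def activation(tokens):
    """Toy nonlinear activation of the target direction for a prompt."""
    return sum(d * math.tanh(h) for d, h in zip(direction, hidden(tokens)))


def grad_wrt_embedding(tokens, i):
    """Exact gradient of the toy activation w.r.t. the embedding at position i."""
    return [pos_w[i] * d * (1 - math.tanh(h) ** 2)
            for d, h in zip(direction, hidden(tokens))]


def beam_search(width=4):
    """Build a prompt left to right, keeping the `width` best partial prompts."""
    beams = [[]]
    for _ in range(LENGTH):
        candidates = [b + [v] for b in beams for v in range(VOCAB)]
        candidates.sort(key=activation, reverse=True)
        beams = candidates[:width]
    return beams[0]


def mutate(tokens, steps=20, top_k=3):
    """Gradient-guided substitution; widens the candidate set when stuck."""
    best, best_score = list(tokens), activation(tokens)
    k = top_k
    for _ in range(steps):
        improved = False
        for i in range(LENGTH):
            g = grad_wrt_embedding(best, i)
            old = E[best[i]]
            # First-order estimate of the gain from swapping in token v.
            est = [sum(gj * (E[v][j] - old[j]) for j, gj in enumerate(g))
                   for v in range(VOCAB)]
            ranked = sorted(range(VOCAB), key=lambda v: est[v], reverse=True)[:k]
            for v in ranked:
                trial = best[:i] + [v] + best[i + 1:]
                score = activation(trial)  # verify with a true forward pass
                if score > best_score:
                    best, best_score, improved = trial, score, True
        if improved:
            k = top_k                 # reset after progress
        elif k >= VOCAB:
            break                     # every single-token swap was tried
        else:
            k = min(2 * k, VOCAB)     # hypothetical "adaptive" widening
    return best


init = beam_search()
final = mutate(init)
```

Because a substitution is accepted only when the true activation increases, `final` never scores below `init`. In a real setting the toy `activation` and its gradient would be replaced by a forward and backward pass through the model to the chosen latent.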
Taxonomy
Topics: Data Visualization and Analytics · Generative Adversarial Networks and Image Synthesis · Topic Modeling
