Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

TL;DR
Prompt Highlighter is an interactive inference method that lets users highlight spans of a prompt to steer a multi-modal LLM's focus during generation, making outputs more relevant without additional training.
Contribution
It presents a novel inference technique that guides multi-modal LLMs using highlighted prompt spans, improving controllability and output quality without model fine-tuning.
Findings
Demonstrates effective control of the model's focus during generation
Achieves high scores on the MMBench and MME-perception benchmarks
Compatible with existing LLMs and VLMs
Abstract
This study targets a critical aspect of inference in multi-modal LLMs (LLMs & VLMs): explicit, controllable text generation. Multi-modal LLMs combine multi-modal understanding with the capability of semantic generation, but their autoregressive generative nature makes them less explainable and more reliant on prompt content. While manipulating prompt formats can improve outputs, designing specific and precise prompts for each task is challenging and often ineffective. To tackle this issue, we introduce a novel inference method, Prompt Highlighter, which enables users to highlight specific prompt spans to interactively control the model's focus during generation. Motivated by classifier-free diffusion guidance, we form regular and unconditional context pairs based on the highlighted tokens, demonstrating that autoregressive generation in these models can be guided in a classifier-free way.…
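The abstract describes guiding autoregressive generation "in a classifier-free way" from a regular/unconditional context pair built from highlighted tokens. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes the unconditional context can be approximated by replacing each highlighted token with a placeholder (`<unk>`), and it mixes the two branches' next-token log-probabilities with the standard classifier-free guidance rule using a hypothetical scale `gamma`. All function names and the toy logits are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def make_context_pair(tokens: Sequence[str], highlight: Sequence[bool],
                      null_token: str = "<unk>") -> Tuple[List[str], List[str]]:
    """Return (regular, unconditional) contexts.

    Assumption for illustration: the unconditional context replaces every
    highlighted token with a placeholder, so the two branches differ only
    at the highlighted span. The paper's actual construction may differ.
    """
    if len(tokens) != len(highlight):
        raise ValueError("tokens and highlight mask must have equal length")
    regular = list(tokens)
    uncond = [null_token if h else t for t, h in zip(tokens, highlight)]
    return regular, uncond


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def guided_log_probs(cond_logits: Sequence[float],
                     uncond_logits: Sequence[float],
                     gamma: float) -> List[float]:
    """Classifier-free guidance on next-token log-probabilities:

        score = log p_uncond + gamma * (log p_cond - log p_uncond)

    gamma = 0 follows the unconditional branch, gamma = 1 recovers the
    regular branch, and gamma > 1 amplifies the effect of the highlight.
    The returned scores are not normalized.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("both branches must score the same vocabulary")
    lc = log_softmax(cond_logits)
    lu = log_softmax(uncond_logits)
    return [u + gamma * (c - u) for c, u in zip(lc, lu)]


def next_token(vocab: Sequence[str], cond_logits: Sequence[float],
               uncond_logits: Sequence[float], gamma: float) -> str:
    """Greedy decoding step: pick the token with the highest guided score."""
    scores = guided_log_probs(cond_logits, uncond_logits, gamma)
    best = max(range(len(vocab)), key=lambda i: scores[i])
    return vocab[best]


if __name__ == "__main__":
    tokens = ["Describe", "the", "red", "car"]
    regular, uncond = make_context_pair(tokens, [False, False, True, True])
    print(regular, uncond)

    # Toy logits standing in for one forward pass of each branch.
    vocab = ["vehicle", "scene", "red"]
    cond = [2.0, 1.5, 1.0]
    unc = [1.0, 2.0, 0.5]
    for g in (0.0, 1.0, 3.0):
        print(g, next_token(vocab, cond, unc, g))
```

With these toy logits, `gamma = 0` follows the unconditional branch and picks `scene`, while `gamma = 1` reproduces the regular branch and picks `vehicle`; larger values push further toward tokens whose probability rises when the highlighted span is present. Because the guided scores are not normalized, re-apply `log_softmax` to them before sampling.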
Taxonomy
Topics: Simulation Techniques and Applications · Scheduling and Optimization Algorithms
Methods: Focus · Diffusion
