Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
Arya Labroo, Ivaxi Sheth, Vyas Raina, Amaani Ahmed, Mario Fritz

TL;DR
This paper evaluates the ability of large language models to control multiple textual concepts simultaneously, revealing significant limitations in their compositionality and fine-grained control capabilities.
Contribution
It introduces a systematic evaluation framework for multi-concept control in LLMs and highlights the challenges models face in dual-concept scenarios.
Findings
Performance drops in dual-concept settings across models
Naive prompting struggles with compositionality
Fundamental limitations in fine-grained multi-concept control
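The findings contrast single- and dual-concept control under naive prompting. The sketch below shows one plausible way to build such prompts. It assumes a discrete 1-5 intensity scale, a hypothetical `build_prompt` template, and a full grid over level pairs. None of these come from the paper, which this summary does not describe at that level of detail. The concept names (humor, persuasiveness) are taken from the abstract.

```python
from itertools import product

# ASSUMPTION: a hypothetical 1-5 intensity scale; the paper's actual scale is not stated here.
LEVELS = range(1, 6)

def build_prompt(task: str, targets: dict[str, int]) -> str:
    """Compose a naive control prompt that requests each concept at a given level (hypothetical template)."""
    for concept, level in targets.items():
        if level not in LEVELS:
            raise ValueError(f"{concept} level {level} outside {LEVELS.start}-{LEVELS.stop - 1}")
    constraints = "; ".join(
        f"{concept} at level {level} on a 1-5 scale" for concept, level in targets.items()
    )
    return f"{task}\nWrite the text with {constraints}."

def dual_concept_grid(task: str, a: str, b: str) -> list[tuple[dict[str, int], str]]:
    """Enumerate every (level_a, level_b) combination for one concept pair."""
    grid = []
    for la, lb in product(LEVELS, LEVELS):
        targets = {a: la, b: lb}
        grid.append((targets, build_prompt(task, targets)))
    return grid

# Single-concept request versus the full dual-concept grid for one pair.
single = build_prompt("Write a short product review.", {"humor": 4})
pairs = dual_concept_grid("Write a short product review.", "persuasiveness", "humor")
print(single)
print(len(pairs))  # 25 combinations on a 5-point scale
```

Enumerating the full grid, rather than sampling a few pairs, makes it possible to check whether the level requested for one concept leaks into the other.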
Abstract
Large Language Models (LLMs) offer strong generative capabilities, but many applications require explicit and fine-grained control over specific textual concepts, such as humor, persuasiveness, or formality. Prior approaches in prompting and representation engineering can provide coarse or single-attribute control, but systematic evaluation of multi-attribute settings remains limited. We introduce an evaluation framework for fine-grained controllability in both single- and dual-concept scenarios, focusing on linguistically distinct concept pairs (e.g., persuasiveness vs. humor). Surprisingly, across multiple LLMs and generative tasks, we find that performance often drops in the dual-concept setting, even though the chosen concepts should in principle be separable. This reveals a fundamental limitation of naive prompting-based control: models struggle with compositionality even…
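The abstract reports that control "performance often drops" in the dual-concept setting but does not name the metric. One plausible way to score fine-grained control is sketched below, assuming each output is rated on the same level scale by some external judge. Two quantities are compared against the requested level: rank agreement (Spearman correlation) and mean absolute level error. Both metrics and the `control_scores` helper are our assumptions, not the paper's method. The toy scores are invented purely to exercise the code and are not results.

```python
from statistics import mean

def average_ranks(xs: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant input; treated here as "no control"
    return cov / (vx * vy) ** 0.5

def control_scores(requested: list[int], judged: list[int]) -> dict[str, float]:
    """Hypothetical per-concept fidelity: rank agreement and mean absolute level error."""
    return {
        "spearman": spearman(requested, judged),
        "mae": mean(abs(r - j) for r, j in zip(requested, judged)),
    }

# Invented toy judge scores for one concept, only to show the computation.
requested = [1, 2, 3, 4, 5]
judged_single = [1, 2, 3, 4, 5]
judged_dual = [2, 2, 3, 3, 4]
print(control_scores(requested, judged_single))
print(control_scores(requested, judged_dual))
```

Reporting both numbers separates two failure modes: a model can preserve the ordering of levels (high correlation) while still missing the exact requested level (nonzero error).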
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Multimodal Machine Learning Applications
