TL;DR
This paper evaluates steering vectors for controlling topical focus, sentiment, toxicity, and readability in abstractive summarization, showing where they work and where they break down in free-form generation.
Contribution
It extends steering vector evaluation from multiple-choice and toy tasks to abstractive summarization on the SAMSum, NEWTS, and arXiv datasets, analyzing both control effectiveness and the cost to summary quality.
Findings
Steering vectors effectively control targeted properties in summarization.
High steering strengths consistently induce degenerate repetition and factual hallucinations.
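Prompting alone preserves summary quality but gives weaker control; combining prompting with steering gives the strongest control and the most favorable efficacy-quality trade-off at moderate steering strengths.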
Abstract
Steering vectors are a lightweight method for controlling text properties by adding a learned bias to language model activations at inference time. While predominantly studied for multiple-choice and toy tasks, their effectiveness in free-form generation remains largely unexplored. Moving "Beyond Multiple Choice," we evaluate steering vectors for controlling topical focus, sentiment, toxicity, and readability in abstractive summaries across the SAMSum, NEWTS, and arXiv datasets. We find that steering effectively controls targeted properties, but high steering strengths consistently induce degenerate repetition and factual hallucinations. Prompting alone preserves summary quality but offers weaker control. Combining both methods yields the strongest control and the most favorable efficacy-quality trade-off at moderate steering strengths. Our work demonstrates that steering vectors face a…
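The core mechanism named in the abstract, adding a bias vector to activations at inference time, can be sketched in a few lines. This is a minimal illustration only. The abstract does not say how the vectors are obtained, so the difference-of-means construction below is an assumption (one common choice), and every name (`difference_of_means`, `steer`, `strength`) is hypothetical rather than taken from the paper. Plain Python lists stand in for a model's hidden states.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(pos: List[Vector], neg: List[Vector]) -> Vector:
    """ASSUMPTION: build the steering vector as mean(pos) - mean(neg).

    `pos` holds activations from examples showing the target property,
    `neg` holds activations from examples lacking it.
    """
    mean_pos, mean_neg = mean_vector(pos), mean_vector(neg)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def steer(hidden: Vector, steering_vector: Vector, strength: float) -> Vector:
    """Add the scaled steering vector to one hidden state: h' = h + strength * v."""
    return [h + strength * v for h, v in zip(hidden, steering_vector)]


# Toy activations (2-dimensional, made up for illustration).
positive_acts = [[1.0, 2.0], [3.0, 4.0]]
negative_acts = [[0.0, 1.0], [1.0, 1.0]]

v = difference_of_means(positive_acts, negative_acts)
steered = steer([0.0, 0.0], v, strength=2.0)
print(v, steered)
```

In a real model the same addition would be applied to a chosen layer's hidden states at each generation step. The `strength` scalar corresponds to the steering strength discussed above: larger values push harder toward the target property, and the abstract reports that this is where repetition and hallucination appear.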