OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference

Yow-Fu Liou; Yu-Chien Tang; Yu-Hsiang Liu; An-Zi Yen

arXiv:2601.13300·cs.CL·January 21, 2026

OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference

Yow-Fu Liou, Yu-Chien Tang, Yu-Hsiang Liu, An-Zi Yen

PDF

Open Access

TL;DR

This paper introduces OI-Bench, a comprehensive benchmark to evaluate large language models' vulnerability to directive interference through option injection, revealing significant susceptibility and variability across models.

Contribution

The paper presents a novel benchmarking approach and dataset for systematically assessing LLM susceptibility to directive interference via option injection.

Findings

01

Substantial vulnerabilities in LLMs to directive interference.

02

Heterogeneous robustness observed across different models.

03

Evaluation of mitigation strategies shows varied effectiveness.

Abstract

Benchmarking large language models (LLMs) is critical for understanding their capabilities, limitations, and robustness. In addition to interface artifacts, prior studies have shown that LLM decisions can be influenced by directive signals such as social cues, framing, and instructions. In this work, we introduce option injection, a benchmarking approach that augments the multiple-choice question answering (MCQA) interface with an additional option containing a misleading directive, leveraging standardized choice structure and scalable evaluation. We construct OI-Bench, a benchmark of 3,000 questions spanning knowledge, reasoning, and commonsense tasks, with 16 directive types covering social compliance, bonus framing, threat framing, and instructional interference. This setting combines manipulation of the choice interface with directive-based interference, enabling systematic…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection