LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

Huimin Ren; Yan Liang; Baiqiao Su; Chaobo Sun; Hengtong Lu; Kaike Zhang; Chen Wei

arXiv:2511.17561 · cs.CL · March 24, 2026



TL;DR

LexInstructEval introduces a formal, rule-based benchmark for objectively evaluating large language models' ability to follow complex lexical instructions with high granularity and reliability.

Contribution

It presents a novel, rule-based framework and dataset for fine-grained lexical instruction evaluation, addressing limitations of existing subjective and automated methods.

Findings

01 Provides a systematic, objective evaluation framework

02 Enables detailed analysis of LLMs' instruction-following capabilities

03 Facilitates research into controllability and reliability of LLMs

Abstract

The ability of Large Language Models (LLMs) to precisely follow complex and fine-grained lexical instructions is a cornerstone of their utility and controllability. However, evaluating this capability remains a significant challenge. Current methods either rely on subjective and costly human evaluation or on automated LLM-as-a-judge systems, which suffer from inherent biases and unreliability. Existing programmatic benchmarks, while objective, often lack the expressiveness to test intricate, compositional constraints at a granular level. To address these limitations, we introduce LexInstructEval, a new benchmark and evaluation framework for fine-grained lexical instruction following. Our framework is built upon a formal, rule-based grammar that deconstructs complex instructions into a canonical <Procedure, Relation, Value> triplet. This grammar enables the systematic generation of a…
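As a rough illustration of the <Procedure, Relation, Value> triplet idea, the sketch below models one constraint as a small data structure and checks a response against it programmatically. This is not the authors' implementation: the procedure names (`word_count`, `char_count`), the set of relations, and the `Constraint` and `follows_all` helpers are all hypothetical, chosen only to show how a rule-based check of this shape might look.

```python
from dataclasses import dataclass
import operator

# Hypothetical procedures: each maps a response string to a measurable number.
# The benchmark's actual procedure inventory is not given in this listing.
PROCEDURES = {
    "word_count": lambda text: len(text.split()),
    "char_count": lambda text: len(text),
}

# Hypothetical relations: comparison operators applied to (measured, value).
RELATIONS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Constraint:
    """One <Procedure, Relation, Value> triplet (illustrative names only)."""
    procedure: str
    relation: str
    value: int

    def check(self, response: str) -> bool:
        # Measure the response, then compare the measurement to the target value.
        measured = PROCEDURES[self.procedure](response)
        return RELATIONS[self.relation](measured, self.value)


def follows_all(response: str, constraints: list) -> bool:
    """A compositional instruction passes only if every triplet passes."""
    return all(c.check(response) for c in constraints)


if __name__ == "__main__":
    rules = [Constraint("word_count", "<=", 5), Constraint("char_count", ">=", 3)]
    print(follows_all("a short reply", rules))
```

Because each check is a deterministic function of the response, the verdict does not depend on a human rater or a judge model, which is the property the abstract attributes to programmatic evaluation.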


Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification