Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
Garvin Kruthof

TL;DR
This paper introduces DriftBench, a benchmark for evaluating how well large language models adhere to constraints during multi-turn scientific ideation. It shows that models persistently violate constraints they can still accurately restate.
Contribution
The paper presents DriftBench, a benchmark and accompanying analysis showing that LLMs often violate the original constraints under iterative pressure, and that structured checkpointing mitigates this only partially.
Findings
Iterative pressure reliably increases structural complexity and often reduces adherence to the original constraints.
Models accurately recall constraints but often violate them behaviorally.
Structured checkpointing partially reduces constraint violations but does not eliminate them.
Abstract
When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially…
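The knows-but-violates (KBV) rate mentioned in the abstract can be illustrated with a small sketch. The abstract excerpt does not give the exact formula, so the definition used here is an assumption: among runs where the restatement probe shows preserved recall, the fraction whose output violates the constraint. The `Run` record, its field names, and the `kbv_rate` function are hypothetical and are not taken from DriftBench itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """One scored run. Both fields are hypothetical labels for this sketch."""
    recalled: bool  # restatement probe reproduced the constraint correctly
    violated: bool  # the model's output breaks that constraint


def kbv_rate(runs: Iterable[Run]) -> Optional[float]:
    """Assumed definition: P(violated | recalled).

    Returns None when no run shows preserved recall, since the rate
    is undefined in that case. The paper may normalize differently.
    """
    recalled_runs = [r for r in runs if r.recalled]
    if not recalled_runs:
        return None
    knows_but_violates = sum(1 for r in recalled_runs if r.violated)
    return knows_but_violates / len(recalled_runs)


# Toy data, invented purely for illustration (not benchmark results).
example_runs = [
    Run(recalled=True, violated=True),
    Run(recalled=True, violated=False),
    Run(recalled=False, violated=True),
    Run(recalled=True, violated=True),
]
rate = kbv_rate(example_runs)
```

Under this assumed definition, a run with failed recall is excluded from the denominator, which separates forgetting a constraint from ignoring one that is still remembered.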
