SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Nithin Somasekharan; Youssef Hassan; Shiyao Lin; Gihan Panapitiya; Patrick Emami; Anurag Acharya; Sameera Horawalavithana; Shaowu Pan

arXiv:2605.18630 · cs.AI · May 19, 2026

TL;DR

SCICONVBENCH is a benchmark designed to evaluate large language models' ability to clarify and refine ill-posed scientific questions through multi-turn dialogue in computational science.

Contribution

The paper introduces a structured benchmark with rubric-based evaluation for assessing LLMs' clarification and correction capabilities in scientific task formulation.

Findings

01. Frontier models resolve only 52.7% of disambiguation cases in fluid mechanics.

02. Models perform relatively well on inconsistency resolution.

03. LLMs often make implicit assumptions that are not grounded in the conversation.

Abstract

Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific problem is already well-posed, whereas practical scientific assistance often begins with an ill-posed user request that must be refined through dialogue before any computation, analysis, or experiment can be carried out reliably. We introduce SCICONVBENCH, a benchmark for multi-turn clarification in scientific task formulation across four computational science problem domains: fluid mechanics, solid mechanics, materials science, and partial differential equations (PDEs). SCICONVBENCH targets two complementary capabilities: eliciting missing information (disambiguation) and detecting and correcting erroneous…

Code & Models

Repositories

csml-rpi/SciConvBench (GitHub)
