DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti

TL;DR
This paper introduces DOLOMITES, a comprehensive benchmark for evaluating language models on complex, domain-specific long-form methodical tasks that require structured reasoning and expert knowledge across various fields.
Contribution
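The paper develops a typology of methodical tasks (task objective, procedure, input, and output) and creates DoLoMiTes, a benchmark of 519 task specifications elicited from experts across 25 fields, with 1,857 expert-revised input and output examples for assessing structured, domain-specific long-form generation.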
Findings
Language models struggle with complex inferences in methodical tasks.
Expert revisions reveal the difficulty of automating structured long-form generation.
The benchmark highlights the need for improved models in domain-specific reasoning.
Abstract
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive and require the methodical generation of structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total), which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that…
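To make the four-part typology concrete, here is a minimal Python sketch of how one task specification and one example might be represented. It is an illustration only, not the benchmark's actual schema: the class names, field names, section titles, and the sample text are all hypothetical. Only the four parts (objective, procedure, input, output) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class MethodicalTask:
    """A task specification with the four parts named in the abstract."""
    domain: str                 # hypothetical field, e.g. the expert's area
    objective: str              # what the task is meant to achieve
    procedure: str              # how an expert carries the task out
    input_sections: list[str]   # assumed: named sections the input provides
    output_sections: list[str]  # assumed: named sections the output must fill


@dataclass
class TaskExample:
    """One concrete instantiation of a task (input plus long-form output)."""
    task: MethodicalTask
    inputs: dict[str, str]
    outputs: dict[str, str]

    def missing_output_sections(self) -> list[str]:
        # Report required output sections that are absent or left blank.
        return [
            name for name in self.task.output_sections
            if not self.outputs.get(name, "").strip()
        ]


# Made-up placeholder content, loosely following the clinician example.
task = MethodicalTask(
    domain="Medicine",
    objective="Write a differential diagnosis for a patient.",
    procedure="Review the history, list candidate conditions, rank them.",
    input_sections=["Patient history", "Symptoms"],
    output_sections=["Differential diagnosis", "Recommended tests"],
)

draft = TaskExample(
    task=task,
    inputs={"Patient history": "placeholder", "Symptoms": "placeholder"},
    outputs={"Differential diagnosis": "placeholder", "Recommended tests": ""},
)

print(draft.missing_output_sections())
```

A structural check like `missing_output_sections` only confirms that each required section was written; it says nothing about whether the content is correct, which is the part the abstract says requires expert revision.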
Taxonomy
Topics: Innovative Teaching and Learning Methods · Model-Driven Software Engineering Techniques
