Evaluating LLM Understanding via Structured Tabular Decision Simulations

Sichao Li; Xinyue Xu; Xiaomeng Li

arXiv:2511.10667·cs.CL·November 17, 2025

Open Access

TL;DR

This paper introduces Structured Tabular Decision Simulations (STaDS), a new evaluation framework for assessing whether large language models truly understand decision factors across diverse domains, beyond mere accuracy.

Contribution

The paper presents STaDS, a novel suite of decision-based evaluation settings that measure LLM understanding through decision factor reliance and comprehension across multiple domains.

Findings

01. Most models struggle to achieve consistent accuracy across domains.

02. Models can be accurate while relying on incorrect decision factors.

03. Models' stated rationales frequently mismatch the factors that actually drive their decisions.

Abstract

Large language models (LLMs) often achieve impressive predictive accuracy, yet correctness alone does not imply genuine understanding. True LLM understanding, analogous to human expertise, requires making consistent, well-founded decisions across multiple instances and diverse domains, relying on relevant and domain-grounded decision factors. We introduce Structured Tabular Decision Simulations (STaDS), a suite of expert-like decision settings that evaluate LLMs as if they were professionals undertaking structured decision "exams". In this context, understanding is defined as the ability to identify and rely on the correct decision factors, that is, the features that determine outcomes within a domain. STaDS jointly assesses understanding through: (i) question and instruction comprehension, (ii) knowledge-based prediction, and (iii) reliance on relevant decision factors. By analyzing 9 frontier…

Taxonomy

Topics: Text Readability and Simplification · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods