Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Duc-Hai Nguyen, Vijayakumar Nanjappan, Barry O'Sullivan, Hoang D. Nguyen

TL;DR
This paper introduces QASU, a benchmark for evaluating large language models' ability to understand and analyze questionnaire data, revealing that format and prompt choices significantly impact performance.
Contribution
It presents QASU, a new benchmark that systematically analyzes how serialization formats and prompts affect LLM performance on questionnaire understanding tasks.
Findings
Format and prompt choice can improve accuracy by up to 8.8%.
Structural hints via self-augmentation yield 3-4% additional gains.
The benchmark facilitates research and practical applications in LLM-based questionnaire analysis.
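To make the idea of a "serialization format" concrete, here is a minimal sketch of how the same small questionnaire could be rendered as JSON or as a Markdown table before being placed in a prompt. The toy data, the field names, and the helper functions `to_json`, `to_markdown`, and `build_prompt` are all illustrative assumptions; they are not the QASU schema, its format list, or its prompts.

```python
import json

# Hypothetical toy questionnaire. Field names are assumptions for illustration,
# not the schema used by the benchmark.
questions = [
    {"id": "Q1", "text": "How satisfied are you?", "options": ["Low", "Medium", "High"]},
    {"id": "Q2", "text": "Would you recommend us?", "options": ["Yes", "No"]},
]
responses = [
    {"respondent": "R1", "Q1": "High", "Q2": "Yes"},
    {"respondent": "R2", "Q1": "Low", "Q2": "No"},
]


def to_json(questions, responses):
    """Serialize questions and responses as one nested JSON document."""
    return json.dumps({"questions": questions, "responses": responses}, indent=2)


def to_markdown(questions, responses):
    """Serialize responses as a Markdown table with one row per respondent."""
    headers = ["respondent"] + [q["id"] for q in questions]
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in responses:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def build_prompt(serialized, task):
    """Combine a task instruction with the serialized questionnaire."""
    return f"{task}\n\n{serialized}\n\nAnswer:"


task = "How many respondents chose 'High' for Q1?"
json_prompt = build_prompt(to_json(questions, responses), task)
markdown_prompt = build_prompt(to_markdown(questions, responses), task)
```

Both prompts carry the same information; only the layout the model sees differs, which is the kind of variable the findings above describe as affecting accuracy.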
Abstract
Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their scale and structure present a unique challenge for large language models (LLMs), which otherwise excel at few-shot reasoning over open-ended text. Yet their ability to process questionnaire data or lists of questions crossed with hundreds of respondent rows remains underexplored. Current retrieval and survey analysis tools (e.g., Qualtrics, SPSS, REDCap) are typically designed for humans in the workflow, which limits the integration of such data with LLMs and AI-empowered automation. This gap leaves scientists, surveyors, and everyday users without evidence-based guidance on how best to represent questionnaires for LLM consumption. We address this by introducing QASU (Questionnaire Analysis and Structural…
