Risk or Chance? Large Language Models and Reproducibility in HCI Research
Thomas Kosch, Sebastian Feger

TL;DR
This paper examines how the adoption of Large Language Models in HCI research influences reproducibility, highlighting challenges, risks, and opportunities to develop best practices for valid and reproducible research outcomes.
Contribution
It provides a comprehensive analysis of reproducibility issues related to LLMs in HCI, proposing considerations for best practices and community standards.
Findings
Identifies risks such as bias and prompt-hacking that affect reproducibility.
Highlights opportunities for improved documentation and education.
Discusses community pressures impacting research integrity.
Abstract
Reproducibility is a major concern across scientific fields. Human-Computer Interaction (HCI), in particular, is subject to diverse reproducibility challenges due to the wide range of research methodologies employed. In this article, we explore how the increasing adoption of Large Language Models (LLMs) across all user experience (UX) design and research activities impacts reproducibility in HCI. In particular, we review upcoming reproducibility challenges through the lenses of analogies from past to future (mis)practices like p-hacking and prompt-hacking, general bias, support in data analysis, documentation and education requirements, and possible pressure on the community. We discuss the risks and chances for each of these lenses with the expectation that a more comprehensive discussion will help shape best practices and contribute to valid and reproducible practices around using…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
