Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications
Eunice Jun, Melissa Birchfield, Nicole de Moura, Jeffrey Heer, René Just

TL;DR
This paper investigates the process of translating research hypotheses into statistical models, highlighting challenges, current practices, and implications for developing better analytical tools.
Contribution
It provides a comprehensive analysis of hypothesis formalization, identifying key steps, challenges, and limitations of current tools, and proposes a dual-search process framework.
Findings
Researchers decompose hypotheses into sub-hypotheses and select proxy variables.
Analysts tend to fixate on familiar, possibly sub-optimal, analysis approaches.
Existing software tools offer inconsistent abstractions that limit hypothesis formalization.
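The first finding can be made concrete with a small sketch. The Python below is not from the paper: the `SubHypothesis` structure, the column names, and the toy data are all hypothetical, and a simple least-squares slope stands in for whatever statistical model an analyst would actually choose. It only shows the general shape of the process: state a sub-hypothesis, pick measured proxies for its concepts, then fit a model on those proxies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SubHypothesis:
    """One testable piece of a broader hypothesis (hypothetical structure)."""
    statement: str
    outcome_proxy: str    # measured column standing in for the outcome concept
    predictor_proxy: str  # measured column standing in for the predictor concept


def ols_slope(x, y):
    """Slope b of the simple least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up toy data, for illustration only.
data = {
    "hours_tutoring": [0, 1, 2, 3, 4],
    "exam_score": [50, 55, 60, 65, 70],
}

# Conceptual claim -> sub-hypothesis with explicit proxy variables.
sub = SubHypothesis(
    statement="More tutoring is associated with higher achievement",
    outcome_proxy="exam_score",
    predictor_proxy="hours_tutoring",
)

# Sub-hypothesis -> statistical model fit on the chosen proxies.
slope = ols_slope(data[sub.predictor_proxy], data[sub.outcome_proxy])
print(f"{sub.outcome_proxy} ~ {sub.predictor_proxy}: slope = {slope:.2f}")
```

Keeping the proxies explicit in the data structure makes the choice of measurement visible and easy to revisit, which is the step the findings above single out.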
Abstract
Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these…
Taxonomy
Topics: Data Visualization and Analytics · Data Analysis with R · Statistics Education and Methodologies
