Hypothesis Generation with Large Language Models

Yangqiaoyu Zhou, Haokun Liu, Tejes Srivastava, Hongyuan Mei, and Chenhao Tan

arXiv:2404.04326 · cs.AI · December 20, 2024 · 3 cites

TL;DR

This paper explores how large language models can generate and refine hypotheses from labeled data, yielding higher predictive accuracy than few-shot prompting on classification tasks along with new insights.

Contribution

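It introduces an iterative method in which an LLM generates initial hypotheses from a few labeled examples and then updates them, using a reward function inspired by multi-armed bandits to balance exploration and exploitation. The resulting hypotheses improve predictive performance and surface novel insights.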

Findings

01. Improved accuracy by 31.7% over few-shot prompting on a synthetic dataset
02. Improved accuracy on real-world datasets by up to 24.9%
03. Generated hypotheses corroborate and extend human theories

Abstract

Effective generation of novel hypotheses is instrumental to scientific progress. So far, researchers have been the main powerhouse behind hypothesis generation by painstaking data analysis and thinking (also known as the Eureka moment). In this paper, we examine the potential of large language models (LLMs) to generate hypotheses. We focus on hypothesis generation based on data (i.e., labeled examples). To enable LLMs to handle arbitrarily long contexts, we generate initial hypotheses from a small number of examples and then update them iteratively to improve the quality of hypotheses. Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks, improving accuracy by 31.7% on a…
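The update process summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the UCB1-style reward (empirical accuracy plus an exploration bonus alpha * sqrt(ln t / n)), the pool of "hard" examples that triggers regeneration, and every name below (`Hypothesis`, `update_hypotheses`, `wrong_cutoff`, `wrong_pool`, `toy_predict`, `toy_generate`) are assumptions made for illustration. The `predict` and `generate` callables stand in for LLM prompts; in the toy usage they are simple threshold rules on integers.

```python
# Hypothetical sketch of a bandit-guided hypothesis update loop.
# The reward form and all names are illustrative assumptions, not the paper's code.
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[int, bool]  # (feature, label): toy stand-in for a labeled example


@dataclass
class Hypothesis:
    text: str
    correct: int = 0  # times its prediction matched the label
    trials: int = 0   # times it was selected and evaluated

    def reward(self, t: int, alpha: float) -> float:
        """Assumed UCB1-style score: empirical accuracy plus an exploration bonus."""
        if self.trials == 0:
            return math.inf  # untested hypotheses are evaluated first
        bonus = alpha * math.sqrt(math.log(t) / self.trials)
        return self.correct / self.trials + bonus


def update_hypotheses(
    bank: List[Hypothesis],
    examples: Sequence[Example],
    predict: Callable[[str, Example], bool],         # stand-in for an LLM inference prompt
    generate: Callable[[List[Example]], List[str]],  # stand-in for an LLM generation prompt
    k: int = 2,             # hypotheses evaluated per example
    max_size: int = 5,      # capacity of the hypothesis bank
    wrong_cutoff: int = 2,  # an example is "hard" if this many top-k hypotheses miss it
    wrong_pool: int = 3,    # regenerate once this many hard examples accumulate
    alpha: float = 0.5,     # weight of the exploration bonus
) -> List[Hypothesis]:
    wrong: List[Example] = []
    for t, ex in enumerate(examples, start=1):
        # Exploit/explore: evaluate only the k hypotheses with the highest reward.
        top = sorted(bank, key=lambda h: h.reward(t, alpha), reverse=True)[:k]
        misses = 0
        for h in top:
            h.trials += 1
            if predict(h.text, ex) == ex[1]:
                h.correct += 1
            else:
                misses += 1
        if misses >= wrong_cutoff:
            wrong.append(ex)
        if len(wrong) >= wrong_pool:
            # Propose new hypotheses from the hard examples, then keep the best max_size.
            bank = bank + [Hypothesis(text) for text in generate(wrong)]
            bank = sorted(bank, key=lambda h: h.reward(t, alpha), reverse=True)[:max_size]
            wrong = []
    return bank


# Toy usage: the hidden rule is "x >= 5"; hypotheses are threshold rules.
def toy_predict(text: str, ex: Example) -> bool:
    return ex[0] >= int(text.split(">=")[1])


def toy_generate(wrong: List[Example]) -> List[str]:
    # Pick the threshold that fits the most hard examples (smallest on ties).
    best = max(range(11), key=lambda c: (sum((x >= c) == y for x, y in wrong), -c))
    return [f"x >= {best}"]


data = [(x, x >= 5) for x in range(10)] * 3
final = update_hypotheses([Hypothesis("x >= 2"), Hypothesis("x >= 3")], data, toy_predict, toy_generate)
for h in final:
    print(h.text, f"{h.correct}/{h.trials}")
```

Giving untested hypotheses an infinite reward means a newly generated hypothesis cannot be pruned before it has been evaluated, while the bonus term shrinks as a hypothesis accumulates trials; this is one simple way to trade exploration against exploitation, and the paper's actual reward may differ.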

Taxonomy

Topics: Topic Modeling

Methods: Focus