Are Abstracts Enough for Hypothesis Generation?

Justin Sybrandt; Angelo Carrabba; Alexander Herzog; Ilya Safro

arXiv:1804.05942·cs.IR·October 23, 2018

TL;DR

This study evaluates whether abstracts alone suffice for hypothesis generation or if full-text papers improve results, analyzing the trade-offs between data source length, quality, and computational cost in knowledge network-based systems.

Contribution

It systematically compares the impact of abstract versus full-text corpora on hypothesis generation quality and interpretability, highlighting the importance of document length over quantity.

Findings

01 Longer documents slightly improve result quality

02 Full-text papers introduce more intruder terms, reducing interpretability

03 Document length impacts results more than document count

Abstract

The potential for automatic hypothesis generation (HG) systems to improve research productivity keeps pace with the growing set of publicly available scientific information. But as data becomes easier to acquire, we must understand the effect different textual data sources have on our resulting hypotheses. Are abstracts enough for HG, or does an HG system need full-text papers? How many papers does an HG system need to make valuable predictions? How sensitive is a general-purpose HG system to hyperparameter values or input quality? What effect do corpus size and document length have on HG results? To answer these questions, we train multiple versions of a knowledge network-based HG system, Moliere, on varying corpora in order to compare challenges and trade-offs in terms of result quality and computational requirements. Moliere generalizes the main principles of similar knowledge network-based HG…
