Large Language Models for Scientific Information Extraction: An Empirical Study for Virology
Mahsa Shamsabadi, Jennifer D'Souza, Sören Auer

TL;DR
This study explores using large language models (LLMs) to automatically generate structured summaries of scientific contributions in virology, showing that a finetuned model with comparatively few parameters performs competitively on complex information extraction tasks.
Contribution
The paper introduces a novel automated method leveraging LLMs for structured scientific content summarization, highlighting their emergent abilities in complex information extraction within scientific domains.
Findings
Finetuned FLAN-T5 achieves competitive results with significantly fewer parameters than much larger state-of-the-art models.
Structured content representation aids in navigating dense scientific literature.
LLMs can effectively replace traditional modular information extraction pipelines.
Abstract
In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generation capabilities of LLMs to produce structured scholarly contribution summaries, offering both a practical solution and insights into LLMs' emergent abilities. For LLMs, the prime focus has been on improving their general intelligence as conversational agents. We argue that these models can also be applied effectively in information extraction (IE), specifically in complex IE tasks within terse domains such as science. This paradigm shift replaces the traditional modular, pipelined machine learning…
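The abstract's central idea, one generative model emitting a whole structured summary instead of a pipeline of separate extractors, can be sketched as a prompt builder plus a parser that turns the model's text back into an infobox-like record. This is a minimal illustration under stated assumptions: the property names, the prompt wording, and the `property: value | ...` output format are hypothetical and are not the paper's actual template or training format, and the call to the model itself (for example, a finetuned FLAN-T5) is deliberately omitted.

```python
from typing import Dict, List

# Hypothetical property names for a structured contribution summary;
# they are illustrative only and not taken from the paper.
PROPERTIES: List[str] = ["disease", "location", "r0_value", "method"]


def build_prompt(abstract: str, properties: List[str]) -> str:
    """Compose one instruction asking a seq2seq LLM to emit all properties at once.

    The wording and the ' | '-separated answer format are assumptions.
    """
    fields = ", ".join(properties)
    return (
        f"Extract the following properties from the abstract: {fields}. "
        "Answer as 'property: value' pairs separated by ' | '.\n\n"
        f"Abstract: {abstract}"
    )


def parse_summary(generated: str, properties: List[str]) -> Dict[str, str]:
    """Turn the model's linearized output into an infobox-like record.

    Keys not in `properties` are ignored; properties the model did not
    produce map to an empty string. Only the first ':' in a pair is treated
    as the separator, so values may themselves contain colons.
    """
    record = {p: "" for p in properties}
    for chunk in generated.split("|"):
        key, sep, value = chunk.partition(":")
        key = key.strip().lower()
        if sep and key in record:
            record[key] = value.strip()
    return record


if __name__ == "__main__":
    print(build_prompt("<abstract text>", PROPERTIES))
    # An invented model output, used only to exercise the parser.
    fake_output = "disease: COVID-19 | location: Italy | r0_value: 2.5"
    print(parse_summary(fake_output, PROPERTIES))
```

Parsing into a fixed set of keys keeps the record shape stable even when the generator omits or invents a property, which is one simple way to make free-form generation usable as structured extraction.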
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Wikis in Education and Collaboration
Methods: Focus · Flan-T5
