SciDef: Automating Definition Extraction from Academic Literature with Large Language Models
Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde

TL;DR
SciDef is an automated LLM-based pipeline for extracting definitions from scientific literature. It recovers 86.4% of the definitions in the authors' test set, but identifying which definitions are relevant remains an open challenge.
Contribution
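The paper introduces SciDef, an LLM-based pipeline for extracting definitions from academic texts, together with two new datasets (DefExtra, human-extracted definitions; DefSim, definition-pair similarity) and a comparison of 16 language models across prompting strategies.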
Findings
Extracted 86.4% of the definitions in the test set
Multi-step and DSPy-optimized prompting improve extraction performance
An NLI-based metric gave the most reliable evaluation among the metrics tested
Abstract
Definitions are the foundation for any scientific work, but with a significant increase in publication numbers, gathering definitions relevant to any keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, novel datasets of human-extracted definitions and definition-pairs' similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions from our test-set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as…
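The NLI-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the metric is computed, so everything here is an assumption made for illustration: the bidirectional matching rule, the 0.5 threshold, and all function names are hypothetical. A toy token-overlap function stands in for a real NLI model so that the example runs without external dependencies.

```python
from typing import Callable, Sequence

# An entailment scorer maps (premise, hypothesis) to a score in [0, 1].
# In practice this would wrap an NLI model; that wiring is not shown here.
EntailFn = Callable[[str, str], float]


def token_overlap_entailment(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model (assumption, not the paper's method):
    the fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return hits / len(hypothesis_tokens)


def bidirectional_score(a: str, b: str, entail: EntailFn) -> float:
    """Treat two definitions as equivalent only if each entails the other,
    so the weaker direction determines the score (illustrative choice)."""
    return min(entail(a, b), entail(b, a))


def extraction_recall(
    gold: Sequence[str],
    extracted: Sequence[str],
    entail: EntailFn,
    threshold: float = 0.5,  # hypothetical cut-off
) -> float:
    """Share of gold definitions matched by at least one extracted definition."""
    if not gold:
        return 0.0
    matched = sum(
        1
        for g in gold
        if any(bidirectional_score(g, e, entail) >= threshold for e in extracted)
    )
    return matched / len(gold)


gold_definitions = [
    "a definition is a statement of meaning",
    "bias is systematic deviation",
]
extracted_definitions = ["a definition is a statement of meaning"]
recall = extraction_recall(
    gold_definitions, extracted_definitions, token_overlap_entailment
)
```

Passing the scorer as a parameter keeps the matching logic separate from the model, so the toy function can be replaced by a real entailment model without changing `extraction_recall`.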
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Topic Modeling
