DefExt: A Semi Supervised Definition Extraction Tool
Luis Espinosa-Anke, Roberto Carlini, Horacio Saggion, Francesco Ronzano

TL;DR
DefExt is a semi-supervised, open-source tool that extracts definitions from text using a CRF-based sequence labeler, with a bootstrapping step that gradually adapts the model to the target corpus.
Contribution
Introduces DefExt, a semi-supervised definition extraction tool built on CRF-based sequential labeling and bootstrapping, released as open source and evaluated both automatically and manually.
Findings
Effective in extracting definitions from diverse corpora
Improves with bootstrapping to adapt to target corpus
Open source with training and test data provided
Abstract
We present DefExt, an easy-to-use semi-supervised Definition Extraction Tool. DefExt is designed to extract from a target corpus those textual fragments where a term is explicitly mentioned together with its core features, i.e. its definition. It relies on a Conditional Random Fields-based sequential labeling algorithm and a bootstrapping approach. Bootstrapping enables the model to gradually become more aware of the idiosyncrasies of the target corpus. In this paper we describe the main components of the toolkit as well as experimental results stemming from both automatic and manual evaluation. We release DefExt as open source along with the necessary files to run it on any Unix machine. We also provide access to training and test data for immediate use.
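The bootstrapping idea in the abstract can be illustrated with a minimal self-training sketch. It is not DefExt's code. To stay dependency-free it swaps the CRF sequence labeler for a toy sentence-level word log-odds scorer, and it assumes that each round promotes the single most confident definition and non-definition from the target corpus into the training set. All names (`tokenize`, `train`, `score`, `bootstrap`, `SEED`, `TARGET`) and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def train(labeled):
    """Fit a toy per-word log-odds model (a stand-in for the CRF)."""
    counts = {True: Counter(), False: Counter()}
    for sentence, is_definition in labeled:
        counts[is_definition].update(tokenize(sentence))
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing so unseen (word, class) pairs get a finite weight.
    total_pos = sum(counts[True].values()) + len(vocab)
    total_neg = sum(counts[False].values()) + len(vocab)
    return {
        word: math.log((counts[True][word] + 1) / total_pos)
        - math.log((counts[False][word] + 1) / total_neg)
        for word in vocab
    }


def score(model, sentence):
    """Mean word weight; higher means more definition-like."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(model.get(token, 0.0) for token in tokens) / len(tokens)


def bootstrap(seed, target_corpus, iterations=2):
    """Self-training loop: label the pool, keep the extremes, retrain."""
    labeled = list(seed)
    pool = list(target_corpus)
    model = train(labeled)
    for _ in range(iterations):
        if len(pool) < 2:
            break
        ranked = sorted(pool, key=lambda s: score(model, s))
        best_negative, best_positive = ranked[0], ranked[-1]
        # Assumption: one confident example per class is added per round.
        labeled.append((best_positive, True))
        labeled.append((best_negative, False))
        pool.remove(best_positive)
        pool.remove(best_negative)
        model = train(labeled)
    return model, labeled


# Hypothetical seed annotations and unlabeled target-corpus sentences.
SEED = [
    ("A neuron is a cell that transmits nerve impulses.", True),
    ("An algorithm is a procedure that solves a problem.", True),
    ("We ran the experiments on Tuesday.", False),
    ("The results were reported in the appendix.", False),
]
TARGET = [
    "A verb is a word that describes an action.",
    "The samples were stored in the lab.",
    "A protein is a molecule that folds.",
    "We reported the results on Monday.",
]

model, labeled = bootstrap(SEED, TARGET, iterations=2)
```

After the loop, `labeled` holds the seed examples plus the sentences promoted from `TARGET`, so the retrained model has seen vocabulary from the target corpus. A real setup would label tokens with a CRF rather than score whole sentences.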
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
