Concept Extraction Using Pointer-Generator Networks
Alexander Shvets, Leo Wanner

TL;DR
This paper introduces a pointer-generator network model for open-domain concept extraction. The model outperforms standard techniques and, when applied on top of DBpedia Spotlight, further improves Spotlight's performance; the authors report strong results across multiple datasets.
Contribution
The paper presents a new open-domain extractive model based on pointer-generator networks trained on Wikipedia data, improving concept extraction accuracy over standard techniques.
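To make the copy mechanism concrete, here is a minimal sketch of how a pointer-generator network commonly combines its two output distributions at one decoding step: a generation probability `p_gen` weights the fixed-vocabulary distribution, and the remainder weights the attention over source tokens, which lets out-of-vocabulary source tokens be copied. This follows the general pointer-generator formulation and is not taken from the authors' code; the function name `final_distribution` and all numbers are assumptions chosen for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities for one decoding step.

    p_gen         -- probability of generating from the fixed vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary token -> probability
    attention     -- attention weights over source positions (sum to 1)
    source_tokens -- source tokens, aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        mixed[token] += p_gen * prob
    # Copy part: route (1 - p_gen) of the mass to source tokens via attention.
    # A source token missing from the vocabulary still receives probability here.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Illustrative values only (not from the paper).
vocab_dist = {"the": 0.6, "network": 0.3, "<unk>": 0.1}
source_tokens = ["pointer-generator", "network"]
attention = [0.75, 0.25]
dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
```

In this toy example "pointer-generator" is absent from the vocabulary, yet it still receives probability mass through the copy path, which is the property an OOV-oriented extractor relies on.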
Findings
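The model significantly outperforms standard techniques such as token/chunk-to-concept alignment and dictionary lookup.
Applying the model on top of DBpedia Spotlight further improves Spotlight's performance.
The model achieves state-of-the-art results on multiple datasets.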
Abstract
Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single token/nominal chunk-concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open-domain OOV-oriented extractive model that is based on distant supervision of a pointer-generator network leveraging bidirectional LSTMs and a copy mechanism. The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the pointers to other pages are considered as ground truth concepts. The outcome of the experiments shows that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments furthermore show that the model can be readily ported to other…
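The abstract states that links to other Wikipedia pages serve as ground-truth concepts. The sketch below shows one way such distant-supervision labels could be derived from wikitext: each `[[Target|anchor]]` or `[[Target]]` link is replaced by its anchor text, and the anchor's character span is recorded as a concept. This is an illustration under stated assumptions, not the authors' corpus pipeline; the helper `links_to_concepts`, the regular expression, and the sample sentence are all hypothetical.

```python
import re

# Matches [[Target]] and [[Target|anchor]]; nested or malformed links are ignored.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def links_to_concepts(wikitext):
    """Return (plain_text, concepts) for a wikitext string.

    Each concept is a tuple (start, end, anchor, target), where start/end
    are character offsets of the anchor inside plain_text.
    """
    parts = []
    concepts = []
    cursor = 0   # read position in the wikitext
    length = 0   # length of the plain text built so far
    for match in WIKILINK.finditer(wikitext):
        before = wikitext[cursor:match.start()]
        parts.append(before)
        length += len(before)

        target = match.group(1).strip()
        # Without a pipe, the displayed anchor is the target itself.
        anchor = match.group(2) if match.group(2) is not None else match.group(1)
        parts.append(anchor)
        concepts.append((length, length + len(anchor), anchor, target))
        length += len(anchor)
        cursor = match.end()
    parts.append(wikitext[cursor:])
    return "".join(parts), concepts


# Hypothetical sample sentence, not taken from the paper's corpus.
sample = ("A [[pointer network]] copies tokens, unlike a plain "
          "[[Long short-term memory|LSTM]] decoder.")
text, concepts = links_to_concepts(sample)
```

The resulting spans could then be converted into token-level targets for training; how the authors actually build their training sequences is not described in this summary.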
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
