Extracting Linguistic Resources from the Web for Concept-to-Text Generation
Gerasimos Lampouras, Ion Androutsopoulos

TL;DR
This paper presents methods to semi-automatically extract domain-specific linguistic resources from the Web for concept-to-text generation, reducing manual effort while maintaining high-quality output.
Contribution
It introduces methods for extracting sentence plans and natural language names from the Web, two of the most important types of domain-specific linguistic resources used by NaturalOWL, a natural language generator for OWL ontologies.
Findings
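Texts generated with the semi-automatically extracted resources are perceived as almost as good as texts generated with manually authored resources.
Texts generated with the extracted resources are perceived as much better than texts generated with resources derived from the ontology's relation and entity identifiers.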
Minimal human involvement suffices for high-quality linguistic resource extraction.
Abstract
Many concept-to-text generation systems require domain-specific linguistic resources to produce high-quality texts, but manually constructing these resources can be tedious and costly. Focusing on NaturalOWL, a publicly available state-of-the-art natural language generator for OWL ontologies, we propose methods to extract from the Web sentence plans and natural language names, two of the most important types of domain-specific linguistic resources used by the generator. Experiments show that texts generated using linguistic resources extracted by our methods in a semi-automatic manner, with minimal human involvement, are perceived as being almost as good as texts generated using manually authored linguistic resources, and much better than texts produced by using linguistic resources extracted from the relation and entity identifiers of the ontology.
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling
