Conversational Lexicography: Querying Lexicographic Data on Knowledge Graphs with SPARQL through Natural Language
Kilian Sennrich, Sina Ahmadi

TL;DR
This paper explores creating natural language interfaces for lexicographic data on knowledge graphs like Wikidata, using a large dataset of natural language to SPARQL mappings and evaluating different language models.
Contribution
It introduces a multidimensional taxonomy of Wikidata's lexicographic ontology and a dataset with over 1.2 million mappings, assessing model capabilities for NL to SPARQL translation.
Findings
GPT-3.5-Turbo shows better generalization than GPT-2 and Phi-1.5
Model size and pre-training diversity are crucial for adaptability
Challenges remain in achieving robust generalization and scalability
Abstract
Knowledge graphs offer an excellent solution for representing the lexical-semantic structures of lexicographic data. However, working with the SPARQL query language represents a considerable hurdle for many non-expert users who could benefit from the advantages of this technology. This paper addresses the challenge of creating natural language interfaces for lexicographic data retrieval on knowledge graphs such as Wikidata. We develop a multidimensional taxonomy capturing the complexity of Wikidata's lexicographic data ontology module through four dimensions and create a template-based dataset with over 1.2 million mappings from natural language utterances to SPARQL queries. Our experiments with GPT-2 (124M), Phi-1.5 (1.3B), and GPT-3.5-Turbo reveal significant differences in model capabilities. While all models perform well on familiar patterns, only GPT-3.5-Turbo demonstrates…
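As a rough illustration of what a template-based mapping from natural language utterances to SPARQL queries can look like, the sketch below fills slot templates with language and lexical-category values. The templates, slot names, and the `generate_pairs` helper are hypothetical and are not taken from the paper's dataset. The queries assume the prefixes `wd:`, `dct:`, and `wikibase:` are predefined, as they are on the Wikidata Query Service, and the item IDs shown should be verified before use.

```python
from itertools import product

# Slot values (illustrative; IDs assumed to be the Wikidata items for these concepts).
LANGUAGES = {"English": "Q1860", "German": "Q188"}
CATEGORIES = {"noun": "Q1084", "verb": "Q24905"}

# Hypothetical (utterance template, SPARQL template) pairs.
# Doubled braces are literal braces in the SPARQL text after str.format.
TEMPLATES = [
    (
        "List all {language} {category}s.",
        "SELECT ?lexeme ?lemma WHERE {{ "
        "?lexeme dct:language wd:{lang_qid} ; "
        "wikibase:lexicalCategory wd:{cat_qid} ; "
        "wikibase:lemma ?lemma . }}",
    ),
    (
        "How many {category}s does {language} have?",
        "SELECT (COUNT(?lexeme) AS ?count) WHERE {{ "
        "?lexeme dct:language wd:{lang_qid} ; "
        "wikibase:lexicalCategory wd:{cat_qid} . }}",
    ),
]


def generate_pairs():
    """Expand every template with every language/category combination."""
    pairs = []
    for (utterance, query), (language, lang_qid), (category, cat_qid) in product(
        TEMPLATES, LANGUAGES.items(), CATEGORIES.items()
    ):
        slots = {
            "language": language,
            "category": category,
            "lang_qid": lang_qid,
            "cat_qid": cat_qid,
        }
        pairs.append((utterance.format(**slots), query.format(**slots)))
    return pairs


if __name__ == "__main__":
    for utterance, query in generate_pairs():
        print(utterance)
        print("  ", query)
```

Because the number of pairs is the product of the template count and the slot-value counts, even a modest set of templates and slot values grows quickly, which is one plausible way a template-based dataset reaches a very large number of mappings.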
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Linguistics and Terminology Studies
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Attention Dropout · Softmax · Weight Decay · Multi-Head Attention
