ImPaKT: A Dataset for Open-Schema Knowledge Base Construction
Luke Vilnis, Zach Fisher, Bhargav Kanagal, Patrick Murray, Sumit Sanghai

TL;DR
This paper introduces ImPaKT, a new dataset for open-schema knowledge base construction in the shopping domain, enabling improved semantic parsing and information extraction with language models.
Contribution
ImPaKT, a professionally annotated dataset of around 2500 text snippets for open-schema information extraction and relation discovery in the shopping domain.
Findings
Fine-tuning UL2 on ImPaKT improves relation extraction.
Human evaluation shows high accuracy of extracted implications.
Dataset facilitates knowledge base construction across domains.
Abstract
Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of finetuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to construct ImPaKT, a dataset for open-schema information extraction, consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides), professionally annotated with extracted attributes, types, attribute summaries (attribute schema discovery from idiosyncratic text), many-to-one relations between compound and atomic attributes, and implication relations. We release this data in hope that it will be useful in fine tuning semantic parsers for information…
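The annotation layers listed in the abstract can be pictured as one record per snippet. The sketch below is only an illustration: the class names, field names, the example sentence, and the linearization format are all assumptions made here, not the released dataset's actual schema or the authors' target format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Attribute:
    # Hypothetical field names; the released data may be organized differently.
    text: str       # attribute mention as it appears in the snippet
    attr_type: str  # coarse type assigned by the annotator
    summary: str    # normalized attribute name (the "attribute summary")


@dataclass
class AnnotatedSnippet:
    snippet: str
    attributes: List[Attribute] = field(default_factory=list)
    # (atomic, compound) pairs: several atomic attributes may map to one compound.
    part_of: List[Tuple[str, str]] = field(default_factory=list)
    # (premise, conclusion) pairs for implication relations.
    implications: List[Tuple[str, str]] = field(default_factory=list)


def to_seq2seq_pair(example: AnnotatedSnippet) -> Tuple[str, str]:
    """Linearize one record into (input text, target text) for seq2seq fine-tuning."""
    parts = [f"{a.summary} = {a.text} [{a.attr_type}]" for a in example.attributes]
    parts += [f"{atomic} part_of {compound}" for atomic, compound in example.part_of]
    parts += [f"{p} implies {c}" for p, c in example.implications]
    return example.snippet, " ; ".join(parts)


# Invented example sentence, not taken from the dataset.
example = AnnotatedSnippet(
    snippet="A lightweight laptop with long battery life is good for travel.",
    attributes=[
        Attribute("lightweight", "property", "weight"),
        Attribute("long battery life", "property", "battery life"),
    ],
    implications=[("lightweight", "good for travel")],
)
source, target = to_seq2seq_pair(example)
print(target)
```

Flattening every annotation layer into a single target string is one simple way to fit the seq2seq paradigm the abstract mentions; a separate target per layer would be an equally plausible design.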
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: UL2 · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Balanced Selection · Sequence to Sequence
