GenericsKB: A Knowledge Base of Generic Statements
Sumithra Bhakthavatsalam, Chloe Anastasiades, Peter Clark

TL;DR
GenericsKB is a large (3.5M+ sentence) knowledge base of naturally occurring generic statements collected from multiple corpora. Each sentence is annotated with its topical term, surrounding context, and a learned confidence score, and the resource is intended to support NLP tasks that require reasoning.
Contribution
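This work introduces GenericsKB, the first large resource of naturally occurring generic sentences, as opposed to extracted or crowdsourced triples. It also releases GenericsKB-Best (1M+ sentences), which combines the best-quality generics in GenericsKB with selected, synthesized generics from WordNet and ConceptNet.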
Findings
On two existing multihop reasoning datasets (OBQA and QASC), using GenericsKB can result in higher scores and better explanations.
GenericsKB provides higher-quality generics than larger, less specific corpora.
The resource benefits NLP applications and linguistic research.
Abstract
We present a new resource for the NLP community, namely a large (3.5M+ sentence) knowledge base of *generic statements*, e.g., "Trees remove carbon dioxide from the atmosphere", collected from multiple corpora. This is the first large resource to contain *naturally occurring* generic sentences, as opposed to extracted or crowdsourced triples, and thus is rich in high-quality, general, semantically complete statements. All GenericsKB sentences are annotated with their topical term, surrounding context (sentences), and a (learned) confidence. We also release GenericsKB-Best (1M+ sentences), containing the best-quality generics in GenericsKB augmented with selected, synthesized generics from WordNet and ConceptNet. In tests on two existing datasets requiring multihop reasoning (OBQA and QASC), we find using GenericsKB can result in higher scores and better explanations than using a much…
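The annotations listed in the abstract (topical term, surrounding context, learned confidence) suggest a simple record layout. The sketch below is only an illustration of how such records might be represented and filtered: the class name `GenericStatement`, its field names, the helper `select_best`, the 0.5 threshold, and the confidence values are all assumptions made for this example, not the released file format or real GenericsKB scores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenericStatement:
    """Hypothetical record for one generic sentence (field names are assumed)."""
    sentence: str                                      # the generic statement itself
    topic: str                                         # topical term the sentence is about
    context: List[str] = field(default_factory=list)   # surrounding sentences
    confidence: float = 0.0                            # learned confidence score


def select_best(statements, topic, min_confidence=0.5):
    """Return statements about `topic` at or above the threshold, best first."""
    matches = [
        s for s in statements
        if s.topic == topic and s.confidence >= min_confidence
    ]
    return sorted(matches, key=lambda s: s.confidence, reverse=True)


# Illustrative records only; the confidence values are invented for the example.
examples = [
    GenericStatement("Trees remove carbon dioxide from the atmosphere.", "tree",
                     confidence=0.9),
    GenericStatement("Trees are in the park.", "tree", confidence=0.2),
    GenericStatement("Dogs are mammals.", "dog", confidence=0.8),
]

best_tree_facts = select_best(examples, "tree")
```

Filtering by a confidence threshold in this way mirrors, in spirit, the idea of keeping only the best-quality generics, though how GenericsKB-Best was actually selected is described in the paper rather than here.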
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
