Do Children Texts Hold The Key To Commonsense Knowledge?
Julien Romero, Simon Razniewski

TL;DR
This paper investigates whether children's texts are a valuable resource for extracting commonsense knowledge, finding they contain more explicit assertions and improve language model performance when used for fine-tuning.
Contribution
The study demonstrates that children's texts are richer in explicit commonsense assertions and can enhance language models' knowledge extraction capabilities.
Findings
Children's texts contain more commonsense assertions, and more typical ones, than the other corpora analyzed.
Fine-tuning on children's texts improves language model performance in commonsense tasks.
Fine-tuning on small amounts of children's texts offers a promising alternative to scaling up models and corpora.
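The first finding can be illustrated with a toy measurement of how often a text spells out generic statements. The sketch below is only an assumption-laden illustration, not the paper's analysis: the pattern list, the helper names (`split_sentences`, `assertion_density`), and the two sample texts are all hypothetical.

```python
import re

# Hypothetical pattern for explicit generic statements such as "Birds can fly".
# This is an illustrative assumption, not a pattern taken from the paper.
ASSERTION_PATTERNS = [
    re.compile(r"^[A-Z][a-z]+s (?:can|have|are|eat|live) \w+"),
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def assertion_density(text):
    """Return the fraction of sentences that match at least one pattern."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(
        1 for s in sentences if any(p.search(s) for p in ASSERTION_PATTERNS)
    )
    return hits / len(sentences)


# Made-up sample texts, used only to exercise the function.
children_sample = "Birds can fly. Dogs have four legs. We went home."
news_sample = "The council approved the budget on Monday. Shares fell sharply."

print(assertion_density(children_sample))
print(assertion_density(news_sample))
```

A real comparison would need full corpora and a far more robust notion of "assertion" than a single regular expression; the point here is only the shape of the measurement, a per-sentence rate that can be compared across text sources.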
Abstract
Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions on the reader's knowledge, and therefore spells out commonsense more explicitly. An analysis with several corpora shows that children's texts indeed contain much more, and more typical commonsense assertions. Moreover, experiments show that this advantage can be leveraged in popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children texts (childBERT) already yields significant improvements. This provides a refreshing…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
