I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi

TL;DR
This paper introduces I2D2, a novel framework for improving commonsense knowledge generation in smaller language models through NeuroLogic decoding and self-imitation, challenging the notion that scale alone drives commonsense capabilities.
Contribution
The paper presents a new distillation framework that enhances smaller models' commonsense knowledge without relying on large-scale models, using NeuroLogic decoding and self-imitation learning.
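As a rough illustration of the generate, filter, and retrain cycle that self-imitation implies, here is a toy sketch. It is not the authors' implementation: the "model" is just a dictionary of statement weights, `constrained_sample` is a crude stand-in for lexically constrained decoding (it is not the NeuroLogic algorithm), and the `accept` quality filter and `boost` parameter are hypothetical names introduced only for this example.

```python
import random
from collections import Counter


def constrained_sample(weights, concept, rng, n=20):
    """Sample n statements that mention `concept`.

    Assumption: a crude stand-in for constrained decoding, not NeuroLogic.
    """
    pool = [s for s in weights if concept in s.split()]
    probs = [weights[s] for s in pool]
    return rng.choices(pool, weights=probs, k=n)


def self_imitation_round(weights, concept, accept, rng, boost=1.0):
    """One generate -> filter -> retrain round on the toy 'model'."""
    samples = constrained_sample(weights, concept, rng)
    kept = [s for s in samples if accept(s)]  # hypothetical quality filter
    updated = dict(weights)
    for statement, count in Counter(kept).items():
        # "Fine-tune" toward the model's own accepted outputs.
        updated[statement] += boost * count
    return updated, kept


rng = random.Random(0)
weights = {
    "birds can fly": 1.0,
    "birds can swim in lava": 1.0,
    "dogs can bark": 1.0,
}
accept = lambda s: "lava" not in s  # toy filter, assumed for illustration
new_weights, kept = self_imitation_round(weights, "birds", accept, rng)
```

In this sketch, rejected statements keep their original weight while accepted ones gain weight, so repeated rounds shift the toy model toward its own higher-quality generations.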
Findings
Smaller models (e.g., GPT-2) can outperform much larger ones (e.g., GPT-3) when trained with novel distillation algorithms.
I2D2 creates the largest high-quality commonsense generics corpus to date.
Scale is not the only factor in achieving commonsense knowledge in language models.
Abstract
Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition, without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
Methods: Knowledge Distillation
