Distilling Rule-based Knowledge into Large Language Models
Wenkai Yang, Yankai Lin, Jie Zhou, Ji-Rong Wen

TL;DR
This paper introduces a rule distillation method to encode rule-based knowledge into large language models, improving learning efficiency and generalization compared to traditional example-based training.
Contribution
It proposes a novel rule distillation approach that leverages in-context learning to explicitly encode rules into LLMs, enhancing their ability to learn from limited data.
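The summary above does not state the training objective, so the following is only a minimal sketch of one plausible reading: a "teacher" pass sees the textual rule in its context, a "student" pass sees the input alone, and the student is trained to match the teacher's output distribution. The function names, the choice of KL divergence, and the toy logits are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def rule_distillation_loss(teacher_logits, student_logits):
    # Hypothetical objective: teacher_logits come from a forward pass with
    # the rule in context, student_logits from a pass on the input alone.
    # Minimizing this pushes the rule's effect into the student's parameters.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return kl_divergence(p, q)

# Toy next-token logits over a 3-word vocabulary (made-up numbers).
with_rule = [5.0, 0.0, 0.0]     # teacher: rule in context
without_rule = [0.0, 0.0, 5.0]  # student: no rule in context
loss = rule_distillation_loss(with_rule, without_rule)
```

In a real setup the logits would come from the same LLM run twice, with gradients flowing only through the student pass; the loss is zero when the student already behaves as if it had read the rule.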
Findings
Rule distillation outperforms example-based learning in sample efficiency.
The method improves LLMs' generalization ability.
Explicit rule encoding enhances learning from limited data.
Abstract
Large language models (LLMs) have shown incredible performance in completing various real-world tasks. The current paradigm of knowledge learning for LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples. However, this learning paradigm may not learn complicated rules well, especially when the training examples are limited. We are inspired by the fact that humans can learn new tasks or knowledge in another way: by learning from rules. That is, humans can learn new tasks or grasp new knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which targets encoding rule-based knowledge into LLMs. We further propose rule distillation, which first uses the strong in-context…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
