RuAG: Learned-rule-augmented Generation for Large Language Models
Yudi Zhang, Pei Xiao, Lu Wang, Chaoyun Zhang, Meng Fang, Yali Du, Yevgeniy Puzyrev, Randolph Yao, Si Qin, Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

TL;DR
RuAG introduces a framework that automatically extracts interpretable logic rules from data using LLMs and MCTS, enhancing large language models' reasoning across diverse tasks by injecting targeted knowledge.
Contribution
The paper presents a novel method combining LLMs, Monte Carlo Tree Search, and logic rule distillation to improve reasoning in LLMs, addressing contextual window limitations.
Findings
Effective knowledge injection improves reasoning performance.
Applicable across NLP, time-series, and industrial tasks.
Demonstrates significant enhancement over baseline models.
Abstract
In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order logic rules, which are injected into LLMs to boost their reasoning capabilities. Our method begins by formulating the search process relying on LLMs' commonsense, where LLMs automatically define head and body predicates. Then, RuAG applies Monte Carlo Tree Search (MCTS) to address the combinational searching space and efficiently discover logic rules from data. The resulting logic rules are translated into natural language, allowing targeted knowledge injection and seamless integration into LLM…
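The search step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the predicate names (`hot`, `old`, `overloaded`, `fail`), the use of rule precision as the reward, the UCT constant, and the sentence template are all assumptions made for illustration. In RuAG the head and body predicates would be proposed by an LLM; here they are hard-coded.

```python
import math
import random

def make_rows():
    """Hypothetical toy data: 'fail' is the head predicate, the rest are body candidates."""
    return [{"hot": h, "overloaded": o, "old": a, "fail": h and o}
            for h in (False, True) for o in (False, True) for a in (False, True)]

def precision(body, head, rows):
    """Fraction of rows satisfying every body predicate that also satisfy the head."""
    covered = [r for r in rows if all(r[p] for p in body)]
    return sum(r[head] for r in covered) / len(covered) if covered else 0.0

class Node:
    def __init__(self, body, parent=None):
        self.body, self.parent = body, parent   # body: tuple of predicate names
        self.children, self.visits, self.value = {}, 0, 0.0

def candidates(body, preds, max_len):
    """Predicates that may still be appended (fixed order avoids duplicate rules)."""
    if len(body) >= max_len:
        return []
    start = preds.index(body[-1]) + 1 if body else 0
    return preds[start:]

def mcts_rule_search(rows, head, preds, iterations=200, max_len=2, c=1.4, seed=0):
    rng = random.Random(seed)
    root, scores = Node(()), {}

    def score(body):  # assumed reward: precision of the rule "body => head"
        if body and body not in scores:
            scores[body] = precision(body, head, rows)
        return scores.get(body, 0.0)

    for _ in range(iterations):
        node = root
        while True:  # selection: follow UCT through fully expanded nodes
            opts = candidates(node.body, preds, max_len)
            untried = [p for p in opts if p not in node.children]
            if untried or not opts:
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        if untried:  # expansion: add one new body predicate
            p = rng.choice(untried)
            node.children[p] = Node(node.body + (p,), node)
            node = node.children[p]
        body, reward = node.body, score(node.body)
        while candidates(body, preds, max_len):  # simulation: random extension
            body = body + (rng.choice(candidates(body, preds, max_len)),)
            reward = max(reward, score(body))
        while node is not None:  # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    best = max(scores, key=lambda b: (scores[b], -len(b)))
    return best, scores[best]

def to_natural_language(body, head, prec):
    """Render a rule as a sentence that could be placed in an LLM prompt."""
    return f"If {' and '.join(body)}, then {head} (precision {prec:.2f})."

body, prec = mcts_rule_search(make_rows(), "fail", ["hot", "old", "overloaded"])
print(to_natural_language(body, "fail", prec))
# prints: If hot and overloaded, then fail (precision 1.00).
```

The rendered sentence stands in for the natural-language rule that would be injected into the prompt. A real system would need a reward that also accounts for rule coverage, since precision alone favors overly specific rules.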
Peer Reviews
Decision: ICLR 2025 Poster
The paper contributes by integrating rule-based augmentation with generation models, leveraging rules learned from data.
There are several areas of concern:
1. Clarity in Section 3.1 (LLM-based Logic Rule Search): This section is difficult to understand. Here are some follow-up questions for clarification:
1.1. What do the initial predicates look like across the three different datasets?
1.2. How does the LLM eliminate impossible predicates? Could you provide prompt examples?
1.3. How does the LLM propose new target predicates? Any prompt examples for this?
2. Performance of Rules Alone: It appear
* The idea of using large language models for "feature extraction" is very interesting. It is related to https://arxiv.org/abs/2409.08466 (which I don't expect to be in the paper since it's very recent).
* The empirical results show that for some use cases abstract rules can be leveraged to improve performance.
* The paper needs better **scoping** -- when is this method likely to be useful and when not? For example, the point of machine learning is that some things are hard to express by rules -- for example, what makes a face a face or a cat a cat? We learn machine learning models for cases where rules are hard to formulate. Is this method restricted to things that can be defined by rules or not? I think it's important to discuss this. Second - the method assumes an input of N features with some featu
1. RuAG offers a scalable solution for integrating extensive domain knowledge into LLMs that improves upon RAG or SFT.
2. Model performance is tested on a wide array of tasks and shows improvement over strong baselines.
3. RuAG is more computationally efficient than other methods that summarize an external dataset as knowledge storage, as the calls to API models happen only once, during logic rule construction.
1. Ablation studies in Table 5 could include RAG or SFT on open-source LLMs, as the current baselines include only CoT, which does not use external knowledge.
2. How do LLMs suggest new rules to explore and detect impossible body predicates? These parts seem unclear to me.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need
