GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning
Jiale Fu, Yaqing Wang, Simeng Han, Jiaming Fan, Xu Yang

TL;DR
GraphIC introduces a graph-based retrieval method that improves in-context example selection for multi-step reasoning tasks, leading to better large language model performance by focusing on reasoning structures rather than superficial semantics.
Contribution
It proposes a novel reasoning-aware graph-based retrieval model that explicitly encodes reasoning steps and dependencies for more effective in-context example selection.
Findings
Outperforms 10 baseline methods across multiple reasoning tasks
Effectively filters superficial semantics to focus on reasoning processes
Enhances LLM performance in multi-step reasoning scenarios
Abstract
In-context learning (ICL) enhances large language models (LLMs) by incorporating demonstration examples, yet its effectiveness depends heavily on the quality of the selected examples. Current methods typically use text embeddings to measure semantic similarity, which often introduces bias in multi-step reasoning tasks. This occurs because text embeddings contain irrelevant semantic information and lack deeper reasoning structures. To address this, we propose GraphIC, a graph-based retrieval model that leverages a reasoning-aware representation and a specialized similarity metric for in-context example retrieval. GraphIC first constructs thought graphs (directed, node-attributed graphs that explicitly model reasoning steps and their dependencies) for candidate examples and queries. This approach filters out superficial semantics while preserving essential reasoning processes. Next, GraphIC…
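To make the abstract's description of a thought graph concrete, here is a minimal sketch of a directed, node-attributed graph in which nodes are reasoning steps and edges are dependencies. The class name `ThoughtGraph`, its methods, and the use of operation names as node attributes are assumptions made for illustration; the abstract does not specify how GraphIC stores or labels its graphs.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtGraph:
    """Directed, node-attributed graph (hypothetical illustration).

    attrs maps a node id to its attribute (here, an operation name).
    edges holds (src, dst) pairs meaning "dst depends on src".
    """
    attrs: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_step(self, node, label, depends_on=()):
        # Every dependency must already be a known reasoning step.
        for parent in depends_on:
            if parent not in self.attrs:
                raise KeyError(f"unknown dependency: {parent}")
        self.attrs[node] = label
        for parent in depends_on:
            self.edges.add((parent, node))

    def parents(self, node):
        # Steps whose results feed directly into `node`.
        return sorted(src for src, dst in self.edges if dst == node)

    def topological_order(self):
        # Kahn's algorithm: emit a step only after all of its dependencies.
        indegree = {n: 0 for n in self.attrs}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = sorted(n for n, d in indegree.items() if d == 0)
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for src, dst in sorted(self.edges):
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.attrs):
            raise ValueError("graph contains a cycle")
        return order


# Assumed toy problem: compute (3 * 4 + 5 + 7) / 2 in three steps.
g = ThoughtGraph()
g.add_step("s1", "multiply")
g.add_step("s2", "add")
g.add_step("s3", "divide", depends_on=["s1", "s2"])
print(g.topological_order())
```

Two problems with different surface wording but the same multiply, add, divide structure would yield the same graph under this representation, which is the intuition behind filtering out superficial semantics.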
Peer Reviews
Decision: Submitted to ICLR 2025
1. This paper uses a formalized reasoning representation to construct a thought graph for complex reasoning problems. On that basis, it can model the underlying reasoning process better than a semantic representation of natural language can.
2. It enhances the graph embedding with personalized PageRank and establishes a probabilistic model for the thought graph. GraphIC retrieves in-context examples by selecting the top-k candidate examples that maximize the probability of generating the corre…
1. GraphIC relies on the thought graph, which is generated from a formalized reasoning representation produced by LLMs. How is the correctness of the thought graphs of candidate examples ensured? Could there be multiple possible thought graphs for the same query? Will these factors affect the robustness of GraphIC?
2. For a test query q, GraphIC first creates the thought graph G^q without the ground-truth answer and retrieves in-context examples to maximize the probability density p_i(X^q). This also assumes th…
The proposed method seems to produce marginally better results than the baselines in most cases.
It is very hard to understand what is going on in the proposed method. For example, the method uses Bayesian networks, but the paper never explicitly states which joint distribution the Bayesian network aims to represent.
* The motivation of the paper is reasonable and the proposed method is novel.
* The writing is good, with a reasonable structure.
* The experiments are relatively abundant, and the experimental results support the paper's conclusions.
* Some parts of the method section lack detail: there are many assumptions but no stated conditions (see questions).
* The method relies on an LLM to construct the thought graph, and decomposing complex problems into key steps may be difficult or inaccurate.
* Experiments on the thought graph are lacking; in my opinion it is an important part of the method and has a large impact on performance (see questions).
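One of the reviews above notes that GraphIC enhances its graph embedding with personalized PageRank. As background, the following is a generic textbook power-iteration sketch of personalized PageRank on a small directed graph. It is not GraphIC's formulation, which this page does not spell out. The function name, the restart probability `alpha`, and the handling of dangling nodes (returning their mass to the restart distribution) are all assumptions chosen for illustration.

```python
def personalized_pagerank(nodes, edges, restart, alpha=0.15, iters=100):
    """Generic personalized PageRank by power iteration (illustrative only).

    nodes:   list of node ids
    edges:   list of (src, dst) directed edges
    restart: dict mapping node id to restart probability, summing to 1
    alpha:   probability of jumping back to the restart distribution
    """
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    # Start the walk from the restart distribution.
    rank = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            mass = (1.0 - alpha) * rank[n]
            if out[n]:
                # Spread the remaining mass evenly over outgoing edges.
                share = mass / len(out[n])
                for target in out[n]:
                    nxt[target] += share
            else:
                # Dangling node: send its mass back to the restart nodes.
                for m in nodes:
                    nxt[m] += mass * restart.get(m, 0.0)
        rank = nxt
    return rank


# Assumed toy chain a -> b -> c, personalized on node "a".
scores = personalized_pagerank(
    nodes=["a", "b", "c"],
    edges=[("a", "b"), ("b", "c")],
    restart={"a": 1.0},
)
print(scores)
```

In this toy chain the scores decay with distance from the restart node, which shows how a personalized walk weights nodes by their position relative to a chosen starting point.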
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Graph Neural Networks · AI-based Problem Solving and Planning
