Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning
Xinyi Huang

TL;DR
SGKR is a retrieval framework that organizes domain knowledge based on code dependency graphs to improve multi-step reasoning with large language models.
Contribution
SGKR introduces a structure-grounded retrieval method that uses code dependencies to retrieve more relevant knowledge for data analysis tasks.
Findings
SGKR improves solution correctness over baseline methods.
It enhances LLM-based code generation with structured, task-relevant context.
Experiments show consistent gains on multi-step data analysis benchmarks.
Abstract
Selecting the right knowledge is critical when using large language models (LLMs) to solve domain-specific data analysis tasks. However, most retrieval-augmented approaches rely primarily on lexical or embedding similarity, which is often a weak proxy for the task-critical knowledge needed for multi-step reasoning. In many such tasks, the relevant knowledge is not merely textually related to the query, but is instead grounded in executable code and the dependency structure through which computations are carried out. To address this mismatch, we propose SGKR (Structure-Grounded Knowledge Retrieval), a retrieval framework that organizes domain knowledge with a graph induced by function-call dependencies. Given a question, SGKR extracts semantic input and output tags, identifies dependency paths connecting them, and constructs a task-relevant subgraph. The associated knowledge and…
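The retrieval step described in the abstract (tags, dependency paths, subgraph) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The call graph, the tag assignments, the knowledge notes, and the function names `dependency_paths` and `retrieve_subgraph` are all hypothetical. The sketch also assumes the input and output tags have already been extracted from the question, and that paths run from the function producing the output tag, through its callees, to a function consuming the input tag.

```python
# Hypothetical call graph (caller -> callees); not taken from the paper.
CALLS = {
    "report_churn": ["compute_churn_rate"],
    "compute_churn_rate": ["load_customers", "filter_active"],
    "filter_active": ["load_customers"],
    "plot_revenue": ["load_orders"],
}

# Hypothetical semantic tags attached to functions.
TAGS = {
    "load_customers": {"customer_table"},
    "report_churn": {"churn_rate"},
    "load_orders": {"order_table"},
    "plot_revenue": {"revenue_chart"},
}

# Hypothetical domain knowledge notes keyed by function name.
KNOWLEDGE = {
    "filter_active": "A customer is active if they ordered in the last 90 days.",
    "compute_churn_rate": "Churn rate = lost customers / customers at period start.",
    "plot_revenue": "Revenue is reported net of refunds.",
}


def dependency_paths(calls, start, goal):
    """Yield every simple path from start to goal along call edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        for callee in calls.get(node, []):
            if callee not in path:  # skip cycles
                stack.append((callee, path + [callee]))


def retrieve_subgraph(calls, tags, input_tag, output_tag):
    """Return the functions lying on any path from an output-tagged
    function down to an input-tagged function."""
    sources = {f for f, t in tags.items() if input_tag in t}
    targets = {f for f, t in tags.items() if output_tag in t}
    nodes = set()
    for target in targets:
        for source in sources:
            for path in dependency_paths(calls, target, source):
                nodes.update(path)
    return nodes


subgraph = retrieve_subgraph(CALLS, TAGS, "customer_table", "churn_rate")
# Keep only the knowledge attached to functions inside the subgraph.
context = {f: KNOWLEDGE[f] for f in sorted(subgraph) if f in KNOWLEDGE}
```

In this toy setup the revenue-related functions share no dependency path with the query tags, so their notes stay out of the context even if they were textually similar to the question.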
