OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley

TL;DR
This paper introduces OCEAN, an offline evaluation framework for large language models' chain-of-thought reasoning. It uses knowledge graphs and reinforcement learning to evaluate reasoning paths and to improve reasoning alignment through off-policy optimization while maintaining the models' general abilities.
Contribution
The paper proposes a novel offline evaluation and optimization method for chain-of-thought reasoning in LLMs using knowledge graphs and a KG-IPS estimator for unbiased feedback.
Findings
OCEAN effectively evaluates chain-of-thought reasoning paths.
It enables off-policy optimization to improve reasoning alignment.
The method maintains LLMs' general abilities in downstream tasks.
Abstract
Offline evaluation of LLMs is crucial for understanding their capacities, yet it remains underexplored in existing research. In this work, we focus on the offline evaluation of chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and reasoning paths, we use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chains of thought. Due to the heterogeneity between LLM reasoning and KG structures, direct interaction and feedback from KGs on LLM behavior are challenging, as they require accurate entity linking and grounding of LLM-generated chains of thought in the KG. To address this challenge, we propose an offline chain-of-thought evaluation framework, OCEAN, which models chain-of-thought reasoning in LLMs as an MDP and evaluates the policy's alignment…
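The abstract frames chain-of-thought generation as an MDP evaluated from logged data, and the Contribution section names a KG-IPS estimator. The paper's KG-IPS construction is not given in the text here, so the following is only a minimal sketch of a standard trajectory-level inverse propensity score (IPS) estimator, the general technique that such estimators build on. The names `Step`, `Trajectory`, `ips_value`, and the `clip` parameter are hypothetical and chosen for illustration; how rewards and propensities would be derived from a knowledge graph is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    # Hypothetical fields: probability of the logged action under each policy.
    target_prob: float    # policy being evaluated
    behavior_prob: float  # policy that produced the logged data


@dataclass
class Trajectory:
    steps: List[Step]
    reward: float  # scalar feedback for the whole reasoning chain (assumed)


def ips_value(trajectories: List[Trajectory], clip: Optional[float] = None) -> float:
    """Standard trajectory-level IPS estimate of the target policy's value.

    Generic sketch, not the paper's KG-IPS estimator.
    """
    if not trajectories:
        raise ValueError("need at least one logged trajectory")
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        for step in traj.steps:
            if step.behavior_prob <= 0.0:
                raise ValueError("behavior_prob must be positive")
            # Importance ratio for this step, multiplied along the chain.
            weight *= step.target_prob / step.behavior_prob
        if clip is not None:
            # Optional clipping trades a little bias for lower variance.
            weight = min(weight, clip)
        total += weight * traj.reward
    return total / len(trajectories)


logged = [
    Trajectory(steps=[Step(0.5, 0.25), Step(0.5, 0.5)], reward=1.0),
    Trajectory(steps=[Step(0.25, 0.5)], reward=2.0),
]
estimate = ips_value(logged)
clipped_estimate = ips_value(logged, clip=1.0)
```

Without clipping, the estimate is unbiased when the behavior policy assigns positive probability to every action the target policy can take; the product of per-step ratios can have high variance on long chains, which is why a clipping option is included in the sketch.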
Taxonomy
Topics: Topic Modeling
Methods: Focus
