OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models

Junda Wu; Xintong Li; Ruoyu Wang; Yu Xia; Yuxin Xiong; Jianing Wang; Tong Yu; Xiang Chen; Branislav Kveton; Lina Yao; Jingbo Shang; Julian McAuley

arXiv:2410.23703·cs.LG·November 1, 2024

TL;DR

This paper introduces OCEAN, an offline evaluation framework for large language models' chain-of-thought reasoning, utilizing knowledge graphs and reinforcement learning to improve reasoning alignment without retraining the models.

Contribution

The paper proposes a novel offline evaluation and optimization method for chain-of-thought reasoning in LLMs using knowledge graphs and a KG-IPS estimator for unbiased feedback.
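
For readers unfamiliar with inverse propensity scoring (IPS), the following is a minimal sketch of a generic IPS estimator for offline policy evaluation. It is not the paper's KG-IPS estimator, whose exact form is not given on this page. The function name `ips_estimate`, the tuple layout, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical record layout: (reward, target_prob, behavior_prob).
# reward        : feedback observed for a logged action
# target_prob   : probability the policy being evaluated assigns to that action
# behavior_prob : probability the logging policy assigned to that action
LoggedSample = Tuple[float, float, float]


def ips_estimate(logged: Sequence[LoggedSample]) -> float:
    """Generic IPS estimate of the target policy's expected reward.

    Each logged reward is reweighted by target_prob / behavior_prob so that
    data collected under one policy can be used to evaluate another.
    """
    if not logged:
        raise ValueError("need at least one logged sample")
    total = 0.0
    for reward, target_prob, behavior_prob in logged:
        if behavior_prob <= 0.0:
            raise ValueError("behavior probability must be positive")
        total += reward * (target_prob / behavior_prob)
    return total / len(logged)


# Illustrative values only; they are not taken from the paper.
samples = [(1.0, 0.5, 0.25), (0.0, 0.2, 0.4), (1.0, 0.1, 0.2)]
estimate = ips_estimate(samples)
```

The reweighting is what makes the estimate unbiased under the usual assumption that the logging policy gives nonzero probability to every action the target policy can take. In practice the importance weights can have high variance, which is why clipped or normalized variants are common.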

Findings

1. OCEAN effectively evaluates chain-of-thought reasoning paths.

2. It enables off-policy optimization to improve reasoning alignment.

3. The method maintains LLMs' general abilities in downstream tasks.

Abstract

Offline evaluation of LLMs is crucial to understanding their capacities, yet it remains underexplored in existing research. In this work, we focus on the offline evaluation of chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and reasoning paths, we use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chains of thought. Because LLM reasoning and KG structures are heterogeneous, direct interaction and feedback from KGs on LLM behavior are challenging: they require accurate entity linking and grounding of LLM-generated chains of thought in the KG. To address this challenge, we propose an offline chain-of-thought evaluation framework, OCEAN, which models chain-of-thought reasoning in LLMs as an MDP and evaluates the policy's alignment…

Taxonomy

Topics: Topic Modeling

Methods: Focus