Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale Evaluation

Jaehyeok Lee; Keisuke Sakaguchi; JinYeong Bak

arXiv:2411.06387 · cs.LG · February 7, 2025

TL;DR

CREST improves large language models' reasoning by evaluating and filtering self-generated rationales with follow-up questions, which yields more robust and more accurate reasoning.

Contribution

The paper introduces CREST, a self-training framework that evaluates each self-generated rationale with follow-up questions and uses the results to guide the training of LLMs.

Findings

01  Improves logical robustness of rationales
02  Enhances reasoning accuracy over previous methods
03  Effective across multiple question-answering datasets

Abstract

Self-training approaches for large language models (LLMs) improve reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as appropriate for training. However, relying on this single measure risks misjudging rationale quality, leading the models to learn flawed reasoning patterns. To address this issue, we propose CREST (Consistency-driven Rationale Evaluation for Self-Training), a self-training framework that further evaluates each rationale through follow-up questions and leverages this evaluation to guide its training. Specifically, we introduce two methods: (1) filtering out rationales that frequently result in incorrect answers on follow-up questions and (2) preference learning based on mixed preferences from rationale evaluation results of both original and follow-up questions.…
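
The two methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Rationale` record, the `min_followup_correct` threshold, and the pairing rule (prefer the rationale that is correct on the original question, then break ties by follow-up score) are all assumptions made for illustration; the abstract does not specify how the mixed preferences are combined.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rationale:
    """Hypothetical record for one self-generated rationale."""
    text: str
    correct_on_original: bool            # did it lead to the gold answer?
    followup_correct: tuple[bool, ...]   # outcome on each follow-up question


def followup_score(r: Rationale) -> int:
    """Number of follow-up questions answered correctly with this rationale."""
    return sum(r.followup_correct)


def filter_rationales(rationales, min_followup_correct):
    """Keep rationales that are correct on the original question and
    reach an assumed minimum number of correct follow-up answers."""
    return [
        r for r in rationales
        if r.correct_on_original and followup_score(r) >= min_followup_correct
    ]


def preference_pairs(rationales):
    """Build (chosen, rejected) text pairs.

    Assumed rule: correctness on the original question decides first;
    if both agree there, the higher follow-up score wins; ties are skipped.
    """
    pairs = []
    for a, b in combinations(rationales, 2):
        if a.correct_on_original != b.correct_on_original:
            chosen, rejected = (a, b) if a.correct_on_original else (b, a)
        elif followup_score(a) != followup_score(b):
            chosen, rejected = (a, b) if followup_score(a) > followup_score(b) else (b, a)
        else:
            continue  # no preference signal between these two
        pairs.append((chosen.text, rejected.text))
    return pairs
```

Pairs produced this way could be fed to any preference-learning objective. The choice of objective and the threshold value are left open here because the truncated abstract does not state them.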

Code & Models

Repositories

jaehyeoklee-119/crest · PyTorch · Official

Videos

Self-Training Meets Consistency: Improving LLMs’ Reasoning with Consistency-Driven Rationale Evaluation· underline

Taxonomy

Topics: Artificial Intelligence in Law