PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking

Markus J. Buehler

arXiv:2410.12375·cs.AI·October 17, 2024·2 cites

TL;DR

PRefLexOR introduces a recursive, preference-based learning framework that enables small language models to iteratively improve reasoning and decision-making through self-teaching, reflection, and dynamic knowledge graph construction.

Contribution

It presents a novel recursive learning method combining preference optimization and reinforcement learning concepts for enhanced reasoning in small language models.

Findings

01 Small models (3B parameters) can self-improve reasoning depth.

02 Recursive optimization enhances coherence and consistency.

03 Method applicable across diverse domains, including biological materials science.

Abstract

PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning) combines preference optimization with concepts from Reinforcement Learning to enable models to self-teach through iterative reasoning improvements. We propose a recursive learning approach that engages the model in multi-step reasoning, revisiting, and refining intermediate steps before producing a final output in training and inference phases. Through multiple training stages, the model first learns to align its reasoning with accurate decision paths by optimizing the log odds between preferred and non-preferred responses. During this process, PRefLexOR builds a dynamic knowledge graph by generating questions from random text chunks and retrieval-augmentation to contextualize relevant details from the entire training corpus. In the second stage, preference optimization enhances model…
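
The abstract describes the first training stage as optimizing the log odds between preferred and non-preferred responses but does not give the loss itself. The following is a minimal sketch of one common way to write such a term, in the style of odds-ratio preference optimization; it is an assumption about the general shape of the objective, not the paper's implementation. The function names `log_odds` and `odds_ratio_loss` are hypothetical, and the inputs are assumed to be length-averaged token log-probabilities that a model assigns to each response.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    Assumption: avg_logprob is a length-averaged token log-probability,
    so it must be strictly negative (p < 1).
    """
    # log p - log(1 - p), using log1p for accuracy when p is small
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def odds_ratio_loss(chosen_logprob: float, rejected_logprob: float) -> float:
    """Return -log(sigmoid(margin)), where margin is the log odds ratio
    between the preferred (chosen) and non-preferred (rejected) response.

    Hypothetical helper for illustration; the paper's exact loss may differ.
    """
    margin = log_odds(chosen_logprob) - log_odds(rejected_logprob)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss is small when the preferred response is more likely,
# and large when the non-preferred response is more likely.
good_ordering = odds_ratio_loss(chosen_logprob=-0.5, rejected_logprob=-2.0)
bad_ordering = odds_ratio_loss(chosen_logprob=-2.0, rejected_logprob=-0.5)
print(good_ordering < bad_ordering)
```

In a real training loop this term would be computed on tensors and typically added to a standard language-modeling loss on the preferred response; the scalar version here only illustrates how the log odds ratio turns a preference pair into a training signal.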

Code & Models

Repositories

lamm-mit/PRefLexOR (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus · ALIGN