PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking
Markus J. Buehler

TL;DR
PRefLexOR introduces a recursive, preference-based learning framework that enables small language models to iteratively improve reasoning and decision-making through self-teaching, reflection, and dynamic knowledge graph construction.
Contribution
It presents a novel recursive learning method combining preference optimization and reinforcement learning concepts for enhanced reasoning in small language models.
Findings
Small models (3B parameters) can self-improve reasoning depth.
Recursive optimization enhances coherence and consistency.
The method is applicable across diverse domains, including biological materials science.
Abstract
PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning) combines preference optimization with concepts from Reinforcement Learning to enable models to self-teach through iterative reasoning improvements. We propose a recursive learning approach that engages the model in multi-step reasoning, revisiting, and refining intermediate steps before producing a final output in training and inference phases. Through multiple training stages, the model first learns to align its reasoning with accurate decision paths by optimizing the log odds between preferred and non-preferred responses. During this process, PRefLexOR builds a dynamic knowledge graph by generating questions from random text chunks and retrieval-augmentation to contextualize relevant details from the entire training corpus. In the second stage, preference optimization enhances model…
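The abstract describes a first stage that optimizes "the log odds between preferred and non-preferred responses". As a rough illustration of what such an objective can look like, the sketch below computes an odds-ratio preference term in the style of Odds Ratio Preference Optimization (ORPO). It is an assumption about the general shape of the loss, not the paper's implementation: the function names `log_odds` and `odds_ratio_loss` are hypothetical, and the inputs are assumed to be length-normalized sequence log-probabilities (strictly negative values).

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given avg_logp = log(p), with avg_logp < 0.

    Assumption: avg_logp is a length-normalized sequence log-probability.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Return -log(sigmoid(z)), where z is the log odds ratio of the
    preferred (chosen) response over the non-preferred (rejected) one.

    Illustrative sketch only; a full training loss would typically add
    this term, with a weight, to a standard language-modeling loss.
    """
    z = log_odds(logp_chosen) - log_odds(logp_rejected)
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # The loss is smaller when the preferred response is more likely.
    print(odds_ratio_loss(-0.2, -1.5))
    print(odds_ratio_loss(-1.5, -0.2))
```

When both responses are equally likely the term equals log 2, and it falls toward zero as the model assigns higher odds to the preferred response, which is the direction the optimization pushes.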
Code & Models
- 🤗 lamm-mit/meta-llama-Meta-Llama-3.2-3B-Instruct-Reasoning-Tokenizer · model
- 🤗 lamm-mit/PRefLexOR_ORPO_DPO_EXO_REFLECT_10222024 · model · 5 dl · ♡ 3
- 🤗 lamm-mit/PRefLexOR_ORPO_DPO_EXO_10242024 · model · 5 dl
- 🤗 RichardErkhov/lamm-mit_-_PRefLexOR_ORPO_DPO_EXO_10242024-gguf · model · 679 dl
- 🤗 RichardErkhov/lamm-mit_-_PRefLexOR_ORPO_DPO_EXO_REFLECT_10222024-gguf · model · 35 dl
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Focus · ALIGN
