What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions?

Yizhe Zhang; Richard Bai; Zijin Gu; Ruixiang Zhang; Jiatao Gu; Emmanuel Abbe; Samy Bengio; Navdeep Jaitly

arXiv:2502.18435 · cs.CL · July 1, 2025

TL;DR

This paper explores alternative text generation orderings in language models. It shows that right-to-left (R2L) training can outperform the usual left-to-right (L2R) training on several multiple-choice question benchmarks, and it analyzes the factors that may explain the gap.

Contribution

The paper introduces and empirically evaluates right-to-left training for LLMs, showing advantages over left-to-right training on multiple-choice benchmarks that test knowledge extraction and reasoning, and provides a theoretical analysis of the factors involved.

Findings

01. R2L models outperform L2R models on several MCQ benchmarks.

02. The performance differences may be linked to calibration, computability, and directional conditional entropy.

03. Controlled experiments with arithmetic tasks disentangle the influencing factors.

Abstract

Language models usually use left-to-right (L2R) autoregressive factorization. However, L2R factorization may not always be the best inductive bias. Therefore, we investigate whether alternative factorizations of the text distribution could be beneficial in some tasks. We investigate right-to-left (R2L) training as a compelling alternative, focusing on multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across various model sizes (2B-8B parameters) and training datasets, we find that R2L models can significantly outperform L2R models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors including calibration, computability, and directional conditional entropy.…
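To make the idea of "directional conditional entropy" concrete, the sketch below factorizes one joint distribution over two-token sequences in both directions. It is an illustration only, not the paper's analysis or code: the toy distribution and the names `joint`, `marginal`, `entropy`, and `stepwise_entropies` are assumptions made for this example. Both orderings describe the same joint distribution, so their total entropy is identical, but the uncertainty is distributed differently across generation steps.

```python
import math

# Hypothetical toy joint distribution over two-token sequences (x1, x2).
# The probabilities are made up for illustration.
joint = {
    ("a", "c"): 0.5,
    ("b", "d"): 0.25,
    ("b", "e"): 0.25,
}


def marginal(joint, position):
    """Marginal distribution of the token at the given position (0 or 1)."""
    out = {}
    for seq, p in joint.items():
        out[seq[position]] = out.get(seq[position], 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def stepwise_entropies(joint, first_position):
    """Entropy of the token generated first, then the conditional entropy of
    the other token given it, using the chain rule H(X, Y) = H(X) + H(Y | X)."""
    h_first = entropy(marginal(joint, first_position))
    h_second_given_first = entropy(joint) - h_first
    return h_first, h_second_given_first


# L2R generates x1 first; R2L generates x2 first.
l2r = stepwise_entropies(joint, first_position=0)
r2l = stepwise_entropies(joint, first_position=1)

print("L2R steps (bits):", l2r)  # (1.0, 0.5)
print("R2L steps (bits):", r2l)  # (1.5, 0.0)
print("Totals match:", math.isclose(sum(l2r), sum(r2l)))
```

In this toy case the second token fully determines the first, so the R2L ordering concentrates all uncertainty in its first step, while the L2R ordering spreads it over both steps. Whether such asymmetries help or hurt a trained model on a given task is the empirical question the paper studies.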

Code & Models

Models

🤗 apple/ml-reversal-blessing (model · ♡ 5)

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Multimodal Machine Learning Applications