X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

Qianchu Liu; Sheng Zhang; Guanghui Qin; Timothy Ossowski; Yu Gu; Ying Jin; Sid Kiblawi; Sam Preston; Mu Wei; Paul Vozila; Tristan Naumann; Hoifung Poon

arXiv:2505.03981 · cs.AI · May 9, 2025

TL;DR

X-Reasoner demonstrates that general-domain text-based post-training can enable strong, transferable reasoning capabilities across modalities and domains, including medical applications, outperforming existing models.

Contribution

The paper introduces X-Reasoner, a vision-language model post-trained solely on general-domain text. It shows that the resulting reasoning ability transfers to multimodal and out-of-domain tasks, and that additional domain-specific training improves performance further.

Findings

01 X-Reasoner outperforms state-of-the-art models on various benchmarks.

02 Post-training on general text enables cross-modal reasoning.

03 Domain-specific fine-tuning improves specialized task performance.

Abstract

Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited to mainly mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: General-domain text-based post-training can enable such strong generalizable reasoning. Leveraging this finding, we introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by…

Code & Models

Models

🤗 microsoft/X-Reasoner-7B (model · 91 downloads · 9 likes)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education