X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon

TL;DR
X-Reasoner demonstrates that general-domain text-based post-training can enable strong, transferable reasoning capabilities across modalities and domains, including medical applications, outperforming existing models.
Contribution
The paper introduces X-Reasoner, a vision-language model post-trained solely on general-domain text, shows that its reasoning transfers to multimodal and out-of-domain tasks, and finds that domain-specific training further improves performance.
Findings
X-Reasoner outperforms state-of-the-art models on various benchmarks.
Post-training on general text enables cross-modal reasoning.
Domain-specific fine-tuning improves specialized task performance.
Abstract
Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited to mainly mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: General-domain text-based post-training can enable such strong generalizable reasoning. Leveraging this finding, we introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
