Phoenix-VL 1.5 Medium Technical Report

Team Phoenix: Arka Ray; Askar Ali Mohamed Jawad; Biondi Lee; Elijah Seah; Eva Lim; Fiona Teo; Grace Toh; Guang Xiang Teo; Jun En Tan; Jia Hui Bong; Jiale Wang; Jonathan Ng; Justin Tan; Kai Zhe Yew; Matthew Ong; Shun Yi Yeo; Wen Jett Lam; Wen Xiu Tan; Ze Yu Zhang; Gee Wah Ng; Chee Wee Ang; Mistral AI: Adrien Sadé; Guillaume Kunsch; Jia Sin Loh; Nicolas Schuhl; Rupert Menneer; Umar Jamil; Vincent Maladière; Yimu Pan

arXiv:2605.10391·cs.CL·May 12, 2026

TL;DR

Phoenix-VL 1.5 Medium is a 123B-parameter multimodal, multilingual foundation model tailored for Singapore. It performs strongly on local benchmarks while retaining broad-spectrum intelligence.

Contribution

It introduces a regionally adapted, multimodal, multilingual foundation model with novel training and evaluation methods for localized knowledge and safety.

Findings

01. Achieves state-of-the-art results for its size on Singapore-specific benchmarks.

02. Maintains competitive performance on global multimodal and multilingual tasks.

03. Introduces a new evaluation suite for localized knowledge and safety.

Abstract

We introduce Phoenix-VL 1.5 Medium, a 123B-parameter natively multimodal and multilingual foundation model, adapted to regional languages and the Singapore context. Developed as a sovereign AI asset, it demonstrates that deep domain adaptation can be achieved with minimal degradation to broad-spectrum intelligence and alignment. Continued pretraining was performed on Mistral Medium 3.1 using a localized 1-trillion-token multimodal corpus, followed by a 250-billion-token long-context extension phase. Subsequent post-training incorporated a novel human-annotated Singapore multimodal dataset and a curated textual corpus on Singapore culture, knowledge, and legislation, totaling 22 billion tokens. Model alignment was then performed on an additional 5 billion tokens through Online Direct Preference Optimization. Phoenix-VL 1.5 Medium achieves state-of-the-art performance for its size on Singapore…
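To put the stage sizes quoted in the abstract side by side, the following minimal sketch adds up the per-stage token counts and computes each stage's share. The stage names, the `STAGE_TOKENS` dictionary, and both helper functions are illustrative assumptions, not identifiers from the report. Summing the counts also assumes the four stages are disjoint, which the abstract does not state explicitly.

```python
# Token counts per training stage, taken from the figures quoted in the abstract.
# The stage names are hypothetical labels chosen for this sketch.
STAGE_TOKENS = {
    "continued_pretraining": 1_000_000_000_000,   # 1 trillion
    "long_context_extension": 250_000_000_000,    # 250 billion
    "post_training": 22_000_000_000,              # 22 billion
    "online_dpo_alignment": 5_000_000_000,        # 5 billion
}


def total_tokens(stages: dict[str, int]) -> int:
    """Sum the token counts, assuming the stages do not overlap."""
    return sum(stages.values())


def stage_shares(stages: dict[str, int]) -> dict[str, float]:
    """Return each stage's fraction of the combined token count."""
    total = total_tokens(stages)
    return {name: count / total for name, count in stages.items()}


if __name__ == "__main__":
    total = total_tokens(STAGE_TOKENS)
    print(f"total: {total / 1e12:.3f}T tokens")
    for name, share in stage_shares(STAGE_TOKENS).items():
        print(f"{name:>24}: {share:6.1%}")
```

Under these assumptions the adaptation stages total roughly 1.277 trillion tokens, with continued pretraining accounting for about 78% of that.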
