Phi-4-reasoning-vision-15B Technical Report
Jyoti Aneja, Michael Harrison, Neel Joshi, Tyler LaBonte, John Langford, Eduardo Salinas

TL;DR
This paper introduces Phi-4-reasoning-vision-15B, a compact multimodal reasoning model that achieves strong performance through careful architecture and data curation, emphasizing efficiency and high-quality reasoning.
Contribution
The paper demonstrates that strategic design choices and rigorous data filtering enable smaller models to perform competitively on vision and reasoning tasks, and it releases the model with open weights for community use.
Findings
Data quality is crucial for model performance.
High-resolution encoders improve perception and reasoning.
Hybrid training enables fast answers and complex reasoning.
Abstract
We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building smaller, efficient multimodal reasoning models and to share the result of these learnings as an open-weight model that is good at common vision and language tasks and excels at scientific and mathematical reasoning and understanding user interfaces. Our contributions include demonstrating that careful architecture choices and rigorous data curation enable smaller, open-weight multimodal models to achieve competitive performance with significantly less training and inference-time compute and tokens. The most substantial improvements come from systematic filtering, error correction, and synthetic augmentation --…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
