Megrez-Omni Technical Report

Boxun Li; Yadong Li; Zhiyuan Li; Congyi Liu; Weilin Liu; Guowei Niu; Zheyue Tan; Haiyang Xu; Zhuyu Yao; Tao Yuan; Dong Zhou; Yueqing Zhuang; Shengen Yan; Guohao Dai; Yu Wang

arXiv:2502.15803·cs.LG·February 27, 2025

Open Access

TL;DR

This paper introduces the Megrez models, a language model and a multimodal model built on it, optimized for fast, accurate, and robust edge-side AI applications across text, image, and audio modalities.

Contribution

The paper presents the Megrez-3B-Omni multimodal model, achieving state-of-the-art accuracy and robustness for on-device AI across multiple modalities, with a focus on software-hardware co-design.

Findings

01. Megrez-3B-Omni achieves state-of-the-art multimodal accuracy.

02. The models are optimized for fast inference and edge deployment.

03. Megrez-3B-Omni demonstrates versatility across text, image, and audio analysis.

Abstract

In this work, we present the Megrez models, comprising a language model (Megrez-3B-Instruct) and a multimodal model (Megrez-3B-Omni). These models are designed to deliver fast inference, compactness, and robust edge-side intelligence through a software-hardware co-design approach. Megrez-3B-Instruct offers several advantages, including high accuracy, high speed, ease of use, and a wide range of applications. Building on Megrez-3B-Instruct, Megrez-3B-Omni is an on-device multimodal understanding LLM that supports image, text, and audio analysis. It achieves state-of-the-art accuracy across all three modalities and demonstrates strong versatility and robustness, setting a new benchmark for multimodal AI models.

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech Recognition and Synthesis · Natural Language Processing Techniques