AEGIS: Exploring the Limit of World Knowledge Capabilities for Unified Multimodal Models

Jintao Lin; Bowen Dong; Weikang Shi; Chenyang Lei; Suiyun Zhang; Rui Liu; Xihui Liu

arXiv:2601.00561 · cs.CV · January 5, 2026

TL;DR

This paper introduces AEGIS, a comprehensive benchmark for evaluating unified multimodal models' world knowledge across diverse tasks, revealing significant knowledge gaps and the potential of reasoning modules to improve performance.

Contribution

The paper presents AEGIS, a new multi-task benchmark with a novel deterministic evaluation protocol to better assess world knowledge in multimodal models.

Findings

01. Most UMMs show significant world knowledge deficits.

02. Performance drops on complex reasoning tasks.

03. Simple reasoning modules can partially improve UMMs.

Abstract

The capability of Unified Multimodal Models (UMMs) to apply world knowledge across diverse tasks remains a critical, unresolved challenge. Existing benchmarks fall short: they offer only siloed, single-task evaluations with limited diagnostic power. To bridge this gap, we propose AEGIS (i.e., Assessing Editing, Generation, Interpretation-Understanding for Super-intelligence), a comprehensive multi-task benchmark covering visual understanding, generation, editing, and interleaved generation. AEGIS comprises 1,050 challenging, manually annotated questions spanning 21 topics (e.g., STEM, humanities, and daily life) and 6 reasoning types. To evaluate the world knowledge of UMMs concretely and without ambiguous metrics, we further propose Deterministic Checklist-based Evaluation (DCE), a protocol that replaces ambiguous…
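The abstract is cut off before it explains how DCE works, so the following is only a minimal sketch of what checklist-based scoring could look like. It assumes that each question carries a list of atomic yes/no requirements and that a judge returns one deterministic pass/fail verdict per requirement. The names `ChecklistItem`, `Sample`, `score_sample`, `score_benchmark`, and `keyword_judge` are hypothetical, and so are the two aggregates (mean fraction of items passed, and strict all-items-pass accuracy); the paper's actual aggregation is not stated in the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ChecklistItem:
    """One atomic requirement a correct answer must satisfy (hypothetical schema)."""
    question: str  # yes/no question that a real judge would be asked
    expected: str  # keyword used only by the toy judge below


@dataclass
class Sample:
    prompt: str
    checklist: Sequence[ChecklistItem]


# A judge maps (model output, checklist item) -> pass/fail.
Judge = Callable[[str, ChecklistItem], bool]


def score_sample(output: str, sample: Sample, judge: Judge) -> Tuple[float, bool]:
    """Return (fraction of checklist items passed, whether every item passed)."""
    verdicts: List[bool] = [judge(output, item) for item in sample.checklist]
    if not verdicts:
        raise ValueError("checklist must contain at least one item")
    return sum(verdicts) / len(verdicts), all(verdicts)


def score_benchmark(
    outputs: Sequence[str], samples: Sequence[Sample], judge: Judge
) -> Tuple[float, float]:
    """Return (mean per-sample fraction, strict all-items-pass accuracy)."""
    if len(outputs) != len(samples) or not samples:
        raise ValueError("need one output per sample and at least one sample")
    results = [score_sample(o, s, judge) for o, s in zip(outputs, samples)]
    soft = sum(frac for frac, _ in results) / len(results)
    strict = sum(passed for _, passed in results) / len(results)
    return soft, strict


def keyword_judge(output: str, item: ChecklistItem) -> bool:
    """Toy deterministic judge: case-insensitive keyword match."""
    return item.expected.lower() in output.lower()


samples = [
    Sample("Describe a water molecule.",
           [ChecklistItem("Does it mention oxygen?", "oxygen"),
            ChecklistItem("Does it mention hydrogen?", "hydrogen")]),
    Sample("Name the capital of France and the river through it.",
           [ChecklistItem("Does it name Paris?", "Paris"),
            ChecklistItem("Does it name the Seine?", "Seine")]),
]
outputs = ["One oxygen atom bonded to two hydrogen atoms.", "The capital is Paris."]

print(score_benchmark(outputs, samples, keyword_judge))  # (0.75, 0.5)
```

The toy `keyword_judge` only stands in for a real judge (for example, a multimodal model asked each checklist question about a generated image or answer). The design point illustrated is that every verdict is a binary check on a single requirement, so the final score is a plain arithmetic aggregate rather than a free-form rating.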

Code & Models

Datasets

DongSky/AEGIS
dataset

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)