Logics-Parsing-Omni Technical Report

Xin An; Jingyi Cai; Xiangyang Chen; Huayao Liu; Peiting Liu; Peng Wang; Bei Yang; Xiuwen Zhu; Yongfan Chen; Yan Gao; Yuan Gao; Baoyu Hou; Guangzheng Hu; Shuzhao Li; Weixu Qiao; Weidong Ren; Yanan Wang; Boyu Yang; Fan Yang; Jiangtao Zhang; Lixin Zhang; Lin Qu; Hu Wei; Xiaoxiao Xu; Bing Zhao

arXiv:2603.09677·cs.AI·April 9, 2026



TL;DR

This paper introduces the Omni Parsing framework for multimodal data, integrating perception and cognition through hierarchical levels, evidence anchoring, and a new benchmark, to convert unstructured signals into structured knowledge.

Contribution

It presents a novel hierarchical parsing framework with evidence anchoring and releases a standardized dataset, model, and benchmark for multimodal structured knowledge extraction.

Findings

01. The framework effectively grounds objects and events spatially and temporally.

02. Fine-grained recognition improves structured entity parsing.

03. High-level reasoning enhances model reliability.

Abstract

Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams, introducing a progressive parsing paradigm that bridges perception and cognition. Specifically, the framework integrates three hierarchical levels: 1) Holistic Detection, which achieves precise spatial-temporal grounding of objects or events to establish a geometric baseline for perception; 2) Fine-grained Recognition, which performs symbolization (e.g., OCR/ASR) and attribute extraction on localized objects to complete structured entity parsing; and 3) Multi-level Interpreting, which constructs a reasoning chain from local semantics to global logic. A pivotal advantage of this framework is its evidence anchoring…
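The three levels described in the abstract can be pictured as nested records. The sketch below is illustrative only and is not taken from the paper or its repository: the class names, field names, and the `unanchored` helper are all hypothetical. Because the abstract is cut off where it begins to describe evidence anchoring, the sketch assumes one plausible reading, namely that each high-level statement cites the identifiers of the detections it relies on.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Detection:
    """Level 1 (Holistic Detection): where or when something occurs."""
    det_id: str
    label: str
    bbox: Optional[tuple[float, float, float, float]] = None  # x0, y0, x1, y1
    time_span: Optional[tuple[float, float]] = None  # start, end in seconds


@dataclass
class Recognition:
    """Level 2 (Fine-grained Recognition): symbolized content of one detection."""
    det_id: str
    text: str  # e.g. OCR or ASR output
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Interpretation:
    """Level 3 (Multi-level Interpreting): a statement plus its cited evidence."""
    statement: str
    evidence: list[str]  # det_ids the statement relies on (assumed design)


def unanchored(interps: list[Interpretation], detections: list[Detection]) -> list[str]:
    """Return statements that cite no evidence or cite unknown detection ids."""
    known = {d.det_id for d in detections}
    return [
        i.statement
        for i in interps
        if not i.evidence or not set(i.evidence) <= known
    ]


# Hypothetical example data: one page region and one audio segment.
detections = [
    Detection("d1", "table", bbox=(10, 20, 200, 120)),
    Detection("d2", "speech", time_span=(3.0, 7.5)),
]
interps = [
    Interpretation("The table reports quarterly totals.", ["d1"]),
    Interpretation("The speaker disputes the totals.", ["d9"]),  # d9 was never detected
]
print(unanchored(interps, detections))  # ['The speaker disputes the totals.']
```

Under this assumed design, a statement with missing or unknown evidence is easy to flag, which is one way a parser's output could be checked against its own grounding.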


Code & Models

Repositories

alibaba/Logics-Parsing/tree/master/Logics-Parsing-Omni
github

Models

🤗
Logics-MLLM/Logics-Parsing-Omni
model· 90 dl· ♡ 10

Datasets

Logics-MLLM/OmniParsingBench
dataset· 3.4k dl
