Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

Denghao Ma; Qing Liu; Zulong Chen; Chuanfei Xu; Jia Xu; Zhibo Yang; Wei Shao; Zhao Li

arXiv:2605.10550·cs.CL·May 15, 2026

TL;DR

This paper introduces MMM-Bench, a benchmark for multi-level, multi-domain, multi-modal document classification. It pairs a five-level hierarchical taxonomy with 5,990 real-world documents drawn from 12 commercial domains at Alibaba, with the aim of advancing industrial document understanding.

Contribution

The paper presents what the authors describe as the first document classification benchmark to combine a multi-level taxonomy, multi-domain coverage, and multi-modal documents, addressing the single-domain, flat-label design of existing benchmarks.

Findings

01

Benchmarked baseline models and API-based approaches on MMM-Bench.

02

Identified four fundamental challenges in multi-level, multi-domain document classification.

03

Provided insights to guide future research in complex document understanding.

Abstract

Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms -- single domain settings with flat label structures -- that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that capture the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete…
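To make the multi-level setting concrete, here is a minimal sketch of how a hierarchical label might be represented and scored. It assumes each document's label is a path of up to five taxonomy levels. The label names, the `level_accuracy` helper, and the prefix-match scoring rule are hypothetical illustrations, not the benchmark's actual schema or evaluation protocol.

```python
from typing import List, Sequence

# Hypothetical representation: a label is a root-to-leaf path in the taxonomy,
# e.g. ("Finance", "Invoices", "Tax", "VAT", "Domestic"). Names are made up.
LabelPath = Sequence[str]


def level_accuracy(gold: List[LabelPath], pred: List[LabelPath], depth: int = 5) -> List[float]:
    """Return accuracy at each level 1..depth.

    A prediction counts as correct at level k only if its first k labels
    all match the gold path (an assumed prefix-match rule).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of documents")
    scores = []
    for k in range(1, depth + 1):
        hits = sum(
            1
            for g, p in zip(gold, pred)
            if len(g) >= k and len(p) >= k and tuple(g[:k]) == tuple(p[:k])
        )
        scores.append(hits / len(gold) if gold else 0.0)
    return scores


gold = [("a", "b", "c", "d", "e"), ("a", "x", "y", "z", "w")]
pred = [("a", "b", "c", "q", "e"), ("a", "x", "y", "z", "w")]
print(level_accuracy(gold, pred))  # [1.0, 1.0, 1.0, 0.5, 0.5]
```

Under this assumed rule, an error at level 4 also counts against level 5, which is one way a deep taxonomy can make classification harder than a flat label set.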

Code & Models

Repositories

MMMDC-Bench/MMMDC-Bench (GitHub)
