TL;DR
This paper introduces MMM-Bench, a benchmark for multi-level, multi-domain, multi-modal document classification. It features a hierarchical taxonomy and real-world data from Alibaba, and aims to advance industrial document understanding.
Contribution
The paper presents the first benchmark with a hierarchical taxonomy and multi-domain data, addressing limitations of existing simplified document classification benchmarks.
Findings
Established baselines on MMM-Bench, covering both baseline models and API-based approaches.
Identified four fundamental challenges in multi-level, multi-domain document classification.
Provided insights to guide future research in complex document understanding.
Abstract
Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms -- single-domain settings with flat label structures -- that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that capture the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete…
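To make the difference between a flat label and a five-level taxonomy concrete, here is a minimal sketch of how a hierarchical label and a per-level score could be represented. It is an illustration only: the label-path representation, the function name `per_level_accuracy`, and the prefix-match scoring rule are assumptions and are not taken from the paper, whose annotation format and metrics are not described in the text above.

```python
from typing import List, Sequence, Tuple

# Assumption: a hierarchical label is a path of category names, root first,
# with one entry per taxonomy level (five levels in MMM-Bench).
LabelPath = Tuple[str, ...]


def per_level_accuracy(
    gold: Sequence[LabelPath],
    pred: Sequence[LabelPath],
    depth: int = 5,
) -> List[float]:
    """Return, for each level, the fraction of documents whose predicted
    path agrees with the gold path on every label down to that level."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one document")

    hits = [0] * depth
    for g, p in zip(gold, pred):
        for level in range(depth):
            end = level + 1
            # A level counts only if both paths reach it and share the prefix.
            if len(g) >= end and len(p) >= end and g[:end] == p[:end]:
                hits[level] += 1
            else:
                break  # an error at one level invalidates all deeper levels
    return [h / len(gold) for h in hits]


# Hypothetical category names, for illustration only.
gold = [("a", "b", "c", "d", "e"), ("a", "x", "y", "z", "w")]
pred = [("a", "b", "c", "q", "e"), ("a", "x", "y", "z", "w")]
scores = per_level_accuracy(gold, pred)
```

Under this scoring rule a mistake at one level carries down to every deeper level, so accuracy can only stay the same or fall as the level increases. A flat-label benchmark would report only a single number.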
