DocFusion: A Unified Framework for Document Parsing Tasks

Mingxu Chai; Ziyu Shen; Chong Zhang; Yue Zhang; Xiao Wang; Shihan Dou; Jihua Kang; Jiazheng Zhang; Qi Zhang

arXiv:2412.12505 · cs.CL · May 23, 2025

TL;DR

DocFusion is a lightweight, unified generative model that efficiently handles multiple document parsing tasks, achieving state-of-the-art results by leveraging task interactions and integrated training.

Contribution

It introduces a novel unified framework for document parsing that simplifies architecture and improves performance through collaborative training and task interaction modeling.

Findings

01. Achieves SOTA performance across four document parsing tasks
02. Leverages mutual benefits among recognition tasks
03. Significantly improves detection accuracy with integrated data

Abstract

Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications. However, existing methods often require integrating multiple independent models to handle various parsing tasks, leading to high complexity and maintenance overhead. To address this, we propose DocFusion, a lightweight generative model with only 0.28B parameters. It unifies task representations and achieves collaborative training through an improved objective function. Experiments reveal and leverage the mutually beneficial interaction among recognition tasks, and integrating recognition data significantly enhances detection performance. The final results demonstrate that DocFusion achieves state-of-the-art (SOTA) performance across four key tasks.
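
To give a rough sense of what "lightweight" means for a 0.28B-parameter model, the sketch below estimates the memory needed for the weights alone. Only the parameter count is taken from the abstract. The byte widths are the standard sizes of common numeric formats, assumed here for illustration and not settings reported by the authors, and the function name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope weight memory for a 0.28B-parameter model.
# Only the parameter count comes from the abstract; the numeric formats
# below are illustrative assumptions, not reported training settings.
PARAMS = 0.28e9

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


footprints = {
    name: weight_memory_gb(PARAMS, width)
    for name, width in BYTES_PER_PARAM.items()
}

for name, gb in footprints.items():
    print(f"{name}: {gb:.2f} GB")
```

This estimate covers the weights only. Activations, optimizer state, and any decoding cache add to the total, so actual usage during training or inference would be higher.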

Code & Models

Repositories

sc22mc/DocFusion (PyTorch, official)

Models

sc22mc/DocFusion (Hugging Face model, 28 downloads, 2 likes)

Datasets

sc22mc/DocLatex (dataset, 146 downloads)

Taxonomy

Topics: Natural Language Processing Techniques