XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Fei Liu,, Kui Wu, Xiangyuan Guan, Tao Sun, Xianjie Wu, Tongliang Li, Zhoujun Li

TL;DR
XFormParser is a novel multimodal and multilingual form parser that combines semantic entity recognition and relation extraction within a Transformer-based framework, significantly improving industrial form parsing accuracy and robustness.
Contribution
The paper introduces XFormParser, a new semi-structured form parser that integrates SER and RE, enhances multilingual parsing with Bi-LSTM, and provides a new industrial dataset InDFormSFT.
Findings
Achieves up to 1.79% F1 score improvement over SOTA in RE tasks.
Demonstrates strong performance in multilingual and zero-shot settings.
Proves robustness across various industrial form parsing benchmarks.
Abstract
In the domain of Document AI, parsing semi-structured image form is a crucial Key Information Extraction (KIE) task. The advent of pre-trained multimodal models significantly empowers Document AI frameworks to extract key information from form documents in different formats such as PDF, Word, and images. Nonetheless, form parsing is still encumbered by notable challenges like subpar capabilities in multilingual parsing and diminished recall in industrial contexts in rich text and rich visuals. In this work, we introduce a simple but effective \textbf{M}ultimodal and \textbf{M}ultilingual semi-structured \textbf{FORM} \textbf{PARSER} (\textbf{XFormParser}), which anchored on a comprehensive Transformer-based pre-trained language model and innovatively amalgamates semantic entity recognition (SER) and relation extraction (RE) into a unified framework. Combined with Bi-LSTM, the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis
