FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents
Bobo Li, Yuheng Wang, Hao Fei, Juncheng Li, Wei Ji, Mong-Li Lee, Wynne Hsu

TL;DR
FormFactory is a benchmark suite for evaluating multimodal large language models (MLLMs) on automated form filling. It exposes the limitations of current models on this task and is intended to guide future research.
Contribution
We define the form-filling task formally and develop a benchmark with a web interface, dataset, and evaluation tools to assess model performance on real-world scenarios.
Findings
- No model exceeds 5% accuracy on the benchmark.
- Current models struggle with visual layout reasoning.
- Field-value alignment remains a significant challenge.
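To make the accuracy and field-value alignment findings concrete, the following is a minimal sketch of how a filled form might be scored against a gold answer. It is not the benchmark's actual evaluator, which this summary does not describe: the function names (`normalize`, `field_accuracy`, `form_exact_match`), the normalization rule, and the sample form are all assumptions made for illustration.

```python
from typing import Dict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization rule)."""
    return " ".join(value.lower().split())


def field_accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold fields whose predicted value matches after normalization.

    A field missing from `predicted` is treated as an empty string.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for name, expected in gold.items()
        if normalize(predicted.get(name, "")) == normalize(expected)
    )
    return correct / len(gold)


def form_exact_match(predicted: Dict[str, str], gold: Dict[str, str]) -> bool:
    """True only when every gold field is filled correctly."""
    return field_accuracy(predicted, gold) == 1.0


# Hypothetical form: the email value is wrong, the other two fields match.
gold = {"name": "Ada Lovelace", "email": "ada@example.org", "degree": "PhD"}
predicted = {"name": "ada  lovelace", "email": "ada@example.com", "degree": "PhD"}

print(f"field accuracy: {field_accuracy(predicted, gold):.2f}")
print(f"exact match: {form_exact_match(predicted, gold)}")
```

Under a strict form-level criterion such as `form_exact_match`, a single misaligned field fails the whole form, which is one plausible reason scores on a task like this can be very low even when many individual fields are correct.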
Abstract
Online form filling is a common yet labor-intensive task involving extensive keyboard and mouse interactions. Despite the long-standing vision of automating this process with "one click", existing tools remain largely rule-based and lack generalizable, generative capabilities. Recent advances in Multimodal Large Language Models (MLLMs) have enabled promising agents for GUI-related tasks in general-purpose scenarios. However, they struggle with the unique challenges of form filling, such as flexible layouts and the difficulty of aligning textual instructions with on-screen fields. To bridge this gap, we formally define the form-filling task and propose FormFactory, an interactive benchmarking suite comprising a web-based interface, backend evaluation module, and carefully constructed dataset. Our benchmark covers diverse real-world scenarios, incorporates various field formats, and…
Code & Models
- 🤗 billyenrizky/FlowVFE-39M-SFT (model)
- 🤗 billyenrizky/FlowVFE-39M-FlowGRPO (model)
- 🤗 billyenrizky/FS-DFM-1.3B-SFT (model)
- 🤗 billyenrizky/ReFusion-8B-SFT (model)
- 🤗 billyenrizky/ReFusion-8B-ESPO (model)
- 🤗 billyenrizky/FS-DFM-1.3B-FlowGRPO (model)
- 🤗 billyenrizky/Qwen3-8B-FormFactory-SFT-LoRA (model, 39 downloads)
- 🤗 billyenrizky/Qwen3-8B-FormFactory-GRPO-LoRA (model, 37 downloads)
Taxonomy
Topics: Human Motion and Animation · BIM and Construction Integration · Design Education and Practice
