mForms : Multimodal Form-Filling with Question Answering
Larry Heck, Simon Heck, Anirudh Sundar

TL;DR
This paper introduces mForms, a zero-shot multimodal form-filling method using question answering models, achieving high accuracy with minimal training data and providing a new dataset for future research.
Contribution
It reformulates form-filling as multimodal question answering, enabling zero-shot performance and introducing a new dataset for multimodal form-filling tasks.
Findings
Achieves state-of-the-art F1 of 0.97 on ATIS with limited training data.
Maintains robust accuracy in sparse training conditions.
Introduces a new multimodal form-filling dataset, mForms.
Abstract
This paper presents a new approach to form-filling by reformulating the task as multimodal natural language Question Answering (QA). The reformulation is achieved by first translating the elements on the GUI form (text fields, buttons, icons, etc.) to natural language questions, where these questions capture the element's multimodal semantics. After a match is determined between the form element (Question) and the user utterance (Answer), the form element is filled through a pre-trained extractive QA system. By leveraging pre-trained QA models and not requiring form-specific training, this approach to form-filling is zero-shot. The paper also presents an approach to further refine the form-filling by using multi-task training to incorporate a potentially large number of successive tasks. Finally, the paper introduces a multimodal natural language form-filling dataset Multimodal Forms…
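The reformulation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the `FormElement` type, the question template in `element_to_question`, and especially `toy_extractive_qa`, which is a regex-based stand-in for a pre-trained extractive QA model and only shows the interface such a model would fill (question and utterance in, answer span out).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FormElement:
    """Hypothetical description of one GUI form element."""
    name: str   # slot identifier used in the filled form
    label: str  # visible text label of the element
    kind: str   # element type, e.g. "text_field"


def element_to_question(element: FormElement) -> str:
    """Translate a form element into a question (assumed template)."""
    return f"What is the {element.label.lower()}?"


# An extractive QA function maps (question, context) to an answer span or None.
QAFn = Callable[[str, str], Optional[str]]


def toy_extractive_qa(question: str, context: str) -> Optional[str]:
    """Toy stand-in for a pre-trained extractive QA model.

    It only recognizes two hard-coded cues and returns the capitalized
    word that follows "from" or "to" in the utterance.
    """
    cues = {
        "departure city": r"\bfrom ([A-Z]\w+)",
        "arrival city": r"\bto ([A-Z]\w+)",
    }
    for cue, pattern in cues.items():
        if cue in question:
            match = re.search(pattern, context)
            return match.group(1) if match else None
    return None


def fill_form(
    elements: List[FormElement], utterance: str, qa: QAFn
) -> Dict[str, Optional[str]]:
    """Ask one question per form element and fill it with the extracted span."""
    return {e.name: qa(element_to_question(e), utterance) for e in elements}


if __name__ == "__main__":
    form = [
        FormElement("from_city", "Departure city", "text_field"),
        FormElement("to_city", "Arrival city", "text_field"),
    ]
    print(fill_form(form, "Book a flight from Boston to Denver", toy_extractive_qa))
```

Because `fill_form` takes the QA function as a parameter, the toy answerer could in principle be swapped for a real pre-trained extractive QA model without changing the form-side code, which is the sense in which the approach needs no form-specific training.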
Taxonomy
Topics: Video Analysis and Summarization
