Federated Document Visual Question Answering: A Pilot Study
Khanh Nguyen, Dimosthenis Karatzas

TL;DR
This paper investigates federated learning for Document Visual Question Answering (VQA), demonstrating how privacy-preserving, decentralized training over heterogeneous datasets can improve model performance and scalability in real-world applications.
Contribution
It introduces a federated training approach for DocVQA, combining self-pretraining with federated optimization, and provides extensive analysis on its effectiveness and hyperparameter tuning.
Findings
Federated DocVQA training outperforms the FedAvg baseline.
Self-pretraining enhances model performance in federated settings.
Hyperparameter tuning is crucial for practical federated document tasks.
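For readers unfamiliar with the FedAvg baseline named above, the following is a minimal sketch of its server-side aggregation step: each client trains locally on its private documents, and the server averages the resulting parameters, weighting each client by its number of local samples. This is an illustration of standard FedAvg only, not the authors' implementation. The function name `fedavg`, the `Weights` type, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by each client's local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy round with two clients (assumed values, for illustration only).
clients = [
    {"layer": [1.0, 2.0]},  # client holding 1 sample
    {"layer": [3.0, 4.0]},  # client holding 3 samples
]
sizes = [1, 3]
global_weights = fedavg(clients, sizes)
print(global_weights)
```

Only the parameters leave each client in this scheme; the raw documents stay in their silos, which is the property that makes the approach attractive for private document data.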
Abstract
An important handicap of document analysis research is that documents tend to be copyrighted or contain private information, which prohibits their open publication and the creation of centralised, large-scale document datasets. Instead, documents are scattered in private data silos, making extensive training over heterogeneous data a tedious task. In this work, we explore the use of a federated learning (FL) scheme as a way to train a shared model on decentralised private document data. We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains. Enabling training over heterogeneous document datasets can thus substantially enrich DocVQA models. We assemble existing DocVQA datasets from diverse domains to reflect the data heterogeneity in real-world…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies
Methods: Focus
