Federated Document Visual Question Answering: A Pilot Study

Khanh Nguyen; Dimosthenis Karatzas

arXiv:2405.06636 · cs.CV · May 24, 2024

TL;DR

This paper investigates federated learning for Document Visual Question Answering (DocVQA) as a way to train a shared model on private, heterogeneous document datasets that cannot be pooled into one central collection.

Contribution

The paper introduces a federated training approach for DocVQA that combines self-pretraining with federated optimization, and provides an extensive analysis of its effectiveness and of hyperparameter tuning.

Findings

01

The proposed federated DocVQA training method outperforms the FedAvg baseline.

02

Self-pretraining enhances model performance in federated settings.

03

Hyperparameter tuning is crucial for practical federated document tasks.
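
For readers unfamiliar with the FedAvg baseline named in the findings, the following is a minimal sketch of its server-side aggregation step: each client trains locally, and the server averages the returned parameters, weighting each client by its number of local samples. This is a generic illustration, not the authors' implementation or their proposed method; the function name `fedavg`, the `Params` type, and the flat-list parameter layout are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighted by each client's sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted sum over clients for every coordinate, then normalise.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two toy clients; the second holds three times as much data as the first.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_params = fedavg(clients, client_sizes=[1, 3])
print(global_params)
```

When all clients hold the same amount of data, the weighting cancels out and the result is a plain mean of the client parameters.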

Abstract

An important handicap of document analysis research is that documents tend to be copyrighted or contain private information, which prohibits their open publication and the creation of centralised, large-scale document datasets. Instead, documents are scattered in private data silos, making extensive training over heterogeneous data a tedious task. In this work, we explore the use of a federated learning (FL) scheme as a way to train a shared model on decentralised private document data. We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains. Enabling training over heterogeneous document datasets can thus substantially enrich DocVQA models. We assemble existing DocVQA datasets from diverse domains to reflect the data heterogeneity in real-world…

Code & Models

Repositories

khanhnguyen21006/fldocvqa
pytorch

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies