A visual question answering method based on task decomposition

Yao Cong; Hongwei Mo

PMC · DOI: 10.1371/journal.pone.0336623 · November 13, 2025

Open Access

TL;DR

This paper introduces a new visual question answering method that improves accuracy and reduces bias by decomposing tasks using natural language structure.

Contribution

The novel Graph2Seq-TDN network uses semantic structure to enhance task decomposition and reasoning execution in VQA.

Findings

1. The proposed Graph2Seq-TDN outperforms existing methods in answering accuracy and program accuracy.

2. The model reduces training costs while maintaining the same level of accuracy.

3. Validation on four datasets shows improved performance over comparative models.

Abstract

Visual question answering (VQA) is an interdisciplinary task spanning computer vision and natural language processing. It evaluates a model’s visual reasoning ability and requires integrating image information extraction with natural language understanding. Tests on specialized benchmarks that control for potential bias indicate that VQA based on task decomposition is a promising approach: compared with traditional VQA methods that rely only on multimodal fusion, it offers interpretability at the program execution stage and depends less on data bias. This approach decomposes the task by parsing the natural language question, usually with sequence-to-sequence networks. Such parsers have limitations when faced with flexible and varied natural language, making it difficult to accurately decompose the…
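To make "task decomposition" concrete, the following is a minimal sketch of the general idea of executing a decomposed question as a sequence of small modules. It is not the paper's Graph2Seq-TDN and does not include a parser: the program is written by hand as a stand-in for what a learned parser would produce, and the scene format, module names, and the `execute` helper are all hypothetical, chosen only for illustration.

```python
# Illustrative only: a hand-written program stands in for the output of a
# learned question parser. Scene format and module names are hypothetical.

# A symbolic scene, as an object detector might describe an image.
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]


def filter_attr(objects, key, value):
    """Keep only the objects whose attribute `key` equals `value`."""
    return [obj for obj in objects if obj[key] == value]


def count(objects):
    """Return how many objects remain."""
    return len(objects)


def exists(objects):
    """Return True if at least one object remains."""
    return len(objects) > 0


def execute(program, scene):
    """Run each (op, args) step on the result of the previous step.

    Returns the final answer and a trace of every intermediate result,
    which is what makes the reasoning inspectable step by step.
    """
    modules = {"filter": filter_attr, "count": count, "exists": exists}
    state = scene
    trace = []
    for op, args in program:
        state = modules[op](state, *args)
        trace.append((op, args, state))
    return state, trace


# "How many red cubes are there?" decomposed into three sub-tasks.
PROGRAM = [
    ("filter", ("color", "red")),
    ("filter", ("shape", "cube")),
    ("count", ()),
]

answer, trace = execute(PROGRAM, SCENE)
print(answer)
```

The trace records the output of each sub-task, which illustrates why a program-based pipeline can be inspected at the execution stage. Producing the program from free-form language is the hard part the abstract refers to, and this sketch deliberately leaves it out.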

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species (1)

Homo sapiens (human · species)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling