Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Zhenyu Pan; Haozheng Luo; Manling Li; Han Liu

arXiv:2403.17359 · cs.CL · February 24, 2025 · 2 cites

TL;DR

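The paper introduces Chain-of-Action (CoA), a framework for multimodal, retrieval-augmented question answering. CoA decomposes a complex question into a reasoning chain via systematic prompting and retrieves supporting information through pre-designed actions, targeting two problems: unfaithful hallucination and weak reasoning over compositional information.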

Contribution

It proposes a reasoning-retrieval mechanism with three types of domain-adaptable Plug-and-Play actions, plus a multi-reference faith score (MRFS) that verifies answers and resolves conflicts among them.

Findings

01 Outperforms existing methods on public benchmarks.

02 Effectively retrieves real-time information from heterogeneous sources.

03 Improves reasoning over compositional questions.

Abstract

We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
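
The reasoning-retrieval loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Step`, `overlap_score`, and `run_chain`, the token-overlap scoring, and the 0.5 threshold are all assumptions made for illustration. In particular, the overlap measure is only a toy stand-in for MRFS, whose actual definition is given in the paper, and the chain of sub-questions is supplied by hand here rather than produced by prompting an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One node of a (hypothetical) reasoning chain."""
    sub_question: str
    action: str        # key of the retrieval action to call for this step
    draft_answer: str  # the model's initial guess for the sub-question


def overlap_score(answer: str, reference: str) -> float:
    """Toy stand-in for a faith score: share of answer tokens found in the reference."""
    answer_tokens = answer.lower().split()
    reference_tokens = set(reference.lower().split())
    if not answer_tokens:
        return 0.0
    hits = sum(1 for tok in answer_tokens if tok in reference_tokens)
    return hits / len(answer_tokens)


def run_chain(
    steps: List[Step],
    actions: Dict[str, Callable[[str], List[str]]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep each draft answer if some retrieved reference supports it, else replace it."""
    resolved = []
    for step in steps:
        references = actions[step.action](step.sub_question)
        if not references:
            # Nothing retrieved: no evidence to verify against, keep the draft.
            resolved.append(step.draft_answer)
            continue
        # Pick the reference that best supports the draft answer.
        best = max(references, key=lambda ref: overlap_score(step.draft_answer, ref))
        if overlap_score(step.draft_answer, best) >= threshold:
            resolved.append(step.draft_answer)
        else:
            resolved.append(best)
    return resolved
```

In this sketch each entry in `actions` plays the role of a pluggable retrieval action: any callable that maps a sub-question to a list of reference strings can be registered under a new key without changing `run_chain`.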

Code & Models

Repositories

MAGICS-LAB/Chain-of-Actions
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques