Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
Riddhi Jain, Manasi Patwardhan, Parijat Deshpande, Venkataramana Runkana

TL;DR
This paper introduces a multi-step reasoning approach for Indian food visual question answering, leveraging reasoning chains and reinforcement learning to improve accuracy over existing methods.
Contribution
It proposes a novel reasoning chain-based framework for Indian food VQA, enhancing accuracy by integrating multi-step reasoning with minimal human intervention.
Findings
Accuracy improved by 10 percentage points with reasoning chains.
Reinforcement learning further enhances model performance.
Analysis shows reasoning chains effectively handle complex culinary contexts.
Abstract
The immense cultural and culinary diversity of Indian cuisines calls attention to a major shortcoming of existing Visual Question Answering (VQA) systems, which are inclined towards foods from the Western region. A recent attempt at building a VQA dataset for Indian food is a step towards addressing this challenge. However, its approach to VQA follows a two-step process in which the answer is generated first, followed by an explanation of the expected answer. In this work, we claim that food VQA requires a multi-step reasoning process to arrive at an accurate answer, especially in the context of Indian food, which involves understanding complex culinary context and identifying relationships between various food items. With this hypothesis, we create reasoning chains upon the QA with minimal human intervention. We fine-tune smaller LLMs and VLMs with…
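The contrast the abstract draws, between generating the answer first and reasoning step by step before answering, can be illustrated with a minimal sketch of how a single training target might be serialized in each order. This is not the authors' data format: the `FoodQA` class, both function names, the "Step i:" and "Answer:" labels, and the idli example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FoodQA:
    """Hypothetical container for one food VQA example (image omitted)."""
    question: str
    reasoning_steps: List[str]
    answer: str


def answer_then_explanation(example: FoodQA) -> str:
    """Two-step target: the answer comes first, then one explanation."""
    explanation = " ".join(example.reasoning_steps)
    return f"Answer: {example.answer}\nExplanation: {explanation}"


def reasoning_chain_then_answer(example: FoodQA) -> str:
    """Multi-step target: numbered reasoning steps come first, answer last."""
    steps = [
        f"Step {i}: {step}"
        for i, step in enumerate(example.reasoning_steps, start=1)
    ]
    return "\n".join(steps + [f"Answer: {example.answer}"])


# Illustrative example only; it is not taken from the paper's dataset.
example = FoodQA(
    question="Which accompaniment is commonly served with the dish shown?",
    reasoning_steps=[
        "The image shows idli.",
        "Idli is commonly served with sambar and coconut chutney.",
    ],
    answer="Sambar",
)

if __name__ == "__main__":
    print(answer_then_explanation(example))
    print()
    print(reasoning_chain_then_answer(example))
```

In the second format the answer is the last line, so a model trained on it produces the intermediate steps before committing to an answer. That ordering is the property the abstract argues for.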
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks
