Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Ayush Singh, Mansi Gupta, Shivank Garg, Abhinav Kumar, Vansh Agrawal

TL;DR
This paper investigates the limitations of current vision-language models in mathematical reasoning tasks and proposes task-specific prompting as a more effective alternative to captioning pipelines for improving performance.
Contribution
It introduces task-based prompting tailored for math-related visual reasoning tasks, demonstrating its superiority over captioning pipelines in VLMs.
Findings
Captioning pipelines do not generalize well to math tasks.
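With a captioning pipeline, larger VLMs trained primarily on downstream QnA tasks show random performance on math-related challenges (geometry, algebra, and counting).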
Task-specific prompting improves VLM performance on math reasoning tasks.
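The two approaches compared in the findings above can be contrasted in a minimal sketch. Everything named here is an assumption made for illustration: the `VLM` callable is a hypothetical stand-in for a real model call, and the wording of the captioning instruction and of the entries in `TASK_HINTS` is invented, since this summary does not state what the paper's prompts contain.

```python
from typing import Callable

# Hypothetical stand-in for a VLM call: (image_path, prompt) -> generated text.
VLM = Callable[[str, str], str]

# Illustrative task hints. The wording is an assumption, not taken from the paper.
TASK_HINTS = {
    "geometry": "This is a geometry question. Attend to shapes, angles, and lengths.",
    "algebra": "This is an algebra question. Read the expressions in the image carefully.",
    "counting": "This is a counting question. Count each relevant object exactly once.",
}


def caption_then_answer(vlm: VLM, image: str, question: str) -> str:
    """Captioning pipeline: first describe the image, then answer using the caption."""
    caption = vlm(image, "Describe this image in detail.")
    prompt = f"Image description: {caption}\nQuestion: {question}"
    return vlm(image, prompt)


def task_prompt_answer(vlm: VLM, image: str, question: str, task: str) -> str:
    """Task-based prompting: a single call whose prompt carries a task-specific hint."""
    hint = TASK_HINTS[task]
    return vlm(image, f"{hint}\nQuestion: {question}")


def echo_vlm(image: str, prompt: str) -> str:
    """Dummy model used only to make the sketch runnable; it echoes its inputs."""
    return f"[{image}] {prompt}"
```

In this sketch the captioning pipeline costs two model calls per question while the task-based variant costs one. Swapping `echo_vlm` for a real model client would be needed to reproduce any comparison.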
Abstract
Vision-Language Models (VLMs) have transformed tasks requiring visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite their success, VLMs face significant challenges with tasks involving geometric reasoning, algebraic problem-solving, and counting. These limitations stem from difficulties effectively integrating multiple modalities and accurately interpreting geometry-related tasks. Various works claim that introducing a captioning pipeline before VQA tasks enhances performance. We incorporated this pipeline for tasks involving geometry, algebra, and counting. We found that captioning results are not generalizable, specifically with larger VLMs primarily trained on downstream QnA tasks showing random performance on math-related challenges. However, we present a promising alternative: task-based prompting, enriching the prompt with…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · AI-based Problem Solving and Planning · Constraint Satisfaction and Optimization
