TL;DR
This paper explores neuro-symbolic language reasoning in vision-language models, achieving improved accuracy and efficiency in reasoning tasks using reinforcement learning and a specialized dataset.
Contribution
It introduces a neuro-symbolic reasoning approach in vision-language models, demonstrating enhanced accuracy and reduced reasoning tokens with a new training setup.
Findings
Achieved a 3.33% accuracy improvement on a vision-language evaluation dataset of math, science, and general knowledge questions.
Reduced reasoning tokens by 75% compared to SymPy.
Documented compute challenges and scaling possibilities.
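The summary does not say whether the accuracy figure is an absolute gain (percentage points) or a relative one, so the sketch below shows how both readings, plus a token-reduction percentage, would typically be computed. It is an illustration only: the function names and every number in it are hypothetical placeholders, not values or code from the paper.

```python
def accuracy_gain(baseline_correct: int, tuned_correct: int, total: int) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    base_acc = baseline_correct / total
    tuned_acc = tuned_correct / total
    absolute_pp = (tuned_acc - base_acc) * 100
    relative_pct = (tuned_acc - base_acc) / base_acc * 100
    return absolute_pp, relative_pct


def token_reduction(reference_tokens: float, new_tokens: float) -> float:
    """Return the percentage reduction in tokens relative to a reference."""
    return (1 - new_tokens / reference_tokens) * 100


# Hypothetical counts, chosen only to show how the two readings diverge.
abs_pp, rel_pct = accuracy_gain(baseline_correct=150, tuned_correct=160, total=300)
print(f"absolute gain: {abs_pp:.2f} pp, relative gain: {rel_pct:.2f}%")

# Hypothetical average reasoning-token counts per question.
print(f"token reduction: {token_reduction(400, 100):.0f}%")
```

With these placeholder counts the same result reads as either a roughly 3.3 point gain or a roughly 6.7% relative gain, which is why it helps to know which convention a reported number uses.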
Abstract
There are 7,407 languages in the world. But what about the languages that are not in the world? Are humans so narrow-minded that we don't care about the languages aliens communicate in? Aliens are humans too! In the 2016 movie Arrival, Amy Adams plays a linguist, Dr. Louise Banks, who, by learning to think in an alien language (Heptapod) formed of non-sequential sentences, gains the ability to transcend time and look into the future. In this work, I aim to explore the representation and reasoning of vision-language concepts in a neuro-symbolic language, and to study the improvement in analytical reasoning abilities and efficiency of "thinking systems". With Qwen3-VL-2B-Instruct as the base model and 4 Nvidia H200 GPU nodes, I achieve an accuracy improvement of 3.33% on a vision-language evaluation dataset consisting of math, science, and general knowledge questions, while reducing…
