Enhancing Vision Language Models with Logic Reasoning for Situational Awareness

Pavana Pradeep; Krishna Kant; Suya Yu

arXiv:2601.11322 · cs.CV · January 19, 2026

Open Access

TL;DR

This paper introduces a method that combines vision-language models with logic reasoning to improve situational awareness by extracting detailed event information, enhancing accuracy through intelligent fine-tuning, and providing justifications for outputs.

Contribution

It presents an integrated approach that enhances VLMs with explicit logic reasoning, improving event detail extraction, accuracy, and interpretability in situational awareness tasks.

Findings

01 Enhanced accuracy with intelligent fine-tuning

02 Improved extraction of fine-grained event details

03 Generated justifications increase interpretability

Abstract

Vision-Language Models (VLMs) offer the ability to generate high-level, interpretable descriptions of complex activities from images and videos, making them valuable for situational awareness (SA) applications. In such settings, the focus is on identifying infrequent but significant events with high reliability and accuracy, while also extracting fine-grained details and assessing recognition quality. In this paper, we propose an approach that integrates VLMs with traditional computer vision methods through explicit logic reasoning to enhance SA in three key ways: (a) extracting fine-grained event details, (b) employing an intelligent fine-tuning (FT) strategy that achieves substantially higher accuracy than uninformed selection, and (c) generating justifications for VLM outputs during inference. We demonstrate that our intelligent FT mechanism improves the accuracy and provides a…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning