Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios

Manjunath Prasad Holenarasipura Rajiv; B. M. Vidyavathi

arXiv:2510.26580 · cs.CV · October 31, 2025

TL;DR

This paper presents a framework that combines pre-trained vision transformers and large language models for zero-shot scene understanding in dynamic, real-world environments. It reports up to 18% higher scene understanding accuracy without task-specific training.

Contribution

It introduces a dynamic reasoning approach leveraging vision-language alignment for zero-shot scene understanding, addressing generalization in unseen environments.

Findings

01 Up to 18% improvement in scene understanding accuracy
02 Robust performance in cluttered and ambiguous scenes
03 Effective zero-shot generalization across multiple benchmarks

Abstract

In real-world environments, AI systems often face unfamiliar scenarios without labeled data, creating a major challenge for conventional scene understanding models. The inability to generalize across unseen contexts limits the deployment of vision-based applications in dynamic, unstructured settings. This work introduces a Dynamic Context-Aware Scene Reasoning framework that leverages Vision-Language Alignment to address zero-shot real-world scenarios. The goal is to enable intelligent systems to infer and adapt to new environments without prior task-specific training. The proposed approach integrates pre-trained vision transformers and large language models to align visual semantics with natural language descriptions, enhancing contextual comprehension. A dynamic reasoning module refines predictions by combining global scene cues and object-level interactions guided by linguistic…
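
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes image and text embeddings already live in a shared space (as a pre-trained vision-language encoder would produce), and the function names, the `alpha` weighting between global and object-level evidence, and the `temperature` value are all hypothetical choices made for illustration. The toy vectors stand in for real encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scene_scores(global_emb, object_embs, label_embs,
                           alpha=0.5, temperature=0.1):
    """Score candidate scene labels without any task-specific training.

    global_emb  : embedding of the whole image (global scene cue)
    object_embs : embeddings of detected object regions (object-level cues)
    label_embs  : embeddings of the natural-language label descriptions
    alpha       : hypothetical weight on the global cue (assumption)
    """
    fused = []
    for text_emb in label_embs:
        global_sim = cosine(global_emb, text_emb)
        if object_embs:
            # Average similarity of the object regions to this description.
            object_sim = sum(cosine(o, text_emb) for o in object_embs) / len(object_embs)
        else:
            object_sim = global_sim
        fused.append(alpha * global_sim + (1 - alpha) * object_sim)
    return softmax(fused, temperature)

# Toy embeddings standing in for real encoder outputs (assumption).
labels = ["kitchen", "street"]
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
global_emb = [0.9, 0.1, 0.0]
object_embs = [[0.8, 0.2, 0.1], [0.7, 0.0, 0.3]]

probs = zero_shot_scene_scores(global_emb, object_embs, label_embs)
best_label = labels[probs.index(max(probs))]
print(best_label, probs)
```

Because the labels enter only as text embeddings, new scene categories can be added by supplying new descriptions, which is the sense in which such a pipeline is zero-shot. The fixed linear fusion here is a simple stand-in; the paper's dynamic reasoning module is described only at a high level in the abstract.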
