Improving Large Vision-Language Models' Understanding for Flow Field Data

Xiaomei Zhang; Hanyu Zheng; Xiangyu Zhu; Jinghuan Wei; Junhong Zou; Zhen Lei; Zhaoxiang Zhang

arXiv:2507.18311·cs.CV·March 11, 2026

Open Access

TL;DR

This paper introduces FieldLVLM, a framework that enhances large vision-language models' understanding of complex scientific field data by extracting key features and optimizing model tuning, leading to improved performance on scientific datasets.

Contribution

The paper presents a novel framework combining field-aware language generation and data compression for tuning LVLMs on scientific data, addressing a gap in applying these models to natural science domains.

Findings

01. FieldLVLM outperforms existing methods on scientific field data benchmarks.

02. The approach effectively extracts and encodes key physical features from complex data.

03. Experimental results demonstrate improved understanding of flow field data by LVLMs.

Abstract

Large Vision-Language Models (LVLMs) have shown impressive capabilities across a range of tasks that integrate visual and textual understanding, such as image captioning and visual question answering. These models are trained on large-scale image and video datasets paired with text, enabling them to bridge visual perception and natural language processing. However, their application to scientific domains, especially in interpreting complex field data commonly used in the natural sciences, remains underexplored. In this work, we introduce FieldLVLM, a novel framework designed to improve large vision-language models' understanding of field data. FieldLVLM consists of two main components: a field-aware language generation strategy and data-compressed multimodal model tuning. The field-aware language generation strategy leverages a special-purpose machine learning pipeline to extract key…

Taxonomy

Topics: Constraint Satisfaction and Optimization · Geographic Information Systems Studies · Semantic Web and Ontologies