Fine-tuning a vision-language model for fracture-surface morphology recognition

Quanliang Liu; Jungtaek Kim; Kangwook Lee; Hyunseok Oh

arXiv:2605.07145 · cond-mat.mtrl-sci · May 11, 2026

TL;DR

This paper fine-tunes an open-source vision-language model (Qwen3-VL-32B-Instruct) for fracture-surface morphology recognition. It reaches a precision of 0.92 on a 100-image benchmark, demonstrating the value of domain-specific adaptation for materials analysis.

Contribution

It introduces a curated dataset of 13,168 literature-mined fracture-surface images and a fine-tuning approach that raises recognition precision from 0.35 for the base model to 0.92, outperforming flagship proprietary multimodal models.

Findings

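1. The fine-tuned model achieves a precision of 0.92 on a benchmark of 100 manually annotated images, compared with 0.35 for the base Qwen3-VL-32B-Instruct.
2. Targeted manual data collection and rotation-based augmentation improve recognition of rare morphological features.
3. The approach enables better integration with autonomous microscopy workflows.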
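The rotation-based augmentation mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the image representation (a grayscale image as nested lists), the restriction to 90-degree multiples, and the rule of augmenting only images that carry a "rare" label are all assumptions, and the helper names `rotate90` and `rotation_augment` and the label strings are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[int]]
Sample = Tuple[Image, List[str]]


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*img[::-1])]


def rotation_augment(img: Image, labels: Sequence[str], rare_labels: Set[str]) -> List[Sample]:
    """Return the original sample plus its 90/180/270-degree rotations.

    Assumption: only images with at least one rare morphology label are
    augmented, and morphology labels are treated as unchanged by rotation.
    """
    samples: List[Sample] = [(img, list(labels))]
    if not rare_labels.intersection(labels):
        return samples  # no rare feature: keep the single original copy
    current = img
    for _ in range(3):
        current = rotate90(current)
        samples.append((current, list(labels)))
    return samples


if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    rare = {"fatigue striations"}
    augmented = rotation_augment(tiny, ["fatigue striations"], rare)
    print(len(augmented))  # 4
    print(augmented[1][0])  # [[3, 1], [4, 2]]
    print(len(rotation_augment(tiny, ["ductile dimples"], rare)))  # 1
```

Restricting the sketch to multiples of 90 degrees avoids interpolation and corner cropping; the angles the paper actually uses are not stated in this summary.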

Abstract

Vision-language models (VLMs) have shown strong potential for scientific image understanding, but general-purpose models often lack the domain-specific visual knowledge required for reliable materials characterization. In this work, we fine-tuned an open-source VLM (Qwen3-VL-32B-Instruct) for fracture-surface image analysis using a curated dataset of 13,168 open-source, literature-mined fracture-surface images. Morphology annotations were generated by GPT-5.2-Reasoning (high) from both the images and relevant excerpts of their source papers, and the dataset was further enriched with targeted manual collection and rotation-based augmentation. The resulting specialist model outperforms flagship proprietary multimodal models on a benchmark of 100 manually annotated images. It achieves a precision of 0.92, compared to 0.35 for the base Qwen3-VL-32B-Instruct, 0.58 for GPT-5.5-Reasoning…
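Precision here is the standard ratio TP / (TP + FP): the share of predicted morphology labels that the manual annotation confirms. The abstract does not specify how predicted and reference labels are matched or averaged, so the sketch below assumes exact string matching pooled over all benchmark images (micro-averaging). The function name `micro_precision` and the toy labels are hypothetical and do not come from the paper's benchmark.

```python
from typing import Dict, Set


def micro_precision(predicted: Dict[str, Set[str]], reference: Dict[str, Set[str]]) -> float:
    """Fraction of predicted morphology labels that also appear in the manual annotation.

    Assumption: labels are compared by exact string match and pooled over all
    images (micro-averaging); images missing from `reference` count as unlabeled.
    """
    true_pos = 0
    false_pos = 0
    for image_id, pred in predicted.items():
        gold = reference.get(image_id, set())
        true_pos += len(pred & gold)   # predicted and annotated
        false_pos += len(pred - gold)  # predicted but not annotated
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the paper's benchmark.
    predicted = {
        "img_001": {"ductile dimples", "cleavage facets"},
        "img_002": {"fatigue striations"},
    }
    reference = {
        "img_001": {"ductile dimples"},
        "img_002": {"fatigue striations", "beach marks"},
    }
    print(round(micro_precision(predicted, reference), 3))  # 0.667
```

Note that precision does not penalize annotated features the model omits: the unpredicted "beach marks" in the toy example leave the score unchanged.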
