Better Supervised Fine-tuning for VQA: Integer-Only Loss

Baihong Qian, Haotian Fan, Wenjie Liao, Yunqiu Wang, Tao Li, and Junhui Cui

arXiv:2508.11170 · cs.CV · August 18, 2025


TL;DR

This paper introduces IOVQA, a fine-tuning method for vision language models (VLMs) that uses integer-only labels and a targeted loss to improve the accuracy and consistency of video quality assessment.

Contribution

The paper presents an integer-only label construction and loss calculation strategy for fine-tuning VLMs, aimed at improving their performance on quantitative evaluation tasks.

Findings

01 Achieved 3rd place in the VQualA 2025 GenAI-Bench challenge.

02 Significantly improved VQA accuracy and consistency.

03 Demonstrated the effectiveness of integer-only labels in fine-tuning.

Abstract

With the rapid advancement of vision language models (VLMs), their ability to assess visual content based on specific criteria and dimensions has become increasingly critical for applications such as video-theme consistency assessment and visual quality scoring. However, existing methods often suffer from imprecise results and inefficient loss calculation, which limit the model's focus on key evaluation indicators. To address this, we propose IOVQA (Integer-only VQA), a novel fine-tuning approach tailored for VLMs to enhance their performance in video quality assessment tasks. The key innovation of IOVQA lies in its label construction and its targeted loss calculation mechanism. Specifically, during dataset curation, we constrain the model's output to integers within the range of [10,50], ensuring numerical stability, and convert decimal Overall_MOS values to integers before using them as…
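
The label construction step can be illustrated with a minimal sketch. The abstract states only that decimal Overall_MOS values are converted to integers in [10,50]; the exact mapping is not given here, so the code below assumes a MOS on a 1.0 to 5.0 scale that is multiplied by 10, rounded half up, and clamped. The function names `mos_to_integer_label` and `integer_label_to_mos`, and the `scale`, `lo`, and `hi` parameters, are hypothetical and are not taken from the paper.

```python
import math


def mos_to_integer_label(mos: float, scale: int = 10, lo: int = 10, hi: int = 50) -> int:
    """Map a decimal MOS to an integer label clamped to [lo, hi].

    Assumption (not confirmed by the abstract): the MOS lies on a 1.0-5.0
    scale, so multiplying by 10 lands in the stated [10, 50] range.
    """
    # Round half up explicitly; Python's built-in round() uses banker's rounding.
    label = math.floor(mos * scale + 0.5)
    # Clamp so out-of-range scores still produce a valid label.
    return max(lo, min(hi, label))


def integer_label_to_mos(label: int, scale: int = 10) -> float:
    """Map an integer label back to the decimal MOS scale for evaluation."""
    return label / scale


if __name__ == "__main__":
    for mos in (1.0, 2.25, 3.47, 5.0):
        label = mos_to_integer_label(mos)
        print(f"MOS {mos} -> label {label} -> MOS {integer_label_to_mos(label)}")
```

Under these assumptions, every training target is a two-digit integer, which keeps the target string the model must generate short and uniform in length; whether this matches the authors' actual conversion should be checked against the full paper.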
