Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning

Fuyu Dong; Ke Li; Di Wang; Nan Luo; Yiming Zhang; Kaiyu Li; Jianfei Yang; Quan Wang

arXiv:2512.24591·cs.CV·January 1, 2026

TL;DR

This paper introduces DARFT, a reinforcement fine-tuning framework that improves change detection visual question answering by explicitly targeting decision ambiguity, yielding better discriminability and robustness, especially in few-shot scenarios.

Contribution

The paper proposes a novel reinforcement fine-tuning method that targets decision ambiguity in CDVQA, improving model performance without extra supervision.

Findings

01 DARFT outperforms supervised fine-tuning baselines.

02 Significant improvements in few-shot learning scenarios.

03 Effective suppression of distractors and sharper decision boundaries.

Abstract

Change detection visual question answering (CDVQA) requires answering text queries by reasoning about semantic changes in bi-temporal remote sensing images. A straightforward approach is to boost CDVQA performance with generic vision-language models via supervised fine-tuning (SFT). Despite recent progress, we observe that a significant portion of failures do not stem from clearly incorrect predictions, but from decision ambiguity, where the model assigns similar confidence to the correct answer and strong distractors. To formalize this challenge, we define Decision-Ambiguous Samples (DAS) as instances with a small probability margin between the ground-truth answer and the most competitive alternative. We argue that explicitly optimizing DAS is crucial for improving the discriminability and robustness of CDVQA models. To this end, we propose DARFT, a Decision-Ambiguity-guided…
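
The DAS definition in the abstract can be illustrated with a minimal sketch. It assumes the model exposes a probability for each candidate answer, and it is not the authors' implementation. The function names, the use of the absolute margin, and the `threshold` value of 0.1 are all hypothetical choices made for illustration; the abstract says only that the margin is "small".

```python
from typing import Sequence


def probability_margin(probs: Sequence[float], gt_index: int) -> float:
    """Ground-truth probability minus the strongest competing candidate's probability."""
    if len(probs) < 2:
        raise ValueError("need at least two candidate answers")
    if not 0 <= gt_index < len(probs):
        raise IndexError("gt_index out of range")
    p_gt = probs[gt_index]
    # Most competitive alternative: the highest probability among non-ground-truth answers.
    p_best_other = max(p for i, p in enumerate(probs) if i != gt_index)
    return p_gt - p_best_other


def is_decision_ambiguous(
    probs: Sequence[float], gt_index: int, threshold: float = 0.1
) -> bool:
    """Flag a sample as decision-ambiguous when the margin is small.

    Assumption: "small" is read as |margin| < threshold, so near-ties count
    whether or not the ground truth is narrowly ahead. The threshold is
    hypothetical, not a value taken from the paper.
    """
    return abs(probability_margin(probs, gt_index)) < threshold


if __name__ == "__main__":
    near_tie = [0.45, 0.40, 0.15]   # correct answer barely ahead of a distractor
    confident = [0.90, 0.05, 0.05]  # correct answer clearly ahead
    print(is_decision_ambiguous(near_tie, gt_index=0))
    print(is_decision_ambiguous(confident, gt_index=0))
```

Under these assumptions, the first sample would be flagged as ambiguous and the second would not, since only the first has the ground truth and a distractor at nearly equal confidence.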

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques