ACTRESS: Active Retraining for Semi-supervised Visual Grounding

Weitai Kang; Mengxue Qu; Yunchao Wei; Yan Yan

arXiv:2407.03251 · cs.CV · July 9, 2024 · 1 citation

TL;DR

This paper introduces ACTRESS, an active retraining framework for semi-supervised visual grounding. It improves model performance through selective pseudo-labeling and periodic retraining, addressing limitations of previous methods.

Contribution

The paper proposes ACTRESS, a new framework that incorporates detection confidence, active sampling, and selective retraining to enhance semi-supervised visual grounding models.

Findings

01. Superior performance on benchmark datasets

02. Effective pseudo-label selection via Faithfulness, Robustness, and Confidence

03. Enhanced model robustness through periodic retraining
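
The second finding names three criteria for pseudo-label selection. This page does not say how Faithfulness, Robustness, or Confidence are computed, so the following is only a minimal sketch of threshold-based filtering, not the paper's method. It assumes each criterion has already been reduced to a score in [0, 1]; the `PseudoLabel` class, the `select_pseudo_labels` function, and the threshold values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoLabel:
    """A teacher-predicted box plus three quality scores (all hypothetical)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    faithfulness: float  # assumed to lie in [0, 1]
    robustness: float    # assumed to lie in [0, 1]
    confidence: float    # assumed to lie in [0, 1]


def select_pseudo_labels(
    candidates: List[PseudoLabel],
    min_faithfulness: float = 0.5,  # illustrative threshold, not from the paper
    min_robustness: float = 0.5,    # illustrative threshold, not from the paper
    min_confidence: float = 0.5,    # illustrative threshold, not from the paper
) -> List[PseudoLabel]:
    """Keep only candidates that pass all three thresholds."""
    return [
        c for c in candidates
        if c.faithfulness >= min_faithfulness
        and c.robustness >= min_robustness
        and c.confidence >= min_confidence
    ]


candidates = [
    PseudoLabel((10, 10, 50, 50), faithfulness=0.9, robustness=0.8, confidence=0.7),
    PseudoLabel((20, 20, 60, 60), faithfulness=0.9, robustness=0.2, confidence=0.9),
    PseudoLabel((30, 30, 70, 70), faithfulness=0.4, robustness=0.9, confidence=0.9),
]
selected = select_pseudo_labels(candidates)
```

Requiring every score to clear its threshold is one simple design choice; a weighted combination of the scores would be another, and the source does not indicate which (if either) the authors use.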

Abstract

Semi-Supervised Visual Grounding (SSVG) is a new challenge because it combines sparse labeled data with the need for multimodal understanding. A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision. However, this approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline. These pipelines directly regress results without region proposals or foreground binary classification, so they produce no confidence scores and cannot be fitted into RefTeacher. Furthermore, the geometric difference between teacher and student inputs, stemming from different data augmentations, induces natural misalignment in attention-based constraints. To establish a compatible SSVG framework, our paper proposes the…

Code & Models

Models

linhuixiao/Awesome-Visual-Grounding (model, ♡ 1)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition