ART: Adaptive Relation Tuning for Generalized Relation Prediction

Gopika Sudhakaran; Hikaru Shindo; Patrick Schramowski; Simone Schaub-Meyer; Kristian Kersting; Stefan Roth

arXiv:2507.23543 · cs.CV · August 11, 2025

TL;DR

ART introduces an instruction tuning framework with adaptive sampling to enhance visual relation detection, enabling models to generalize to unseen relations and improve scene understanding.

Contribution

The paper presents ART, a novel adaptive relation tuning method that leverages instruction tuning and strategic sampling to improve VRD generalization and unseen relation inference.

Findings

01. Significant improvement over baselines in relation prediction accuracy.
02. Ability to infer unseen relation concepts.
03. Enhanced scene segmentation using predicted relations.

Abstract

Visual relation detection (VRD) is the task of identifying the relationships between objects in a scene. VRD models trained solely on relation detection data struggle to generalize beyond the relations on which they are trained. While prompt tuning has been used to adapt vision-language models (VLMs) for VRD, it uses handcrafted prompts and struggles with novel or complex relations. We argue that instruction tuning offers a more effective solution by fine-tuning VLMs on diverse instructional data. We thus introduce ART, an Adaptive Relation Tuning framework that adapts VLMs for VRD through instruction tuning and strategic instance selection. By converting VRD datasets into an instruction tuning format and employing an adaptive sampling algorithm, ART directs the VLM to focus on informative relations while maintaining generalizability. Specifically, we focus on the relation…
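The abstract names two ingredients: converting VRD annotations into an instruction-tuning format, and adaptively sampling the more informative relations. The truncated text does not give ART's prompt template or its selection criterion, so the Python sketch below is purely illustrative. The prompt wording, the `RelationInstance` and `AdaptiveSampler` names, and the scoring rule (inverse predicate frequency multiplied by the model's current per-predicate error rate) are hypothetical assumptions, not the paper's published algorithm.

```python
# Illustrative sketch only: the names, prompt template, and scoring rule are
# assumptions made for this example, not ART's actual method.
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; an assumed convention


@dataclass(frozen=True)
class RelationInstance:
    """One annotated <subject, predicate, object> triplet from a VRD dataset."""
    image_id: str
    subject: str
    subject_box: Box
    obj: str
    object_box: Box
    predicate: str


def to_instruction(inst: RelationInstance) -> Dict[str, str]:
    """Convert a triplet into a hypothetical instruction/response pair."""
    instruction = (
        f"What is the relation between the {inst.subject} at {list(inst.subject_box)} "
        f"and the {inst.obj} at {list(inst.object_box)}?"
    )
    return {"image": inst.image_id, "instruction": instruction, "response": inst.predicate}


class AdaptiveSampler:
    """Hypothetical sampler: weight = count(predicate) ** -alpha * (error + floor).

    The first factor favours rare predicates; the second favours predicates the
    model currently gets wrong, and is updated as training progresses.
    """

    def __init__(self, instances: List[RelationInstance], alpha: float = 0.5,
                 floor: float = 1e-3, seed: int = 0) -> None:
        self.instances = list(instances)
        self.counts = Counter(i.predicate for i in self.instances)
        self.alpha = alpha
        self.floor = floor  # keeps already-learned predicates from vanishing entirely
        # Start pessimistic: every predicate is treated as fully unlearned.
        self.error: Dict[str, float] = {p: 1.0 for p in self.counts}
        self.rng = random.Random(seed)

    def update(self, predicate: str, error_rate: float) -> None:
        """Record the model's latest validation error on one predicate."""
        self.error[predicate] = error_rate

    def weights(self) -> List[float]:
        """Unnormalised sampling weight for each instance, in input order."""
        return [
            self.counts[i.predicate] ** -self.alpha * (self.error[i.predicate] + self.floor)
            for i in self.instances
        ]

    def sample(self, k: int) -> List[Dict[str, str]]:
        """Draw k instances with replacement and return them in instruction format."""
        chosen = self.rng.choices(self.instances, weights=self.weights(), k=k)
        return [to_instruction(i) for i in chosen]


if __name__ == "__main__":
    riding = RelationInstance("img7", "person", (10, 20, 50, 80), "horse", (5, 40, 120, 100), "riding")
    cups = [RelationInstance(f"img{n}", "cup", (0, 0, 10, 10), "table", (0, 10, 50, 40), "on")
            for n in range(3)]
    sampler = AdaptiveSampler(cups + [riding])
    sampler.update("on", 0.1)  # e.g. the model already handles "on" well
    for pair in sampler.sample(3):
        print(pair["instruction"], "->", pair["response"])
```

Multiplying rarity by current error is one simple way to make sampling "adaptive": predicates the model has already mastered are drawn less often, while the exponent `alpha` < 1 and the small `floor` keep frequent relations in the mix so the model does not forget them. ART's real criterion may differ.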

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies