Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection

Hongda Qin; Xiao Lu; Zhiyong Wei; Yihong Cao; Kailun Yang; Ningjiang Chen

arXiv:2505.07219 · cs.CV · May 13, 2025

TL;DR

This paper introduces a language-driven dual style mixing approach for single-domain generalized object detection. It improves generalization to unseen domains by using semantic information from vision-language models (VLMs) to drive both image-level and feature-level augmentation.

Contribution

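The proposed LDDS method uses prompts to transfer style semantics from a VLM into an image translation network, generating style-diversified images, and complements this with feature-level augmentation. Unlike earlier VLM-based augmentation, it does not require the detector's backbone to share the architecture of the VLM's image encoder, so it is not tied to a particular detector framework.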

Findings

01 Improves detection performance across a range of unseen domains.

02 Effective in generalization settings such as real-to-cartoon and normal-to-adverse-weather.

03 Compatible with multiple mainstream detector frameworks.

Abstract

Generalizing an object detector trained on a single domain to multiple unseen domains is a challenging task. Existing methods typically introduce image or feature augmentation to diversify the source domain and raise the robustness of the detector. Vision-Language Model (VLM)-based augmentation techniques have been proven to be effective, but they require the detector's backbone to have the same structure as the VLM's image encoder, which limits the choice of detector framework. To address this problem, we propose Language-Driven Dual Style Mixing (LDDS) for single-domain generalization, which diversifies the source domain by fully utilizing the semantic information of the VLM. Specifically, we first construct prompts to transfer style semantics embedded in the VLM to an image translation network. This facilitates the generation of style-diversified images with explicit semantic…
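The TL;DR mentions feature-level augmentation, but the abstract excerpt is truncated before that part of the method is described. For orientation only, the sketch below shows generic feature style mixing in the manner of MixStyle/AdaIN: each channel of a source feature map is normalized by its own mean and standard deviation, then rescaled with statistics interpolated toward a second ("style") feature map. This is an assumption for illustration, not LDDS's actual formulation; how LDDS obtains and mixes its style statistics is not stated in this excerpt. The names `channel_stats` and `mix_style` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Channel = List[float]  # flattened spatial values of one feature channel


def channel_stats(channel: Channel, eps: float = 1e-6) -> Tuple[float, float]:
    """Return (mean, std) of one channel; eps keeps std strictly positive."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def mix_style(content: List[Channel], style: List[Channel], lam: float) -> List[Channel]:
    """Hypothetical generic style mixing (MixStyle-like), NOT the LDDS implementation.

    Each content channel is normalized with its own statistics and re-scaled with
    statistics interpolated between content and style:
      lam = 1.0 keeps the content statistics,
      lam = 0.0 adopts the style statistics (equivalent to AdaIN).
    """
    if len(content) != len(style):
        raise ValueError("content and style must have the same number of channels")
    mixed = []
    for c_ch, s_ch in zip(content, style):
        mu_c, sd_c = channel_stats(c_ch)
        mu_s, sd_s = channel_stats(s_ch)
        mu_mix = lam * mu_c + (1.0 - lam) * mu_s
        sd_mix = lam * sd_c + (1.0 - lam) * sd_s
        mixed.append([sd_mix * (v - mu_c) / sd_c + mu_mix for v in c_ch])
    return mixed


if __name__ == "__main__":
    random.seed(0)
    # Toy 3-channel "feature maps" with 16 spatial positions each.
    source = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(3)]
    stylized = [[random.gauss(2.0, 0.5) for _ in range(16)] for _ in range(3)]
    # MixStyle samples the mixing weight from Beta(alpha, alpha), e.g. alpha = 0.1.
    lam = random.betavariate(0.1, 0.1)
    for ch in mix_style(source, stylized, lam):
        print(channel_stats(ch))
```

Only the first- and second-order channel statistics change; the normalized spatial pattern of each channel (and hence the content layout) is preserved, which is why this family of methods is commonly described as perturbing "style" rather than content.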

Code & Models

Repositories

qinhongda8/ldds (Official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications