DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer

Soichiro Okazaki; Tatsuya Sasaki; Hiroki Ohashi

arXiv:2605.10190 · cs.CV · May 12, 2026

TL;DR

DetRefiner is a plug-and-play framework that fuses global and local features via a lightweight Transformer to improve open-vocabulary object detection, enhancing performance without retraining base models.

Contribution

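It introduces a model-agnostic method that refines detection confidence by fusing global and local contextual cues with a lightweight Transformer encoder. The refiner is trained independently of the base detector, which needs no retraining and does not have to expose its internal features.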

Findings

01. Achieves up to +10.1 AP improvement on novel categories.

02. Enhances multiple OVOD models across datasets like COCO and LVIS.

03. Operates solely on base detector predictions without retraining.

Abstract

Open-vocabulary object detection (OVOD) aims to detect both seen and unseen categories, yet existing methods often struggle to generalize to novel objects due to limited integration of global and local contextual cues. We propose DetRefiner, a simple yet effective plug-and-play framework that learns to fuse global and local features to refine open-vocabulary detection. DetRefiner processes global image features and patch-level image features from foundational models (e.g., DINOv3) through a lightweight Transformer encoder. The encoder produces a class vector capturing image-level attributes and patch vectors representing local region attributes, from which attribute reliability is inferred to recalibrate the base model's confidence. Notably, DetRefiner is trained independently of the base OVOD model, requiring neither access to its internal features nor retraining. At inference, it…
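The excerpt above does not spell out how the inferred reliability is combined with the base model's score, so the following is only a minimal sketch of one plausible recalibration step. It assumes a per-class reliability in [0, 1] has already been produced by the refiner. The `Detection` type, the `recalibrate` function, the `alpha` weight, and the weighted geometric mean are all hypothetical illustrations, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One prediction from a base detector (hypothetical container)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    label: str
    score: float  # base confidence in [0, 1]


def recalibrate(
    detections: List[Detection],
    reliability: Dict[str, float],
    alpha: float = 0.5,
) -> List[Detection]:
    """Blend each base score with a per-class reliability.

    ASSUMPTION: the blend is a weighted geometric mean,
        refined = score ** (1 - alpha) * reliability ** alpha,
    chosen only for illustration. Boxes and labels are left untouched,
    so only the base detector's outputs are needed.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    refined = []
    for det in detections:
        if det.label not in reliability:
            # No reliability estimate for this class: keep the base score.
            refined.append(det)
            continue
        r = min(max(reliability[det.label], 0.0), 1.0)  # clamp to [0, 1]
        new_score = (det.score ** (1.0 - alpha)) * (r ** alpha)
        refined.append(Detection(det.box, det.label, new_score))
    return refined
```

In this sketch, `alpha = 0` returns the base scores unchanged and `alpha = 1` replaces them with the reliability alone. Because only scores change, the step could sit after any detector that emits boxes, labels, and confidences.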


Code & Models

Repositories

hitachi-rd-cv/detrefiner (GitHub)
