OvarNet: Towards Open-vocabulary Object Attribute Recognition

Keyan Chen; Xiaolong Jiang; Yao Hu; Xu Tang; Yan Gao; Jianqi Chen; Weidi Xie

arXiv:2301.09506 · cs.CV · January 24, 2023 · 5 citations

TL;DR

This paper introduces OvarNet, an approach for open-vocabulary object detection and attribute recognition that uses federated training across merged datasets, weak supervision from image-caption pairs, and knowledge distillation to improve generalization and efficiency.
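The TL;DR credits part of OvarNet's generalization to federated training over several merged datasets. A common reading of "federated" training is that each sample is penalized only on the labels its source dataset actually annotates, so labels that dataset never covers are ignored instead of being treated as negatives. The sketch below assumes that reading; it is not taken from the OvarNet code, and `federated_bce` and its arguments are hypothetical names operating on plain Python floats.

```python
import math
from typing import Sequence, Set


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def federated_bce(logits: Sequence[float], positives: Set[int], annotated: Set[int]) -> float:
    """Multi-label binary cross-entropy restricted to `annotated` label indices.

    Assumption (hypothetical helper, not from the OvarNet repo): labels outside
    `annotated` were never labeled by the sample's source dataset, so they add
    nothing to the loss rather than being counted as negatives.
    """
    if not annotated:
        return 0.0
    total = 0.0
    for idx in annotated:
        x = logits[idx]
        # -log(sigmoid(x)) == softplus(-x); -log(1 - sigmoid(x)) == softplus(x)
        total += softplus(-x) if idx in positives else softplus(x)
    return total / len(annotated)


# Toy vocabulary of 4 attributes: dataset A labels {0, 1}, dataset B labels {2, 3}.
logits = [2.0, -1.0, 5.0, -5.0]
loss_a = federated_bce(logits, positives={0}, annotated={0, 1})
# Changing the logits of labels dataset A never annotated leaves its loss unchanged.
loss_a_perturbed = federated_bce([2.0, -1.0, -9.0, 9.0], positives={0}, annotated={0, 1})
```

The design choice being illustrated is the masking: without it, an attribute that dataset A simply never labels would be pushed toward "absent" on every dataset-A sample, which conflicts with dataset B's positives.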

Contribution

It proposes a multi-stage framework combining dataset fusion, weakly supervised learning, and knowledge distillation for open-vocabulary object and attribute detection.
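The weakly supervised component, per the abstract, draws on freely available online image-caption pairs. One simple way to turn a caption into noisy, image-level labels is to look for vocabulary words in it. The sketch below does exactly that with whole-word string matching; it is an illustrative assumption rather than OvarNet's actual pipeline, and `caption_pseudo_labels` is a hypothetical helper.

```python
import re
from typing import Iterable, Set, Tuple


def _normalize(text: str) -> str:
    # Lowercase, keep only runs of letters, and pad with spaces for whole-word matching.
    return " " + " ".join(re.findall(r"[a-z]+", text.lower())) + " "


def caption_pseudo_labels(
    caption: str,
    categories: Iterable[str],
    attributes: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return the category and attribute names that occur verbatim in a caption.

    Hypothetical helper: the result is a noisy, image-level label set with no
    boxes, which is what makes this supervision "weak".
    """
    text = _normalize(caption)
    found_cats = {c for c in categories if _normalize(c) in text}
    found_attrs = {a for a in attributes if _normalize(a) in text}
    return found_cats, found_attrs


cats, attrs = caption_pseudo_labels(
    "A red car parked next to a wooden bench.",
    categories=["car", "bench", "dog"],
    attributes=["red", "wooden", "furry"],
)
```

Exact matching misses plurals and synonyms ("cars", "automobile"), so a lemmatizer or parser would be the natural refinement; the labels also do not say which object in the image a given attribute belongs to.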

Findings

01 Jointly training object detection and attribute recognition improves scene-understanding accuracy.

02 The model generalizes well to unseen attributes and categories.

03 End-to-end training outperforms the naive two-stage approach.
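Finding 03 contrasts the end-to-end model with the naive two-stage pipeline. According to the TL;DR and the abstract, efficiency comes from training a Faster-RCNN type model with knowledge distillation, in which a teacher (presumably the fine-tuned CLIP-based two-stage model; this is an assumption) supplies targets for the student's region features. The abstract is truncated before it specifies the loss, so the sketch below assumes one plausible choice: an L1 penalty between L2-normalized student and teacher region embeddings. `distillation_l1` and `batch_distillation_loss` are hypothetical names.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Scale to unit length; a zero vector is returned as zeros.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def distillation_l1(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean absolute difference between normalized student and teacher embeddings.

    Hypothetical loss form: the teacher embedding is a fixed target, so in a
    real training loop only the student would receive gradients.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher embeddings must have the same size")
    if not student:
        return 0.0
    s, t = l2_normalize(student), l2_normalize(teacher)
    return sum(abs(a - b) for a, b in zip(s, t)) / len(s)


def batch_distillation_loss(
    students: Sequence[Sequence[float]], teachers: Sequence[Sequence[float]]
) -> float:
    # Average the per-region loss over all proposals in a batch.
    pairs = list(zip(students, teachers))
    if not pairs:
        return 0.0
    return sum(distillation_l1(s, t) for s, t in pairs) / len(pairs)
```

Normalizing both sides makes the loss depend only on embedding direction, which matches how cosine-similarity classifiers against text embeddings consume these features.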

Abstract

In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. To achieve this goal, we make the following contributions: (i) we start with a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr, in which candidate objects are first proposed with an offline RPN and then classified for semantic category and attributes; (ii) we combine all available datasets and train with a federated strategy to fine-tune the CLIP model, aligning the visual representation with attributes; additionally, we investigate the efficacy of leveraging freely available online image-caption pairs under weakly supervised learning; (iii) in pursuit of efficiency, we train a Faster-RCNN type…
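The second stage of CLIP-Attr classifies each RPN proposal for both its semantic category and its attributes. The sketch below assumes a CLIP-style setup in which a region embedding (for example, from cropping the proposal and encoding it with the image encoder) and text embeddings for every category and attribute name are already available as plain vectors. It treats categories as mutually exclusive (softmax over scaled cosine similarities) and attributes as independent multi-label outputs (one sigmoid each). `score_region`, `temperature`, and `attr_bias` are hypothetical names and values, not taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_region(
    region: Sequence[float],
    category_embs: Dict[str, Sequence[float]],
    attribute_embs: Dict[str, Sequence[float]],
    temperature: float = 0.01,
    attr_bias: float = 0.0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (category probabilities, attribute probabilities) for one proposal."""
    # Categories: softmax over cosine similarities divided by the temperature.
    logits = {c: cosine(region, e) / temperature for c, e in category_embs.items()}
    m = max(logits.values())
    exps = {c: math.exp(v - m) for c, v in logits.items()}  # subtract max for stability
    z = sum(exps.values())
    cat_probs = {c: v / z for c, v in exps.items()}
    # Attributes: an independent sigmoid per attribute, since several can hold at once.
    attr_probs = {
        a: sigmoid(cosine(region, e) / temperature + attr_bias)
        for a, e in attribute_embs.items()
    }
    return cat_probs, attr_probs


# Toy 2-D embeddings standing in for CLIP image and text features.
cats, attrs = score_region(
    region=[1.0, 0.1],
    category_embs={"car": [1.0, 0.0], "dog": [0.0, 1.0]},
    attribute_embs={"red": [1.0, 0.2], "furry": [-1.0, 0.0]},
)
```

With `attr_bias` left at 0, any attribute with positive cosine similarity scores above 0.5. CLIP image-text similarities tend to occupy a narrow positive range, so a learned bias or threshold would likely be needed in practice; fine-tuning CLIP on attribute data, as the abstract describes, is what would make these scores discriminative.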

Code & Models

Repositories

KyanChen/OvarNet (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training · Region Proposal Network