UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition
Xinyu Nan, Lingtao Mao, Huangyu Dai, Zexin Zheng, Xinyu Sun, Zihan Liang, Ben Chen, Yuqing Ding, Chenyi Lei, Wenwu Ou, Han Li

TL;DR
UniDGF introduces a detection-guided generative framework that enhances hierarchical object recognition by predicting semantic tokens, effectively capturing fine-grained categories and attributes in large-scale e-commerce scenarios.
Contribution
The paper proposes a novel unified detection-to-generation framework that improves fine-grained hierarchical recognition by integrating detection with generative semantic token prediction.
Findings
Outperforms existing similarity-based methods on large-scale datasets
Achieves stronger fine-grained recognition accuracy
Provides more coherent unified inference results
Abstract
Achieving visual semantic understanding requires a unified framework that simultaneously handles object detection, category prediction, and attribute recognition. However, current advanced approaches rely on global similarity and struggle to capture fine-grained category distinctions and category-specific attribute diversity, especially in large-scale e-commerce scenarios. To overcome these challenges, we introduce a detection-guided generative framework that predicts hierarchical category and attribute tokens. For each detected object, we extract refined ROI-level features and employ a BART-based generator to produce semantic tokens in a coarse-to-fine sequence covering category hierarchies and property-value pairs, with support for property-conditioned attribute recognition. Experiments on both large-scale proprietary e-commerce datasets and open-source datasets demonstrate that our…
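To make the coarse-to-fine output format concrete, here is a minimal sketch of how one detected object's labels could be serialized into a target token sequence (category levels first, then property-value pairs), plus a decoder prefix for property-conditioned recognition. The abstract does not specify the vocabulary or sequence layout, so the special tokens (`<cat>`, `<prop>`, `<val>`), the function names, and the example labels are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical special tokens; the paper's actual vocabulary is not given in the abstract.
BOS, EOS, CAT, PROP, VAL = "<s>", "</s>", "<cat>", "<prop>", "<val>"


def build_target_sequence(category_path: List[str], attributes: Dict[str, str]) -> List[str]:
    """Serialize one object's labels coarse-to-fine.

    Category levels come first (root to leaf), followed by property-value pairs,
    so a left-to-right generator commits to the category before the attributes.
    """
    tokens = [BOS]
    for level in category_path:
        tokens += [CAT, level]
    for prop, value in attributes.items():
        tokens += [PROP, prop, VAL, value]
    tokens.append(EOS)
    return tokens


def build_property_prompt(category_path: List[str], prop: str) -> List[str]:
    """Build a decoder prefix for property-conditioned recognition.

    The prefix ends right after the value marker, so a generator continuing
    from it would be expected to emit the value for the requested property.
    """
    tokens = [BOS]
    for level in category_path:
        tokens += [CAT, level]
    tokens += [PROP, prop, VAL]
    return tokens


# Illustrative labels only; not taken from the paper's datasets.
path = ["Apparel", "Tops", "T-shirt"]
target = build_target_sequence(path, {"color": "red", "sleeve": "short"})
prompt = build_property_prompt(path, "color")
```

In a full system, a sequence like `target` would be the supervision for the generator conditioned on the object's ROI-level features, and `prompt` would be fed as a forced decoder prefix at inference time. Both of those steps are outside this sketch.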
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
