Embedding Visual Hierarchy with Deep Networks for Large-Scale Visual Recognition
Tianyi Zhao, Baopeng Zhang, Wei Zhang, Ning Zhou, Jun Yu, Jianping Fan

TL;DR
This paper introduces a level-wise mixture model that embeds visual hierarchy within deep networks, enabling efficient large-scale visual recognition by jointly learning deep features, hierarchical classifiers, and hierarchy adaptation.
Contribution
It presents a novel end-to-end framework that combines deep networks with hierarchical classification and automatic hierarchy adaptation for large-scale recognition.
Findings
Achieves competitive accuracy on ImageNet datasets.
Supports joint learning for improved recognition performance.
Enhances efficiency in large-scale visual classification.
Abstract
In this paper, a level-wise mixture model (LMM) is developed by embedding visual hierarchy with deep networks to support large-scale visual recognition (i.e., recognizing thousands or even tens of thousands of object classes), and a Bayesian approach is used to automatically adapt a pre-trained visual hierarchy to improvements in the deep features (which are used for image and object class representation) as more representative deep networks are learned over time. Our LMM provides an end-to-end approach for jointly learning: (a) the deep networks that extract more discriminative deep features for image and object class representation; (b) the tree classifier for recognizing large numbers of object classes hierarchically; and (c) the visual hierarchy adaptation for more accurate hierarchical indexing of large numbers of object classes. By supporting joint…
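To make the idea of recognizing classes hierarchically more concrete, here is a minimal sketch of a generic two-level tree classifier, in which a leaf class's probability is the product of its group's probability and its conditional probability within that group. This is an illustration under stated assumptions, not the paper's LMM formulation: the hierarchy, the class names, the scores, and the function names (`softmax`, `leaf_probabilities`) are all hypothetical, and the scores stand in for outputs a deep network would produce.

```python
import math

# Hypothetical two-level visual hierarchy (group -> member classes).
# These groups and classes are illustrative, not taken from the paper.
HIERARCHY = {
    "animal": ["cat", "dog"],
    "vehicle": ["car", "bus", "truck"],
}


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaf_probabilities(group_scores, class_scores):
    """Combine level-wise predictions: P(class) = P(group) * P(class | group)."""
    groups = list(HIERARCHY)
    group_probs = softmax([group_scores[g] for g in groups])
    probs = {}
    for group, p_group in zip(groups, group_probs):
        members = HIERARCHY[group]
        # Normalize only among sibling classes that share the same parent.
        conditional = softmax([class_scores[c] for c in members])
        for cls, p_cond in zip(members, conditional):
            probs[cls] = p_group * p_cond
    return probs


# Assumed raw scores; in practice these would come from network outputs.
group_scores = {"animal": 2.0, "vehicle": 0.5}
class_scores = {"cat": 1.0, "dog": 0.2, "car": 0.3, "bus": 0.1, "truck": -0.4}

probs = leaf_probabilities(group_scores, class_scores)
prediction = max(probs, key=probs.get)
```

Because each level normalizes only over siblings, the leaf probabilities still sum to one, and a confident group-level decision leaves only that group's classes to be told apart. Adapting the hierarchy, as the abstract describes, would correspond to changing the contents of `HIERARCHY` as features improve; that step is not modeled here.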
