Mid-level Representation for Visual Recognition
Moin Nabi

TL;DR
This paper explores the use of mid-level visual representations, such as parts and attributes, to improve high-level visual recognition tasks like object detection and understanding in images and videos.
Contribution
It introduces a subcategory-aware, webly-supervised approach for discovering discriminative mid-level patches to enhance object recognition and address dataset bias.
Findings
Discovered effective mid-level patches for object recognition.
Improved recognition accuracy using subcategory-based models.
Addressed dataset bias through subcategory-aware modeling.
Abstract
Visual recognition is one of the fundamental challenges in AI, where the goal is to understand the semantics of visual data. Employing mid-level representations, in particular, has shifted the paradigm in visual recognition. A mid-level image/video representation involves discovering and training a set of mid-level visual patterns (e.g., parts and attributes) and representing a given image/video using them. The mid-level patterns can be extracted from images and videos using the motion and appearance information of visual phenomena. This thesis targets employing mid-level representations for different high-level visual recognition tasks, namely (i) image understanding and (ii) video understanding. In the case of image understanding, we focus on the object detection/recognition task. We investigate discovering and learning a set of mid-level patches to be used for representing the images…
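To make the idea of "representing an image using mid-level patterns" concrete, here is a minimal sketch, not the thesis's actual pipeline. It assumes each mid-level pattern is a linear detector (one weight vector per pattern) applied to L2-normalized raw-pixel patches, and that the image descriptor is the maximum response of each detector over all patch locations. The function names, the patch size, the stride, and the use of raw pixels instead of a learned or hand-crafted feature are all illustrative assumptions.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Return flattened, L2-normalized square patches from a 2-D grayscale image.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = image.shape
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            p = image[y:y + size, x:x + size].astype(np.float64).ravel()
            norm = np.linalg.norm(p)
            patches.append(p / norm if norm > 0 else p)
    return np.stack(patches)

def mid_level_descriptor(image, detectors, size=8, stride=4):
    """Describe an image by the max response of each (hypothetical) patch detector.

    detectors: array of shape (n_detectors, size * size), one weight vector
    per mid-level pattern. Returns a vector of length n_detectors.
    """
    patches = extract_patches(image, size, stride)   # (n_patches, size*size)
    scores = patches @ detectors.T                   # (n_patches, n_detectors)
    return scores.max(axis=0)                        # max-pool over locations

# Usage with random data standing in for a real image and learned detectors.
rng = np.random.default_rng(0)
image = rng.random((16, 16)) + 0.1
detectors = rng.standard_normal((5, 64))
descriptor = mid_level_descriptor(image, detectors)
```

The resulting fixed-length vector could then be passed to any standard classifier. Max-pooling over locations is one common design choice here because it makes the descriptor insensitive to where in the image a pattern fires.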
Taxonomy
Topics: Neural Networks and Applications · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications
