A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

TL;DR
This paper introduces a novel generative framework for large-scale visual entity recognition that outperforms existing methods on the Wikipedia-scale OVEN benchmark by directly generating entity identifiers from images.
Contribution
The paper proposes the Generative Entity Recognition (GER) framework, a new approach that directly generates entity identifiers from an input image, achieving state-of-the-art results on the OVEN web-scale visual entity recognition benchmark.
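GER's core idea is to generate an entity identifier token by token instead of retrieving it from an index. The following is a minimal sketch under stated assumptions, not the paper's implementation: the entity codes are hand-written token tuples (hypothetical), `toy_scorer` is a hypothetical stand-in for an image-conditioned decoder that here ignores the image entirely, and decoding is restricted to valid codes with a prefix trie. Trie-constrained greedy decoding is a common generic technique in generative retrieval; whether GER uses this exact inference procedure is an assumption, not something stated on this page.

```python
from typing import Callable, Dict, Optional, Tuple

Code = Tuple[str, ...]
EOS = "<eos>"      # end-of-code token appended to every identifier
_LEAF = object()   # sentinel key marking the entity stored at a finished code


def build_trie(codes: Dict[str, Code]) -> dict:
    """Build a prefix trie over entity codes, each terminated by EOS.

    Assumes codes are unique: a duplicate code would overwrite the earlier entity.
    """
    root: dict = {}
    for entity, code in codes.items():
        node = root
        for tok in code + (EOS,):
            node = node.setdefault(tok, {})
        node[_LEAF] = entity
    return root


def constrained_greedy_decode(
    score_next: Callable[[Code], Dict[str, float]],
    trie: dict,
    max_len: int = 16,
) -> Optional[str]:
    """Greedy decoding that only considers tokens continuing a valid code."""
    node, prefix = trie, ()
    for _ in range(max_len):
        allowed = [tok for tok in node if tok is not _LEAF]
        if not allowed:
            return None
        scores = score_next(prefix)  # hypothetical decoder: prefix -> token scores
        best = max(allowed, key=lambda tok: scores.get(tok, float("-inf")))
        node = node[best]
        if best == EOS:
            return node[_LEAF]
        prefix = prefix + (best,)
    return None


# Hypothetical entities and codes, for illustration only.
codes = {
    "eiffel_tower": ("eiffel", "tower"),
    "eiffel_bridge": ("eiffel", "bridge"),
    "statue_of_liberty": ("statue", "liberty"),
}


def toy_scorer(prefix: Code) -> Dict[str, float]:
    # Stand-in for an image-conditioned decoder; ignores both the image and the prefix.
    return {"liberty": -0.05, "eiffel": -0.1, "tower": -0.2,
            EOS: -0.3, "bridge": -1.5, "statue": -2.0}


print(constrained_greedy_decode(toy_scorer, build_trie(codes)))  # eiffel_tower
```

Because `toy_scorer` prefers `liberty` at every step, unconstrained greedy decoding would keep emitting a token sequence that names no entity; the trie ensures every decoded sequence ends at a real identifier.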
Findings
GER surpasses strong baselines, including captioning models, on the OVEN benchmark.
GER achieves state-of-the-art accuracy in web-scale visual entity recognition.
The generative approach effectively handles the complexity of mapping images to millions of Wikipedia entities.
Abstract
In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia. One way of approaching a problem of this scale is to use dual-encoder models (e.g., CLIP), where all entity names and query images are embedded into a unified space, enabling approximate k-NN search. Alternatively, a captioning model can be repurposed to directly generate the entity name for a given image. In contrast, we introduce a novel Generative Entity Recognition (GER) framework, which, given an input image, learns to auto-regressively decode a semantic and discriminative "code" identifying the target entity. Our experiments demonstrate the efficacy of this GER paradigm, showcasing state-of-the-art performance on the challenging OVEN benchmark. GER surpasses strong captioning,…
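For contrast, the dual-encoder approach described in the abstract reduces recognition to nearest-neighbor search: the query image and every entity name are embedded into one space, and the closest entities are returned. The sketch below assumes precomputed embeddings (toy 3-dimensional vectors standing in for CLIP outputs, purely hypothetical) and uses exact brute-force cosine similarity; with 6 million entities, an approximate nearest-neighbor index would be used instead, as the abstract notes.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def knn_entities(
    query: Sequence[float],
    entity_index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Exact top-k entities by cosine similarity to the query embedding."""
    q = l2_normalize(query)
    scored = (
        (name, sum(a * b for a, b in zip(q, l2_normalize(emb))))
        for name, emb in entity_index.items()
    )
    # nlargest returns the k best pairs, sorted from highest to lowest score.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings standing in for CLIP-style text and image encoders.
index = {
    "eiffel_tower": [0.9, 0.1, 0.0],
    "statue_of_liberty": [0.1, 0.9, 0.1],
    "golden_gate_bridge": [0.2, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]
print(knn_entities(query_embedding, index, k=1)[0][0])  # eiffel_tower
```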
