Structured Prediction with Output Embeddings for Semantic Image Annotation
Ariadna Quattoni, Arnau Ramisa, Pranava Swaroop Madhyastha, Edgar Simo-Serra, Francesc Moreno-Noguer

TL;DR
This paper introduces a structured prediction model for semantic image annotation that leverages output embeddings and bilinear scoring functions to effectively handle large class sets and data sparsity.
Contribution
It proposes a novel factorized log-linear model incorporating output feature representations, improving annotation accuracy in complex semantic tuple prediction tasks.
Findings
Output embeddings enhance prediction performance.
The best output representation differs by argument type.
The model copes with large class sets in which many classes have only a few training examples.
Abstract
We address the task of annotating images with semantic tuples. Solving this problem requires an algorithm which is able to deal with hundreds of classes for each argument of the tuple. In such contexts, data sparsity becomes a key challenge, as there will be a large number of classes for which only a few examples are available. We propose handling this by incorporating feature representations of both the inputs (images) and outputs (argument classes) into a factorized log-linear model, and exploiting the flexibility of scoring functions based on bilinear forms. Experiments show that integrating feature representations of the outputs in the structured prediction model leads to better overall predictions. We also conclude that the best output representation is specific for each type of argument.
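The bilinear scoring idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each tuple argument gets its own independent softmax factor, scored as phi(x)^T W psi(y), where phi(x) is an image feature vector and psi(y) is an embedding of class y. The argument names (`predicate`, `actor`, `locative`), the dimensions, and the random parameters are hypothetical placeholders chosen only for illustration.

```python
import numpy as np


def bilinear_log_probs(x_feat, W, class_emb):
    """Log-probabilities over the classes of one argument.

    x_feat:    (d_in,) input (image) feature vector, phi(x)
    W:         (d_in, d_out) bilinear parameter matrix
    class_emb: (n_classes, d_out) output embeddings, one row psi(y) per class
    """
    # score(y) = phi(x)^T W psi(y), computed for all classes at once
    scores = class_emb @ (W.T @ x_feat)
    # numerically stable log-softmax
    scores = scores - scores.max()
    return scores - np.log(np.exp(scores).sum())


def predict_tuple(x_feat, params):
    """Pick the highest-scoring class for each argument independently.

    Assumption: the tuple distribution factorizes into one term per argument.
    """
    return {
        arg: int(np.argmax(bilinear_log_probs(x_feat, W, emb)))
        for arg, (W, emb) in params.items()
    }


# Toy setup with random parameters (hypothetical sizes and argument names).
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
n_classes = {"predicate": 5, "actor": 6, "locative": 7}
params = {
    arg: (rng.normal(size=(d_in, d_out)), rng.normal(size=(n, d_out)))
    for arg, n in n_classes.items()
}

x = rng.normal(size=d_in)
log_p = bilinear_log_probs(x, *params["predicate"])
prediction = predict_tuple(x, params)
```

Because a class enters the score only through its embedding row, classes with few training examples can share statistical strength with classes whose embeddings are similar, which is one way to read the abstract's point about data sparsity.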
