MAGNeto: An Efficient Deep Learning Method for the Extractive Tags Summarization Problem
Hieu Trong Phung (1, 2), Anh Tuan Vu (1), Tung Dinh Nguyen (1), Lam Thanh Do (1, 2), Giang Nam Ngo (1), Trung Thanh Tran (1), Ngoc C. Lê (1, 2) ((1) PIXTA Vietnam, Hanoi, Vietnam. (2) Hanoi University of Science and Technology, Hanoi, Vietnam.)

TL;DR
MAGNeto introduces a deep learning approach for extractive image tag summarization. It combines visual and textual data, auxiliary losses, gating, data augmentation, and unsupervised pre-training to improve performance on the NUS-WIDE benchmark and on a large-scale private dataset.
Contribution
The paper presents a novel unified architecture for extractive tags summarization that integrates multiple deep learning components and strategies. These include a new loss function for imbalanced training data and an unsupervised pre-training strategy.
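This page does not specify the form of the new loss. The sketch below only illustrates the general idea of countering label imbalance, and it rests on two assumptions. First, ETS is treated as per-tag binary classification, where each input tag is either kept or dropped. Second, the rarer positive class is up-weighted with a standard weighted binary cross-entropy. The function `weighted_bce` and its `pos_weight` parameter are hypothetical. This is a common imbalance baseline, not MAGNeto's actual loss.

```python
import math


def weighted_bce(probs, labels, pos_weight=3.0, eps=1e-7):
    """Weighted binary cross-entropy averaged over the input tags of one image.

    Hypothetical stand-in, not the paper's loss. Assumes each input tag has a
    predicted probability of being "important", and labels are 1 for tags kept
    in the summary and 0 otherwise. pos_weight > 1 raises the penalty for
    missing a positive tag, countering data where most tags are negatives.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Usage: five input tags, only two of which belong in the summary.
probs = [0.9, 0.2, 0.6, 0.1, 0.3]
labels = [1, 0, 1, 0, 0]
print(weighted_bce(probs, labels))                  # positives up-weighted
print(weighted_bce(probs, labels, pos_weight=1.0))  # plain BCE for comparison
```

With `pos_weight=1.0` the function reduces to ordinary binary cross-entropy. Only the terms for positive tags are scaled, so negative tags are penalized exactly as before.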
Findings
Achieves a 90% F1 score on the NUS-WIDE benchmark.
Attains a 50% F1 score on a large-scale private dataset.
Demonstrates the effectiveness of unsupervised pre-training and data augmentation.
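The F1 figures above compare a predicted tag set against a reference tag set. This page does not state how the scores are averaged, so the sketch below simply assumes per-image F1 averaged over images (sample averaging). The helpers `tag_f1` and `mean_f1` and the example tags are hypothetical.

```python
def tag_f1(predicted, gold):
    """F1 between the predicted tag set and the reference tag set of one image."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # convention chosen here: two empty sets match perfectly
    tp = len(predicted & gold)  # tags that are both predicted and in the reference
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs):
    """Average per-image F1 over (predicted, gold) pairs (assumed sample-averaged)."""
    scores = [tag_f1(pred, gold) for pred, gold in pairs]
    return sum(scores) / len(scores)


# Usage with two hypothetical images.
pairs = [
    ({"beach", "sunset", "sea"}, {"beach", "sea", "sand", "summer"}),
    ({"cat"}, {"cat"}),
]
print(round(mean_f1(pairs), 4))
```

In the first image, 2 of the 3 predicted tags are correct (precision 2/3) and 2 of the 4 reference tags are found (recall 1/2), giving F1 = 4/7. The second image matches exactly, so the mean is (4/7 + 1) / 2 = 11/14.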
Abstract
In this work, we study a new image annotation task named Extractive Tags Summarization (ETS). The goal is to extract the important tags from the context formed by an image and its corresponding tags. We adapt several state-of-the-art deep learning models to utilize both visual and textual information. Our proposed solution is built from widely used blocks such as convolutional and self-attention layers. It also introduces a novel idea: combining auxiliary loss functions with a gating mechanism to glue these fundamental components together, elevate them, and form a unified architecture. In addition, we introduce a loss function that aims to reduce the imbalance of the training data, and a simple but effective data augmentation technique dedicated to alleviating the effect of outliers on the final results. Last but not least, we explore an unsupervised pre-training strategy to further boost the performance of…
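The abstract combines auxiliary losses with a gating mechanism but does not say how either is built. The sketch below shows one common form of each, under stated assumptions. A scalar sigmoid gate, computed from the concatenated visual and textual features, blends the two branches. The training objective adds a weighted sum of per-branch auxiliary losses to the main loss. All names (`gated_fusion`, `total_loss`, `gate_weights`, `aux_weight`) and the scalar-gate design are hypothetical, not MAGNeto's published architecture.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(visual, textual, gate_weights, gate_bias=0.0):
    """Blend two equal-length feature vectors with a single learned gate.

    Assumed form: g = sigmoid(w . [visual; textual] + b) and
    fused = g * visual + (1 - g) * textual.
    gate_weights stands in for learned parameters of length 2 * feature size.
    """
    if len(visual) != len(textual):
        raise ValueError("visual and textual features must have the same size")
    if len(gate_weights) != 2 * len(visual):
        raise ValueError("gate_weights must have length 2 * feature size")
    concat = list(visual) + list(textual)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * v + (1.0 - g) * t for v, t in zip(visual, textual)]
    return fused, g


def total_loss(main_loss, aux_losses, aux_weight=0.3):
    """Main loss plus a weighted sum of auxiliary (per-branch) losses."""
    return main_loss + aux_weight * sum(aux_losses)


# Usage: toy 2-d features; zero gate weights give an even 50/50 blend.
fused, g = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4)
print(g, fused)
print(total_loss(1.0, [0.5, 0.5]))
```

Because the gate is learned, the model can lean on whichever modality is more informative for a given input. The auxiliary terms give each branch its own training signal rather than relying only on the fused output.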
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · SGD with Momentum · Residual Connection · Dropout · Dense Connections · Softmax · Multi-Head Attention · Layer Normalization
