MAGNeto: An Efficient Deep Learning Method for the Extractive Tags Summarization Problem

Hieu Trong Phung (1, 2), Anh Tuan Vu (1), Tung Dinh Nguyen (1), Lam Thanh Do (1, 2), Giang Nam Ngo (1), Trung Thanh Tran (1), Ngoc C. Lê (1, 2) ((1) PIXTA Vietnam, Hanoi, Vietnam. (2) Hanoi University of Science and Technology, Ha Noi, Viet Nam.)

arXiv:2011.04349·cs.CV·November 10, 2020

TL;DR

MAGNeto introduces a deep learning approach for extractive image tag summarization that combines visual and textual data, auxiliary losses, gating, data augmentation, and unsupervised pre-training to improve performance on benchmark datasets.

Contribution

The paper presents a novel unified architecture integrating multiple deep learning components and strategies for extractive tags summarization, including a new loss function and pre-training method.

Findings

01. Achieves 90% F1 score on the NUS-WIDE benchmark.
02. Attains 50% F1 score on a large-scale private dataset.
03. Demonstrates the effectiveness of unsupervised pre-training and data augmentation.
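For readers unfamiliar with how an F1 score applies to tag extraction, the sketch below computes it for a single image by comparing a predicted set of important tags against a reference set. The function name `tag_f1` and the example tags are hypothetical, and the way the paper averages scores across images is not stated on this page, so this illustrates the metric only, not the authors' evaluation protocol.

```python
def tag_f1(predicted, reference):
    """F1 between a predicted set of important tags and a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of the three predicted tags match the reference.
score = tag_f1(["beach", "sunset", "people"], ["beach", "sunset", "sea"])
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.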

Abstract

In this work, we study a new image annotation task named Extractive Tags Summarization (ETS). The goal is to extract important tags from the context provided by an image and its corresponding tags. We adapt several state-of-the-art deep learning models to utilize both visual and textual information. Our proposed solution consists of widely used blocks such as convolutional and self-attention layers, together with a novel idea of combining auxiliary loss functions and a gating mechanism to glue and elevate these fundamental components and form a unified architecture. In addition, we introduce a loss function that aims to reduce the imbalance of the training data and a simple but effective data augmentation technique designed to alleviate the effect of outliers on the final results. Last but not least, we explore an unsupervised pre-training strategy to further boost the performance of…
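The abstract describes combining auxiliary losses with a gating mechanism but this page gives no equations, so the following is only a minimal sketch of one common way such a combination can work, not the authors' architecture. It assumes two hypothetical branches that each score every tag (`text_logits`, `visual_logits`), a per-tag gate that blends their probabilities, and an assumed weight `aux_weight` on the per-branch auxiliary losses. All names and the binary cross-entropy choice are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one tag; prob is clipped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def gated_fusion(text_logits, visual_logits, gate_logits):
    """Blend two per-tag probabilities with a per-tag gate in (0, 1)."""
    fused = []
    for t, v, g in zip(text_logits, visual_logits, gate_logits):
        gate = sigmoid(g)
        fused.append(gate * sigmoid(t) + (1.0 - gate) * sigmoid(v))
    return fused


def total_loss(text_logits, visual_logits, gate_logits, labels, aux_weight=0.5):
    """Main loss on the fused output plus auxiliary losses on each branch."""
    n = len(labels)
    fused = gated_fusion(text_logits, visual_logits, gate_logits)
    main = sum(bce(p, y) for p, y in zip(fused, labels)) / n
    aux_text = sum(bce(sigmoid(t), y) for t, y in zip(text_logits, labels)) / n
    aux_visual = sum(bce(sigmoid(v), y) for v, y in zip(visual_logits, labels)) / n
    return main + aux_weight * (aux_text + aux_visual)


# Hypothetical scores for three tags; labels mark which tags are important.
loss = total_loss([2.0, -1.0, 0.5], [1.5, -2.0, -0.5], [0.0, 0.0, 0.0], [1, 0, 1])
```

The auxiliary terms give each branch its own training signal, so neither branch can rely entirely on the other through the gate. The official PyTorch repository listed under Code & Models is the place to check how the paper actually implements this.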

Code & Models

Repositories

pixta-dev/labteam (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · SGD with Momentum · Residual Connection · Dropout · Dense Connections · Softmax · Multi-Head Attention · Layer Normalization