DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

Sheng Chen; Akshay Soni; Aasish Pappu; Yashar Mehdad

arXiv:1707.04596 · cs.CL · July 18, 2017 · 6 cites

TL;DR

DocTag2Vec is a novel embedding-based approach that jointly learns representations of words, documents, and tags to improve multi-label document tagging directly from raw text, outperforming existing methods.

Contribution

It extends the Word2Vec and Doc2Vec models to embed words, documents, and tags jointly in one vector space. This enables multi-label tagging directly from raw text, without feature engineering, and lets the model accommodate newly added tags.
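To make "jointly embed documents and tags" more concrete, here is a minimal sketch. It assumes a word2vec-style negative-sampling objective that pulls a document vector toward the vector of one of its tags and pushes it away from tags it does not have. The function names (`doc_tag_loss`, `sgns_doc_tag_step`), the learning rate, and the toy tags are hypothetical. The word vectors, which DocTag2Vec also learns, are omitted, so this is not the authors' implementation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def doc_tag_loss(doc_vec, tag_vecs, pos_tag, neg_tags):
    """Negative-sampling loss: -log s(d.t_pos) - sum over negatives of log s(-d.t_neg)."""
    loss = -math.log(sigmoid(dot(doc_vec, tag_vecs[pos_tag])))
    for tag in neg_tags:
        loss -= math.log(sigmoid(-dot(doc_vec, tag_vecs[tag])))
    return loss


def sgns_doc_tag_step(doc_vec, tag_vecs, pos_tag, neg_tags, lr=0.05):
    """One SGD step on doc_tag_loss; updates doc_vec and the tag vectors in place."""
    grad_doc = [0.0] * len(doc_vec)
    # Label 1 for the document's own tag, 0 for the (hypothetical) negative tags.
    for tag, label in [(pos_tag, 1.0)] + [(t, 0.0) for t in neg_tags]:
        t_vec = tag_vecs[tag]
        # d(loss)/d(score) for a logistic loss is sigmoid(score) - label.
        g = sigmoid(dot(doc_vec, t_vec)) - label
        for i in range(len(doc_vec)):
            grad_doc[i] += g * t_vec[i]       # uses t_vec before it is updated
            t_vec[i] -= lr * g * doc_vec[i]   # doc_vec is still the old value here
    for i in range(len(doc_vec)):
        doc_vec[i] -= lr * grad_doc[i]


# Toy run: a document tagged "sports"; negatives are fixed here for simplicity,
# whereas word2vec-style training would resample them at every step.
random.seed(0)
DIM = 8
tags = {name: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
        for name in ("sports", "politics", "tech")}
doc = [random.uniform(-0.5, 0.5) for _ in range(DIM)]

before = doc_tag_loss(doc, tags, "sports", ["politics", "tech"])
for _ in range(500):
    sgns_doc_tag_step(doc, tags, "sports", ["politics", "tech"])
after = doc_tag_loss(doc, tags, "sports", ["politics", "tech"])
print(f"loss {before:.3f} -> {after:.3f}")
```

After training, the document vector scores its own tag higher than the negatives. This is the geometric property that a nearest-neighbor lookup in the shared space relies on.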

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Learns meaningful tag and document representations

03 Handles new tags dynamically

Abstract

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is referred to as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ simple $k$-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec works directly with raw text instead of provided feature vectors and, in addition, enjoys advantages like the learning of tag representation, and…
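The $k$-nearest neighbor prediction step can be sketched as follows. The sketch assumes that the unseen document's vector has already been inferred in the joint space and that candidate tags are ranked by cosine similarity to it. The truncated abstract does not say whether the search runs over tag vectors directly or over neighboring training documents. Treat `predict_tags`, `cosine`, and the toy vectors as hypothetical.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_tags(doc_vec: list[float], tag_vecs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k tags whose vectors are closest (by cosine) to doc_vec, best first."""
    scored = ((cosine(doc_vec, vec), tag) for tag, vec in tag_vecs.items())
    return [tag for _, tag in heapq.nlargest(k, scored)]


# Hypothetical 3-d tag embeddings and an inferred document vector.
tag_vecs = {
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
    "tech": [0.0, 0.0, 1.0],
    "esports": [0.7, 0.0, 0.7],
}
doc_vec = [0.9, 0.1, 0.3]
print(predict_tags(doc_vec, tag_vecs, k=2))  # ['sports', 'esports']
```

A linear scan like this is adequate for a modest tag set. For very large tag vocabularies, an approximate nearest-neighbor index would be the usual substitute. Because a new tag only needs a vector in the same space, it can be added to `tag_vecs` without changing the prediction code.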

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Natural Language Processing Techniques