Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification

Anton Alekseev; Elena Tutubalina; Valentin Malykh; Sergey Nikolenko

arXiv:2006.09766 · cs.CL · June 18, 2020

TL;DR

This paper proposes a sentence filtering method based on out-of-domain classification that improves the coherence of unsupervised neural aspect extraction on online discussions such as newsgroups, a traditional text source on which models like ABAE are otherwise less coherent.

Contribution

It introduces a simple sentence filtering technique based on out-of-domain classification to improve neural aspect extraction without altering the core model architecture.

Findings

1. Sentence filtering improves topic coherence on newsgroup data.
2. Filtering out sentences with a low probability of being in-domain improves aspect extraction quality.
3. The method is effective across different traditional text sources.

Abstract

Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering to improve the topical aspects learned from newsgroup-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation, we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences.…
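The filtering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say which probabilistic classifier, features, or cutoff the paper uses, so this sketch assumes a multinomial naive Bayes classifier over whitespace-tokenized bag-of-words features. The names `InDomainNB`, `filter_sentences`, and the `threshold=0.5` default are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would use a proper tokenizer.
    return text.lower().split()


class InDomainNB:
    """Hypothetical multinomial naive Bayes that scores P(in-domain | sentence).

    Label 1 = target (in-domain) texts, label 0 = outer (out-of-domain) texts.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, in_domain, out_domain):
        self.counts = {1: Counter(), 0: Counter()}
        for label, docs in ((1, in_domain), (0, out_domain)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        n = len(in_domain) + len(out_domain)
        # Class priors estimated from the number of training texts per class
        self.log_prior = {1: math.log(len(in_domain) / n),
                          0: math.log(len(out_domain) / n)}
        return self

    def _log_likelihood(self, tokens, label):
        denom = self.totals[label] + self.alpha * len(self.vocab)
        # Tokens never seen in training are skipped
        return sum(math.log((self.counts[label][t] + self.alpha) / denom)
                   for t in tokens if t in self.vocab)

    def prob_in_domain(self, sentence):
        tokens = tokenize(sentence)
        s_in = self.log_prior[1] + self._log_likelihood(tokens, 1)
        s_out = self.log_prior[0] + self._log_likelihood(tokens, 0)
        # P(in | x) = 1 / (1 + exp(s_out - s_in)), evaluated without overflow
        d = s_out - s_in
        if d > 0:
            e = math.exp(-d)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(d))


def filter_sentences(clf, sentences, threshold=0.5):
    # Keep only sentences the classifier considers likely enough to be in-domain;
    # the 0.5 default is an arbitrary placeholder, not a value from the paper.
    return [s for s in sentences if clf.prob_in_domain(s) >= threshold]
```

In use, one would fit the classifier on target (newsgroup) sentences versus outer-dataset sentences, call `filter_sentences` on the target corpus, and train the aspect extraction model (ABAE in the paper) on the sentences that survive. The model itself is left unchanged. Naive Bayes is chosen here only because it yields class probabilities with no external dependencies. Any calibrated probabilistic classifier could take its place.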