Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification
Anton Alekseev, Elena Tutubalina, Valentin Malykh, Sergey Nikolenko

TL;DR
This paper proposes a sentence filtering method based on out-of-domain classification that improves the coherence of aspects learned by unsupervised neural aspect extraction (ABAE) on traditional text sources such as news articles and newsgroup discussions.
Contribution
It introduces a simple sentence filtering technique based on out-of-domain classification to improve neural aspect extraction without altering the core model architecture.
Findings
Sentence filtering improves topic coherence in newsgroup data
Filtering out sentences with a low probability of being in-domain enhances aspect extraction quality
The method is effective across different traditional text sources
Abstract
Deep learning architectures based on self-attention have recently achieved and surpassed state of the art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences.…
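The filtering step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say which probabilistic classifier, tokenization, or probability cutoff was used, so the tiny Naive Bayes model, the whitespace tokenizer, the 0.5 threshold, the toy sentences, and all names (NaiveBayesDomainClassifier, prob_in_domain, filter_sentences) are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    return text.lower().split()


class NaiveBayesDomainClassifier:
    """Toy multinomial Naive Bayes: label 1 = in-domain, 0 = out-of-domain."""

    def fit(self, in_domain, out_of_domain):
        self.counts = {1: Counter(), 0: Counter()}
        for label, sentences in ((1, in_domain), (0, out_of_domain)):
            for sentence in sentences:
                self.counts[label].update(tokenize(sentence))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        n = len(in_domain) + len(out_of_domain)
        self.log_prior = {
            1: math.log(len(in_domain) / n),
            0: math.log(len(out_of_domain) / n),
        }
        return self

    def _log_joint(self, tokens, label):
        # Laplace-smoothed log P(label) + sum of log P(token | label).
        v = len(self.vocab)
        score = self.log_prior[label]
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                num = self.counts[label][tok] + 1
                score += math.log(num / (self.totals[label] + v))
        return score

    def prob_in_domain(self, sentence):
        tokens = tokenize(sentence)
        l1 = self._log_joint(tokens, 1)
        l0 = self._log_joint(tokens, 0)
        m = max(l1, l0)  # subtract the max for numerical stability
        e1, e0 = math.exp(l1 - m), math.exp(l0 - m)
        return e1 / (e1 + e0)


def filter_sentences(clf, sentences, threshold=0.5):
    # Keep sentences whose in-domain probability reaches the (assumed) cutoff.
    return [s for s in sentences if clf.prob_in_domain(s) >= threshold]


# Toy stand-ins for the target (in-domain) and outer (out-of-domain) datasets.
target = [
    "the gpu driver crashes on boot",
    "install the new graphics driver",
    "the monitor flickers after the driver update",
]
outer = [
    "the weather is nice today",
    "see you at lunch tomorrow",
    "thanks for the reply",
]

clf = NaiveBayesDomainClassifier().fit(target, outer)
candidates = ["the driver update fixed the gpu", "thanks see you tomorrow"]
kept = filter_sentences(clf, candidates)
print(kept)
```

In this toy setup only the first candidate sentence is kept. In the workflow the abstract describes, the surviving sentences would then be used to train the aspect extraction model, which is left unchanged.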
