Moving beyond word lists: towards abstractive topic labels for human-like topics of scientific documents
Domenic Rosati

TL;DR
This paper explores using abstractive multi-document summarization to generate more natural, human-like topic labels for scientific documents, aiming to improve interpretability over traditional word list labels.
Contribution
It introduces an approach to producing human-like topic labels via abstractive multi-document summarization (MDS) and explores its potential through a case study on citation sentences, highlighting directions for future research.
Findings
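MDS can produce topic labels that are more human-like than word lists.
Evaluating with clustering and summarization measures, rather than topic model measures, offers additional advantages.
Several developments are needed before a well-powered study of MDS for topic labeling can be designed.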
Abstract
Topic models represent groups of documents as a list of words (the topic labels). This work asks whether an alternative approach to topic labeling can be developed that is closer to a natural language description of a topic than a word list. To this end, we present an approach to generating human-like topic labels using abstractive multi-document summarization (MDS). We investigate our approach with an exploratory case study. We model topics in citation sentences in order to understand what further research needs to be done to fully operationalize MDS for topic labeling. Our case study shows that in addition to more human-like topics there are additional advantages to evaluation by using clustering and summarization measures instead of topic model measures. However, we find that there are several developments needed before we can design a well-powered study to evaluate MDS for topic…
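The contrast the abstract draws, between a word-list label and a natural-language label for the same group of documents, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the example sentences are invented rather than taken from the paper's data, the function names are hypothetical, and `central_sentence` is a simple extractive stand-in, not the abstractive MDS model the paper investigates. Any summarizer that maps a list of sentences to a string could be passed in its place.

```python
from collections import Counter

# Tiny hypothetical stop-word list; a real study would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
              "the", "to", "we", "with"}


def tokenize(sentence):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,;:()").lower() for w in sentence.split())
    return [w for w in words if w and w not in STOP_WORDS]


def word_list_label(cluster, k=3):
    """Word-list style label: the k most frequent content words."""
    counts = Counter(w for s in cluster for w in tokenize(s))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def sentence_label(cluster, summarize):
    """Natural-language label: delegate to any multi-document summarizer."""
    return summarize(cluster)


def central_sentence(cluster):
    """Stand-in summarizer (extractive, NOT abstractive).

    Picks the sentence whose distinct words are most frequent across
    the whole cluster.
    """
    counts = Counter(w for s in cluster for w in tokenize(s))
    return max(cluster, key=lambda s: sum(counts[w] for w in set(tokenize(s))))


# Invented example cluster of citation-like sentences (illustrative only).
cluster = [
    "Topic models are evaluated with coherence measures.",
    "Coherence measures correlate with human judgments of topic quality.",
    "We evaluate topic coherence on citation sentences.",
]

print(word_list_label(cluster))
print(sentence_label(cluster, central_sentence))
```

The word-list label leaves the reader to infer how the words relate, while the sentence label states a relationship directly. Because the label is itself a short summary, it is natural to score it with summarization measures, which is the kind of evaluation advantage the abstract mentions.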
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Computational and Text Analysis Methods
