Multilingual Topic Models
Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

TL;DR
This paper proposes a multilingual topic model that treats different representations of the same scientific article (for example, its title and abstract, or its assigned keywords) as translations of each other, all generated from a shared latent representation, so that the representations can be compared and evaluated against one another.
Contribution
It models diverse representations of the same article as generated from a common latent representation in a multilingual topic model, which supports similarity assessment across representations and refinement of concept vocabularies.
Findings
A shared latent space allows different representations of the same article to be compared directly.
The method allows the quality of a representation to be evaluated by its topical similarity to the other representations.
The approach helps identify gaps in concept vocabularies.
Abstract
Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse…
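The generative idea described in the abstract can be illustrated with a minimal sketch, written here in the style of a polylingual topic model. This is an assumption-laden illustration, not the authors' implementation: the function name `generate_tuple`, the representation names `abstract` and `keywords`, the vocabulary sizes, and the Dirichlet hyperparameters are all hypothetical. Each article draws one topic mixture, and every representation of that article samples its words from that same mixture using its own topic-word distributions.

```python
import numpy as np

def generate_tuple(rng, alpha, phis, lengths):
    """Sample one article: a shared topic mixture plus one bag of word ids
    per representation (hypothetical helper, for illustration only)."""
    # One topic mixture shared by all representations of the article.
    theta = rng.dirichlet(alpha)
    docs = {}
    for name, phi in phis.items():
        # Topic assignment for each token, drawn from the shared mixture.
        z = rng.choice(len(alpha), size=lengths[name], p=theta)
        # Word id for each token, drawn from this representation's own
        # topic-word distribution for the assigned topic.
        docs[name] = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z])
    return theta, docs

rng = np.random.default_rng(0)
num_topics = 3
alpha = np.full(num_topics, 0.5)              # assumed symmetric prior
vocab_sizes = {"abstract": 20, "keywords": 8}  # assumed toy vocabularies

# One (num_topics x vocab_size) matrix per representation.
phis = {
    name: rng.dirichlet(np.full(size, 0.5), size=num_topics)
    for name, size in vocab_sizes.items()
}
lengths = {"abstract": 50, "keywords": 5}

theta, docs = generate_tuple(rng, alpha, phis, lengths)
print(theta)
print(docs["keywords"])
```

Because both bags of words depend on the same `theta`, inferring `theta` from either representation alone places it in the same topic space, which is what makes the representations comparable in this sketch.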
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Topic Modeling
