Integrating topic modeling and word embedding to characterize violent deaths

Alina Arseniev-Koehler; Susan D. Cochran; Vickie M. Mays; Kai-Wei Chang; Jacob Gates Foster

arXiv:2106.14365·cs.CL·October 6, 2022

TL;DR

This paper introduces a method that combines topic modeling and word embeddings to identify and interpret latent themes in text. It is illustrated on violent death reports, where it surfaces gender biases.

Contribution

The paper presents Discourse Atom Topic Modeling, a new approach that integrates embeddings and topic modeling to uncover interpretable latent topics in unstructured text.

Findings

01 Identified 225 latent topics in violent death narratives.

02 Revealed gender biases in topics related to violence.

03 Provided detailed analysis of reporting patterns and gendered language.

Abstract

There is an escalating need for methods to identify latent patterns in text data from many domains. We introduce a new method to identify topics in a corpus and represent documents as topic sequences. Discourse Atom Topic Modeling draws on advances in theoretical machine learning to integrate topic modeling and word embedding, capitalizing on the distinct capabilities of each. We first identify a set of vectors ("discourse atoms") that provide a sparse representation of an embedding space. Atom vectors can be interpreted as latent topics: Through a generative model, atoms map onto distributions over words; one can also infer the topic that generated a sequence of words. We illustrate our method with a prominent example of underutilized text: the U.S. National Violent Death Reporting System (NVDRS). The NVDRS summarizes violent death incidents with structured variables and unstructured…
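
The two directions described in the abstract (an atom maps onto a distribution over words, and a sequence of words maps back to the atom that most plausibly generated it) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the 2-D vectors, the atom names, the softmax link between atom-word inner products and word probabilities, and the use of cosine similarity to the mean word vector as the inference rule.

```python
import math

# Toy 2-D embeddings; all values are made up for illustration.
word_vectors = {
    "gun": [1.0, 0.1],
    "shot": [0.9, 0.2],
    "pills": [0.1, 1.0],
    "overdose": [0.2, 0.9],
}

# Hypothetical "discourse atoms" living in the same space as the words.
atoms = {
    "firearm": [1.0, 0.0],
    "medication": [0.0, 1.0],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def word_distribution(atom):
    """Map an atom to a distribution over the vocabulary.

    Assumed link: p(word | atom) is proportional to exp(<atom, word>).
    """
    scores = {w: math.exp(dot(atom, v)) for w, v in word_vectors.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def infer_atom(words):
    """Return the atom closest (by cosine) to the mean vector of `words`.

    This is a simplified stand-in for inferring the topic behind a window.
    """
    vecs = [word_vectors[w] for w in words]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return max(atoms, key=lambda name: cosine(atoms[name], mean))


firearm_dist = word_distribution(atoms["firearm"])
top_word = max(firearm_dist, key=firearm_dist.get)
inferred = infer_atom(["pills", "overdose"])
print(top_word, inferred)
```

Sliding `infer_atom` over successive windows of a narrative would yield a sequence of atom labels, which is one simple way to picture representing a document as a topic sequence.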

Code & Models

Repositories

arsena-k/discourse_atoms