# Exploring the Daschle Collection using Text Mining

**Authors:** Damon Bayer, Semhar Michael

arXiv: 1904.12623 · 2019-04-30

## TL;DR

This paper applies natural language processing, chiefly Latent Dirichlet Allocation (LDA) topic modeling, to scanned documents and constituent emails from Senator Daschle's archive at South Dakota State University, surfacing major themes and events without significant manual effort.

## Contribution

The paper applies LDA-based text mining to a political archive and shows that the approach can summarize large volumes of text and carry over to similar collections in historical and political research.

## Key findings

- The LDA model identified the major topics discussed in the scanned documents and constituent emails.
- Changes in the modeled topics reflect important events and popular issues from Senator Daschle's career.
- The methods summarize a massive amount of text without significant human effort or time.

## Abstract

A U.S. Senator from South Dakota donated documents that were accumulated during his service as a House representative and senator to be housed at the Bridges library at South Dakota State University. This project investigated the utility of quantitative statistical methods to explore some portions of this vast document collection. The available scanned documents and emails from constituents are analyzed using natural language processing methods including the Latent Dirichlet Allocation (LDA) model. This model identified major topics being discussed in a given collection of documents. Important events and popular issues from Senator Daschle's career are reflected in the changing topics from the model. These quantitative statistical methods provide a summary of the massive amount of text without requiring significant human effort or time and can be applied to similar collections.
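
To make the topic-modeling step in the abstract concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: this summary does not say which software, preprocessing, or hyperparameters the paper used. The function names `fit_lda` and `top_words`, the values of `alpha` and `beta`, and the four toy documents are all invented for the example and do not come from the Daschle collection.

```python
import random


def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration; alpha and beta are assumed
    symmetric Dirichlet priors, not values taken from the paper.
    Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    topic_word = [[0] * n_words for _ in range(n_topics)]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            topic_word[k][word_id[w]] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][v] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                topic_word[k][v] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1

    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Invented toy corpus, NOT text from the Daschle collection.
toy_docs = [
    "farm subsidy drought relief farm".split(),
    "drought relief farm bill".split(),
    "health care insurance veterans".split(),
    "veterans health care hospital".split(),
]

vocab, topic_word, doc_topic = fit_lda(toy_docs, n_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

Each row of `doc_topic` counts how many tokens of a document were assigned to each topic. Grouping those rows by document date would be one way to look at changing topics over time, though whether the paper tracks topics this way is not stated in this summary.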

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.12623/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1904.12623/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1904.12623/full.md

---
Source: https://tomesphere.com/paper/1904.12623