Cross-Discourse and Multilingual Exploration of Textual Corpora with the DualNeighbors Algorithm
Taylor Arnold, Lauren Tilton

TL;DR
The paper introduces the DualNeighbors algorithm, a novel method for linking thematically similar documents across cultural and linguistic boundaries, enabling cross-cultural exploration of textual corpora.
Contribution
It presents a new algorithm that overcomes cultural and language barriers to reveal cross-cultural connections in textual datasets.
Findings
Effective in linking culturally and linguistically diverse documents
Validated through qualitative and quantitative evaluations on two cultural datasets
Open-source implementation available for researchers
Abstract
Word choice is dependent on the cultural context of writers and their subjects. Different words are used to describe similar actions, objects, and features based on factors such as class, race, gender, geography and political affinity. Exploratory techniques based on locating and counting words may, therefore, lead to conclusions that reinforce culturally inflected boundaries. We offer a new method, the DualNeighbors algorithm, for linking thematically similar documents both within and across discursive and linguistic barriers to reveal cross-cultural connections. Qualitative and quantitative evaluations of this technique are shown as applied to two cultural datasets of interest to researchers across the humanities and social sciences. An open-source implementation of the DualNeighbors algorithm is provided to assist in its application.
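The abstract does not describe the mechanics of the algorithm, so the following is only a rough sketch of the problem it points at, not the authors' implementation. It assumes that "dual" means pairing neighbors found in a word-count space with neighbors found in a word-embedding space. The function name `dual_neighbors_sketch`, the toy documents, and the hand-made two-dimensional vectors in `TOY_VECTORS` are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_embedding(tokens, vectors):
    """Average the vectors of all tokens that have one (assumed representation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return {}
    dim = len(known[0])
    return {i: sum(v[i] for v in known) / len(known) for i in range(dim)}


def top_k(index, reps, k):
    """Indices of the k documents most similar to reps[index], excluding itself."""
    scores = [(cosine(reps[index], reps[j]), j)
              for j in range(len(reps)) if j != index]
    scores.sort(key=lambda pair: -pair[0])
    return [j for _, j in scores[:k]]


def dual_neighbors_sketch(docs, vectors, k=1):
    """For each document, report neighbors in both spaces (illustrative only)."""
    count_reps = [dict(Counter(tokens)) for tokens in docs]
    embed_reps = [mean_embedding(tokens, vectors) for tokens in docs]
    return [{"count": top_k(i, count_reps, k),
             "embedding": top_k(i, embed_reps, k)}
            for i in range(len(docs))]


# Hypothetical toy corpus: documents 0 and 1 describe the same scene
# using British and American vocabulary respectively.
DOCS = [
    ["lorry", "petrol", "motorway"],
    ["truck", "gas", "highway"],
    ["lorry", "motorway", "traffic"],
    ["bread", "cheese", "wine"],
]

# Hand-made 2-d vectors (an assumption, not trained embeddings):
# synonyms across the two vocabularies share the same vector.
TOY_VECTORS = {
    "lorry": (1.0, 0.0), "truck": (1.0, 0.0),
    "petrol": (0.9, 0.1), "gas": (0.9, 0.1),
    "motorway": (0.8, 0.2), "highway": (0.8, 0.2),
    "traffic": (0.7, 0.3),
    "bread": (0.0, 1.0), "cheese": (0.1, 0.9), "wine": (0.0, 1.0),
}

result = dual_neighbors_sketch(DOCS, TOY_VECTORS, k=1)
```

In this toy data, the count-based neighbor of the first document is the one that shares "lorry" and "motorway" with it, while the embedding-based neighbor is the document with no words in common but the same subject. Keeping both lists is one way to surface connections within a discourse and across it, which is the gap the abstract says word counting alone leaves open.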
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
