Machine Translation for Accessible Multi-Language Text Analysis
Edward W. Chew, William D. Weisman, Jingying Huang, Seth Frey

TL;DR
This paper demonstrates that machine translation, specifically Google Translate, enables effective multi-language text analysis in computational social science, allowing accurate sentiment, topic, and word embedding analysis across 16 languages.
Contribution
The study shows that English-trained analytical measures can be reliably applied to multiple languages via translation, broadening access to computational linguistic tools.
Findings
English-trained measures computed on translated text have adequate-to-excellent accuracy compared to source-language measures computed on the original texts.
Google Translate effectively preserves semantic content.
Multi-language analysis is feasible with current translation tools.
Abstract
English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as any other area, with revolutionary advances in the tools for English language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy compared to source-language measures computed on original texts. We show this for three major analytics -- sentiment analysis, topic analysis, and word embeddings -- over 16 languages, including Spanish, Chinese, Hindi, and Arabic. We validate this claim by…
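The comparison described in the abstract (a source-language measure on the original text versus an English-trained measure on the translated text) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicons, the hand-written Spanish/English pairs standing in for machine-translated output, and the use of Pearson correlation as the agreement statistic. It is not the authors' pipeline, and it calls no translation service.

```python
from math import sqrt

# Hypothetical toy lexicons (illustration only, not the paper's sentiment tools).
EN_LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
ES_LEXICON = {"bueno": 1, "excelente": 2, "malo": -1, "terrible": -2}


def lexicon_score(text, lexicon):
    """Sum the lexicon weights of the whitespace-separated tokens in text."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# (source text, English translation) pairs. The translations are hand-written
# placeholders standing in for machine-translated output.
PAIRS = [
    ("el servicio fue excelente", "the service was great"),
    ("un producto malo", "a bad product"),
    ("bueno pero caro", "good but expensive"),
    ("terrible experiencia", "terrible experience"),
]

# Source-language measure on the original texts.
source_scores = [lexicon_score(src, ES_LEXICON) for src, _ in PAIRS]
# English-trained measure on the translated texts.
translated_scores = [lexicon_score(en, EN_LEXICON) for _, en in PAIRS]

# Agreement between the two measures across documents.
agreement = pearson(source_scores, translated_scores)
print(f"agreement (Pearson r): {agreement:.3f}")
```

In this toy setup the two score lists agree closely because the lexicons were built to mirror each other. With real translations and real sentiment tools, the agreement statistic is the quantity one would inspect per language.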
Taxonomy
Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling
