MapReduce for Counting Word Frequencies with MPI and GPUs
Nithin Kavi

TL;DR
This paper develops and compares CPU and GPU implementations of a MapReduce algorithm in Julia for counting word frequencies in large document sets, then applies them to presidential speeches to look for patterns in word choice.
Contribution
It introduces a custom GPU-based MapReduce implementation in Julia that outperforms FoldsCUDA.jl in simple mapping cases.
Findings
GPU implementation outperforms FoldsCUDA in simple mapping cases
Distinctive word usage patterns identified for each President
Proposed optimizations for word frequency counting algorithms
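One way to make "distinctive word usage" concrete is to compare how often a speaker uses a word against how often everyone else uses it. The sketch below is an illustrative assumption, not the paper's method: the function names, the ratio-based score, the smoothing constant, and the toy counts for `speaker_a` and `speaker_b` are all hypothetical, and it is written in Python rather than the paper's Julia.

```python
from collections import Counter

def relative_frequencies(counts):
    """Convert raw word counts into fractions of the speaker's total words."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def distinctive_words(counts_by_speaker, speaker, top_n=3, smoothing=1e-6):
    """Rank a speaker's words by how much more often they use them than others do.

    The score is a simple frequency ratio (an assumed metric, chosen for clarity).
    """
    target = relative_frequencies(counts_by_speaker[speaker])

    # Pool the counts of every other speaker into one reference table.
    others = Counter()
    for name, counts in counts_by_speaker.items():
        if name != speaker:
            others.update(counts)
    reference = relative_frequencies(others)

    # Smoothing avoids division by zero for words the others never use.
    scores = {
        word: freq / (reference.get(word, 0.0) + smoothing)
        for word, freq in target.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy counts, made up purely to exercise the function.
toy_counts = {
    "speaker_a": Counter({"the": 50, "freedom": 10, "jobs": 2}),
    "speaker_b": Counter({"the": 50, "jobs": 12, "freedom": 1}),
}
top_a = distinctive_words(toy_counts, "speaker_a", top_n=1)
top_b = distinctive_words(toy_counts, "speaker_b", top_n=1)
```

Common words such as "the" score close to 1 under this ratio because every speaker uses them at a similar rate, so they drop out of the ranking without needing a stop-word list.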
Abstract
The goal of this project was to use the Julia programming language and parallelization to write a fast MapReduce algorithm that counts word frequencies across large numbers of documents. We first implement the word frequency counter on a CPU using two processes with MPI. We then create a second implementation on a GPU using the Julia CUDA library, without relying on the built-in MapReduce algorithm in FoldsCUDA.jl. Next, we apply our CPU and GPU algorithms to count the frequencies of words in speeches given by Presidents George W. Bush, Barack H. Obama, Donald J. Trump, and Joseph R. Biden, with the aim of finding patterns in word choice that could be used to uniquely identify each President. We find that each President does have certain words that they use distinctly more often than their fellow Presidents, and these words are not surprising given the political…
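The MapReduce pattern described in the abstract can be sketched in a few lines. This is a minimal single-process illustration in Python, not the paper's Julia, MPI, or CUDA code: the function names, the tokenizing regular expression, and the round-robin sharding are assumptions, and the two "workers" are only simulated by splitting the input into two shards.

```python
import re
from collections import Counter
from functools import reduce

def map_counts(document):
    """Map step: tokenize one document and count its words."""
    words = re.findall(r"[a-z']+", document.lower())
    return Counter(words)

def merge_counts(left, right):
    """Reduce step: combine two partial count tables into one."""
    return left + right

def word_frequencies(documents, workers=2):
    """Count words across documents using a map step and a two-level reduce."""
    # Deal documents round-robin into one shard per simulated worker.
    shards = [documents[i::workers] for i in range(workers)]

    # Each worker maps its own documents and reduces them locally.
    partials = [
        reduce(merge_counts, map(map_counts, shard), Counter())
        for shard in shards
    ]

    # A final reduce merges the per-worker tables into the global result.
    return reduce(merge_counts, partials, Counter())

# Made-up sample sentences, used only to exercise the function.
docs = [
    "The economy is strong",
    "We will rebuild the economy",
    "Strong families, strong nation",
]
totals = word_frequencies(docs)
```

Because merging count tables is associative and commutative, the partial results can be combined in any order, which is what allows the local reductions to run on separate processes or GPU threads in a real implementation.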
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression
